Debate on Linux Virtual Memory Handling

His favorite? by LinuxGeek8 · 2001-10-30 02:41 · Score: 4, Interesting

He seems to lean heavily in favor of Andrea's VM.
That's OK with me, but he might want to take note of the fact that Linus didn't accept many of Rik's patches, and that 2.4.9 actually still had the VM of 2.4.5. The -ac tree was more up to date.
So for a good comparison you'll need to compare the Linus and the -ac trees.

--
Well, don't worry about that. We can get you back before you leave. (Dr. Who)

Re:His favorite? by mr3038 · 2001-10-30 04:23 · Score: 5, Insightful

I'm aware that this doesn't mean they've met in person, but it shows that Moshe has discussed things with Rik before AA's VM was written. So I think he holds nothing against Rik, he just likes AA's VM better.
In addition to this it seems that he has implemented a VM with reverse mapping also. Therefore it should be clear that he previously thought this was the best method. I've understood that the issue between Rik's and AA's VMs is that Rik's is optimized for normal swapping and AA's for the OOM case. Because VM performance really matters only when OOM happens I think AA's should be superior. The real difference depends on the benchmark, of course.
Both systems seem to be somewhat equal. AA's needs less swap but Rik's is claimed to be the better performer. If AA's system is simpler then that's what should be used. Select maintainability over a questionable performance increase. This is like quicksort - there's a point where you usually get better performance by bubble sorting the little pieces quicksort generates during the whole sort. The smart version isn't always the best. Nowadays CPUs can easily do a bunch of dumb operations faster than one smart operation.
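The quicksort comparison can be sketched in code. This is a minimal Python illustration and has nothing to do with kernel code: `hybrid_sort` and the cutoff of 16 are assumptions made for the example, and insertion sort stands in for the "dumb" algorithm because it is the more common choice than bubble sort for small partitions.

```python
import random

CUTOFF = 16  # assumed threshold; real libraries tune this empirically


def insertion_sort(items, lo, hi):
    """Sort items[lo..hi] in place with a simple quadratic algorithm."""
    for i in range(lo + 1, hi + 1):
        value = items[i]
        j = i - 1
        while j >= lo and items[j] > value:
            items[j + 1] = items[j]
            j -= 1
        items[j + 1] = value


def hybrid_sort(items, lo=0, hi=None):
    """Quicksort that hands small partitions to insertion sort."""
    if hi is None:
        hi = len(items) - 1
    while hi - lo + 1 > CUTOFF:
        # Partition around a randomly chosen pivot (Hoare-style scan).
        pivot = items[random.randint(lo, hi)]
        i, j = lo, hi
        while i <= j:
            while items[i] < pivot:
                i += 1
            while items[j] > pivot:
                j -= 1
            if i <= j:
                items[i], items[j] = items[j], items[i]
                i += 1
                j -= 1
        # Recurse into the left part, keep looping on the right part.
        hybrid_sort(items, lo, j)
        lo = i
    # The remaining piece is small: the simple algorithm finishes it.
    insertion_sort(items, lo, hi)
    return items
```

Whether the cutoff pays off depends on the data and the machine, which is the same point being made about the two VMs.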

--
_________________________
Spelling and grammar mistakes left as an exercise for the reader.
Re:His favorite? by be-fan · 2001-10-30 11:22 · Score: 2

We're not talking userspace applications here, we're talking about the kernel. Both the Linux and FreeBSD projects go to extreme lengths to optimize the performance of their kernels. The FreeBSD VM, for example, is very complex, but performs well due to that complexity.

--
A deep unwavering belief is a sure sign you're missing something...

To fork, or not to fork by imrdkl · 2001-10-30 02:42 · Score: 3, Flamebait

From the article:

Nobody has yet dared to speak of a Linux source fork, but this is dangerously close to one.

Is this truly dangerous? If so, why? Why not let the 2 VM's compete and the users will decide?

Better to split than stagnate.

Re:To fork, or not to fork by scooby-doo · 2001-10-30 02:52 · Score: 2, Informative

Essentially that is already going on. The Andrea VM is in Linus's tree now and Rik's VM is still in Alan Cox's tree. So by choosing the official Linus kernel or the -ac kernel you can choose which VM subsystem you would rather use.
Re:To fork, or not to fork by mwalker · 2001-10-30 02:59 · Score: 2, Flamebait

Better to split than stagnate.

True, look at the success of the "Gnome vs. KDE" split.

--
What happens when you outlaw guns
Re:To fork, or not to fork by sshore · 2001-10-30 03:02 · Score: 3, Informative

Why not let the 2 VM's compete and the users will decide?

The problem is the duplication of effort and decreased manpower for each VM. Not only that, but any project that works closely with the VM has to test under twice as many conditions, and may require different code for each. Talk about a maintenance problem.

It's certainly good to have competition to bring out the best in each system, but it would be horribly inefficient to keep it going in the long run.

Regarding the users choosing - the users don't have the opportunity to choose only on the basis of the VM. It's not like they can apply the "VM patch" to the stock kernel to try out the other one, rather, they have to apply a fairly large -ac patch that changes a lot of unrelated things.
Re:To fork, or not to fork by ethereal · 2001-10-30 03:04 · Score: 4, Informative

Well, drivers eventually do get from the -ac tree into the Linus tree, you know - the whole point is that AC tries them out until they are stable enough for Linus. Not to mention that Mr. Cox does have some responsibility to provide RedHat with the best kernel he can, no matter what Linus thinks of it. The only weird thing here is that as far as the VM goes, Linus has picked up the more experimental code first. So people who always recompile the Linus kernel when they install a new distro may find that their kernel operates very differently after that.
My naive thought is that the best way to do it would be to somehow modularize the two VMs so that it can be a compile-time or boot-time option, and let users try both on the same box to see which is better. However, I imagine this would be a ton of work to set up.
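As a toy illustration of what a selectable VM policy might look like, here is a minimal Python sketch. Everything in it is hypothetical: `FifoPolicy`, `LruPolicy`, `POLICIES`, and `count_faults` are names invented for this example, and FIFO and LRU are textbook stand-ins, not Rik's or Andrea's algorithms. A real kernel VM touches far more than a single `access` call, so this only shows the shape of the idea.

```python
from collections import OrderedDict, deque


class FifoPolicy:
    """Evict the page that was loaded first (stand-in for one VM)."""

    def __init__(self, frames):
        self.frames = frames  # number of page frames, assumed >= 1
        self.queue = deque()
        self.resident = set()

    def access(self, page):
        """Touch a page; return True if this caused a page fault."""
        if page in self.resident:
            return False
        if len(self.queue) == self.frames:
            self.resident.discard(self.queue.popleft())
        self.queue.append(page)
        self.resident.add(page)
        return True


class LruPolicy:
    """Evict the least recently used page (stand-in for the other VM)."""

    def __init__(self, frames):
        self.frames = frames  # number of page frames, assumed >= 1
        self.pages = OrderedDict()

    def access(self, page):
        """Touch a page; return True if this caused a page fault."""
        if page in self.pages:
            self.pages.move_to_end(page)
            return False
        if len(self.pages) == self.frames:
            self.pages.popitem(last=False)
        self.pages[page] = True
        return True


# The "boot-time option": a name that selects one implementation.
POLICIES = {"fifo": FifoPolicy, "lru": LruPolicy}


def count_faults(policy_name, frames, trace):
    """Run a page-reference trace under the selected policy."""
    policy = POLICIES[policy_name](frames)
    return sum(policy.access(page) for page in trace)
```

With the reference string 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5 and three frames, `count_faults("fifo", 3, trace)` gives 9 faults and `count_faults("lru", 3, trace)` gives 10, which is the kind of side-by-side comparison a boot-time switch would allow.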

--
Your right to not believe: Americans United for Separation of Church and
Re:To fork, or not to fork by Flower · 2001-10-30 03:16 · Score: 2

A better analogy would be look at the success of emacs v. xemacs. iirc, that was a true fork.

--
I don't want knowledge. I want certainty. - Law, David Bowie
Re:To fork, or not to fork by battjt · 2001-10-30 03:29 · Score: 4, Insightful

Look at the success of EGCS and GCC. That was a successful split and merge. It led to a better GCC in the end while supporting both stable and advanced versions of gcc in the interim.

Joe

--
Joe Batt Solid Design
Re:To fork, or not to fork by Salamander · 2001-10-30 03:38 · Score: 2

Why not let the 2 VM's compete and the users will decide?

There was an interesting thread on this a while back, rooted at this comment. Unfortunately, the article's old enough that it's only available as a static page, and the oh-so-wonderful Slashdot code that generated the page seems to've done so with the comments in basically random order, so it's almost impossible to follow the thread. Maybe I'll try to recreate its original structure and put the result on my website.

In brief, I think it's a great idea. Competition is good; let individuals and teams compete on the basis of the quality of their work, and bless the "winner" as part of the official tree. The "loser" is always free to try again in the next round. The only problem is that this all should have occurred in the 2.3.xx and/or 2.5.xx series; 2.4.xx should not be changing horses in midstream.

--
Slashdot - News for Herds. Stuff that Splatters.
Re:To fork, or not to fork by einstein · 2001-10-30 03:49 · Score: 3, Informative

My naive thought is that the best way to do it would be to somehow modularize the two VMs so that it can be a compile-time or boot-time option, and let users try both on the same box to see which is better. However, I imagine this would be a ton of work to set up.
This was discussed on the kernel mailing list (check out http://kt.zork.net). The general conclusion was that this would be really hard to do with the current build/module system, but kbuild 2.5 has the ability to apply patches before building as long as the patches don't overlap.

--
I post links to stuff here
Re:To fork, or not to fork by Speare · 2001-10-30 04:00 · Score: 3, Interesting

The problem is the duplication of effort and decreased manpower for each VM. Not only that, but any project that works closely with the VM has to test under twice as many conditions, and may require different code for each. Talk about a maintenance problem.
And this would somehow not be the problem with a fork? Considering Linux vs *BSD is already a division of the pool of possibly alignable geeks, and considering both Linux and *BSD families continue to grow, innovate and expand, I think the problem is overrated.
Organizations align on common goals and pursuits, by definition. If there were two or more unalignable goals in the VM, then either a fork or an unforked competition would be in order, and would have the same issues of reduced effort and increased maintenance chores.
Personally, as a non-kernel developer, I think the different VM issues are probably overblown in the moment, and that the best approaches will forge ahead with some significant consensus in the mid-term. Until then, it's worth the experimentation it takes to decide what are the best approaches.

--
[ .sig file not found ]
Re:To fork, or not to fork by Mr.+Fred+Smoothie · 2001-10-30 05:31 · Score: 2, Funny

MrC was heavily optimized for PPC by people who spent years working on it and not having to worry about any other architecture (except maybe m68k).

GCC is a large compiler project w/ frontends for many languages and backends for an obscene # of platforms.

Over the years, both the frontends and backends of GCC have steadily improved. Just as Intel has contributed code or expertise for the x86 backend, certainly anyone who's worked on Apple's compilers is welcome to send patches to the GCC maintainers to improve PPC performance (though whether or not Apple lawyers would have something to say about that might complicate the issue).

And hey; if you feel so passionate about it, why don't you code up some test programs, compare the assembler output, and mail the code & generated results to the GCC team so that they know where improvement is needed: oh, but wait; it *is* much easier just to bitch on slashdot, isn't it?

On second thought, damn why can't all these volunteer programmers do a better job of making my life easier without any help from me...

Re:To fork, or not to fork by jovlinger · 2001-10-30 06:05 · Score: 2

'cause I have no intuition at all about these things:

Roughly how much of the kernel/drivers/whatever that constitute the source tree we call the kernel has to know about virtual memory implementation? I would have thought that even the most low-level driver would interact with the VM system at the API level, so it wouldn't matter which VM implementation manages the pages as all the driver cares about is that this page is locked in physical RAM, while those pages can be moved as the VM sees fit.

eh?
Re:To fork, or not to fork by Nater · 2001-10-30 06:14 · Score: 2

Not only that, but GCC has to dodge patents left and right. There are lots of optimizations that are verboten from the project for that reason, and will probably be included when the patents expire. In the meantime, the GCC crew is left with the incredibly time-consuming process of developing non-infringing optimizations that simply aren't allowed to be as good as they could be.

--
I like to play children's songs in minor keys.
"We're all sons of bitches now." --J. Robert Oppenheimer
Re:To fork, or not to fork by Ed+Avis · 2001-10-30 06:26 · Score: 2

Suppose Linux did fork...

If Linus's version is called linux, what is Alan's version called?

--
-- Ed Avis ed@membled.com
Re:To fork, or not to fork by Tumbleweed · 2001-10-30 07:16 · Score: 2

Erm, 'Redhat', probably.
Re:To fork, or not to fork by Skapare · 2001-10-30 07:58 · Score: 2

The sad part of this is that now when manufacturers make drivers, they will work with Redhat (because that's the distribution the suits tend to choose ... Redhat is sort of the "Windows" of the open source OS world given its market aims and its somewhat lower-tech orientation) but NOT with Linux.

--
now we need to go OSS in diesel cars

Re:The failure of Open Source by Anonymous Coward · 2001-10-30 02:43 · Score: 4, Insightful

Oh please. Have you ever worked on a commercial software project? I've seen just as much if not more ego in moronic engineering team meetings at my enterprise software company. Without a single strong technical leader OR a group of smart people who all equally respect each other's opinions, the SAME THING happens on a commercial project. I've watched a Director of Engineering call meetings almost every day for 3 weeks in a row because he didn't know how to solve exactly this sort of problem. In the end he just decided to go with what the person with the most years of experience said and to get the CEO to give him blanket license to make that technical decision, though none of the other engineers agreed with it - they were all too conflict averse to speak up and too afraid of losing their jobs just as the economy was tanking (he made a bad decision indeed and the project suffered greatly for it, getting delayed by 3-4 months and even then never delivering a large portion of the promised features because this architectural decision made them impossible). That company (mine, unfortunately) is most likely going out of business soon. So don't give me this crap that ego only adversely affects Open Source projects.

Re:It should all be configurable. by pwagland · 2001-10-30 02:46 · Score: 4, Interesting

Sadly, no.

While it is nice to be ultra configurable, that leads to two separate problems:
1) Code maintainability
2) User maintainability

1) is a serious problem. Having to test the impact on two different VM systems, and to fully understand the impact that any change will have, is a mammoth task.

2) Users are not all technically literate anymore. Look at the recent Slashdot story on Microsoft losing their grip in Asia...

FreeBSD 4.4-STABLE vs 2.4 comparison? by swb · 2001-10-30 02:50 · Score: 2

He alludes to some FreeBSD vs. Linux benchmarks at the end of the piece. Anyone got any links?

Re:FreeBSD 4.4-STABLE vs 2.4 comparison? by ZerothAngel · 2001-10-30 03:13 · Score: 2, Interesting

The old benchmark is here, but as the poster above noted, the new benchmark is forthcoming.
Although it will be comparing a moving target (Linux 2.4.x) to a moving target (FreeBSD 4.x), the results will be interesting. AFAIK, there weren't any major changes (I mean like VM changes :) in FreeBSD, so comparing the old and new benchmarks would give a good indication on how much Arcangeli's VM improves things.
Re:FreeBSD 4.4-STABLE vs 2.4 comparison? by eparusel · 2001-10-30 04:16 · Score: 2, Interesting

Hmm, recently OpenBSD and FreeBSD (I'm not sure about NetBSD though) have added improved dirpref code (created by an OpenBSD developer(s)).

When data is written with the new algorithm, subsequent reads and writes are on average faster (being conservative). People are seeing 6x improvements for certain tasks as well!

So while there weren't any major changes to the VM in FreeBSD AFAIK as well, if the benchmark involves using any files on the disk, then it'll most likely be sped up...!

Here's a link to the discussion on the FreeBSD-stable mailing list...

and another link...

AC kernels are not a fork by rakarnik · 2001-10-30 02:56 · Score: 5, Interesting

Moshe Bar seems to indicate that Alan Cox is creating some kind of fork of the Linux kernel. Actually, -ac kernels are always different from Linus kernels to some extent, since they include slightly more experimental code (e.g. ext3), or code that Linus has not had a chance to review yet. This way, the experimental code gets more testing before going into official Linus kernels. You can read more about -ac kernels at KernelNewbies.Org.

As anyone following LKML knows, Alan thinks that drastic VM changes should be reserved for 2.5, and so continues to keep Rik's VM going. This actually helps quite a bit as both VMs get tested and there have been several comparative tests conducted leading to improvements in both VMs. Competition in this case is certainly helping Linux.

Oh and for all you fork conspirators, here's another fact: Andrea Arcangeli also releases his own kernel releases, called -aa. I don't think any of these are considered forks; everyone understands that this way patches get more testing, and "crosstalk" between the different flavors is a given.

Much ado about nothing, IMHO...

-Rahul

--
Genebrew

Re:AC kernels are not a fork by Webmonger · 2001-10-30 03:05 · Score: 2

The difference between the aa kernel and the ac kernel is that Alan Cox is widely recognized as the number 2 Linux guy, but he's promoting something completely opposed to Linus' decision.

If anyone could start a fork, it's Alan. However, remember that forks aren't necessarily bad. And there seems to be strong argument in favour of both VMs. . .
Re:AC kernels are not a fork by bwt · 2001-10-30 03:51 · Score: 5, Interesting

I don't think any of these are considered forks; everyone understands that this way patches get more testing, "crosstalk" between the different flavors is a given.

Well, I disagree -- they are ALL forks. Any time you create a patch you are forking. The open source development model relies on perpetual fork and merge to accomplish its development. Most projects are forked this way into a development and a stable branch. I call this a "constructive fork". The AC kernels are perpetually different, but importantly they are generally about the same "distance" away, and "crosstalk" as you call it keeps it that way.

As the "distance" increases, tension increases, and if it isn't resolved it will divide the development camp. If the crosstalk stops, and the idea of eventual merge is abandoned, you have a "true" fork. Developers have to pick sides, and the split can become permanent.

I think the AC kernels have always been the former kind of constructive fork. If he never adopts the new VM, then his kernel will begin to diverge, since developing for two VMs is hard. In this way, a small perturbation can become a full-blown deviation that divides developer resources. I really doubt that the VM issue will divide the Linux kernel team permanently. As AC's kernel gets farther away from the main line, the tension on everyone will increase. Eventually, I predict, the team will force one solution, but there is no guarantee.
Re:AC kernels are not a fork by Ed+Avis · 2001-10-30 06:20 · Score: 2

I believe that Alan Cox prefers GNOME to KDE. Therefore I have a cunning plan for an Alan Cox Trap. I will set up a machine with a specially patched kernel to use Rik's VM when GNOME runs, and Andrea's VM when running KDE. Mwhahaha....

--
-- Ed Avis ed@membled.com
Re:AC kernels are not a fork by Tumbleweed · 2001-10-30 07:23 · Score: 4, Insightful

Actually, I'd say they're more like 'sporks' than 'forks'. Nobody who makes them intends for them to take over from the main Linus kernel tree.

Re:OSS Power by BenHmm · 2001-10-30 02:57 · Score: 3, Informative

Sure they could - provided all of the users of XP were the sort of people who don't mind downloading and recompiling a new kernel every two weeks.

They're not. So Microsoft put these changes in point releases instead.

Why does the ac tree persist? by Malc · 2001-10-30 02:58 · Score: 2

The article seems to come out in favour of the new VM code. It makes it sound like it works much more effectively. So, why does Alan Cox continue with the old VM code? There must be some reason why he thinks it's better, or why go through the effort of continually patching the old code into the newer kernel?

Re:Why does the ac tree persist? by tubby · 2001-10-30 03:32 · Score: 3, Interesting

The article seems to come out in favour of the new VM code. It makes it sound like it works much more effectively. So, why does Alan Cox continue with the old VM code? There must be some reason why he thinks it's better, or why go through the effort of continually patching the old code into the newer kernel?

Basically because neither of them is good in all conditions. Each of them is better than the other in some situations, e.g. big systems, little systems or whatever. While I am on the kernel mailing list I haven't been following the discussions closely enough to say any more than that, but it's the gist of it. Also for a while Alan continuing to run the Rik VM gave people a way to run a later version kernel without being lab rats for the new VM, which really hadn't had much testing in 2.4.10/11.

I think that this article overrates the AA VM by a large margin. It can't really be said to have solved the Linux VM woes, which is what it implies.

I have now used both of the .13 kernels and personally found the -ac VM to be better for my needs. On the other hand, since I bought 768MB of RAM today, my needs have just changed.
Re:Why does the ac tree persist? by osiris · 2001-10-30 03:33 · Score: 2, Insightful

i think it is mainly to do with stability and the proven ability of the old vm code. basically, the new vm code was a complete rewrite from scratch and was incorporated into the main kernel straight away. the problem therein is the fact that the code will not have been as thoroughly tested and proven as the old vm. it may well be that the new vm is rock solid but it hasn't been in use for as long as the old vm to prove it. what alan is doing is sticking with the old vm as it is pretty much proven to work well and not fall over.

it's not exactly trivial to rewrite an entire vm so there are bound to be problems with it. these problems come out through testing. i would have thought such a major rewrite would have been put in a development kernel first rather than into a "stable" kernel tree. that way, developers can test it first and iron out any problems rather than everyone upgrading to the new vm and _then_ a major problem being found.

the new vm may be brilliant and fast, but alan has a point in sticking with the old code. major rewrites should belong in development trees until fully ready for a stable release.

But it is in the 2.4.10 linus series by wiredog · 2001-10-30 02:59 · Score: 2

Which is what he's talking about.

--

Best Slashdot Co

Re:But it is in the 2.4.10 linus series by LinuxGeek8 · 2001-10-30 03:06 · Score: 2, Interesting

Well, I was actually saying that if you compare 2.4.10 with 2.4.9, you're actually comparing 2.4.10 and 2.4.5.

Even though the kernel had gradually evolved from 2.4.0 to 2.4.9, it was evident that the VM design was more of a liability than an advantage.

Point is, the kernel did not gradually evolve to 2.4.9, but only to 2.4.5.
Rik's VM has problems, but in the current -ac tree it is doing quite well. Maybe as well as or better than Andrea's VM.

Anyway, let's hope that the best VM wins, if there is a best VM.

--
Well, don't worry about that. We can get you back before you leave. (Dr. Who)
Re:But it is in the 2.4.10 linus series by Cramer · 2001-10-30 16:03 · Score: 2

I've not done any tests lately, however, the way things were in the past (measured in years), the Linux kernel sucked out loud without some swap. 1k or 100M, it didn't matter; simply having swap available significantly changed the behavior of the system.

Surprisingly, the tivo, based on 2.1.24, performs better with swap disabled. However, that kernel is heavily modified and, well, ancient.

Re:Make it a build option by iamsure · 2001-10-30 02:59 · Score: 5, Interesting

This was actually answered on the list, and summarized in a Kernel Traffic. As Alan Cox put it "It would be horribly difficult".

While it sounds simple enough, as they said in the KT, the "replacement" of the VM was no small feat. It took 170 patches, which touched a very large percentage of the kernel.

Imagine doing so TWICE (or more) and trying to code 'around' the issues for each.

No.

This way madness lies. While it is a nice idea, the simple truth is that it doesn't belong in 2.4.

2.5 should have branched the second that the patches were considered. Linus didn't want to deal with bitching about 2.4 not being "good enough" and was impatient.

So be it. The differences between Linus' and Alan's kernel trees (other than the VM) are growing VERY small this week, and will probably be 'close-enough' for a handoff within the next two weeks.

The only question is which VM will end up in the 2.4 series. (NOT when Linus hands it over, but when Alan begins his releases of it).

I would not be shocked to see Alan disagree with Linus, and stay with the 2.4.x (x<10) VM, and I also wouldn't be shocked to see him agree with Linus and use the new VM.

As to the patch on install idea, it is actually also discussed for kbuild in the 2.5 series.

2.5 will be very exciting, if we can only get Linus to get working on it, instead of muddying the stable-series water!!

--
GPL'd web-based tradewars themed space game

Re:It should all be configurable. by Flower · 2001-10-30 03:00 · Score: 4, Interesting

While the idea is interesting I don't think it is practical. From what I've read on KT and in this article, changing the VM forces design considerations on userland programs. It's additional complexity that most developers (and especially companies like Oracle) wouldn't appreciate. I also think it would raise support costs. At the very least I'd want some variable in /etc that would clearly state which VM was being used. For me at least, the issue is favoring simplicity over flexibility.

I think the biggest bone of contention in the community is Linus replaced the VM in the current stable version instead of pushing it into 2.5. Again, not being a kernel hacker and only going from everything I've read, this was a radical change. I'd almost be willing to say the latest kernels should be labeled 2.6 but that's just me.

Oh, and finally, to paraphrase an old saying, give any tech-savvy user enough rope and they will hang themselves.

At least that's what I think. :)

--
I don't want knowledge. I want certainty. - Law, David Bowie

Alan will be switching VMs soon... by rakarnik · 2001-10-30 03:02 · Score: 5, Informative

See this posting to LKML:

Alan's talking about switching VMs in -ac kernels

--
Genebrew

swap space? by archen · 2001-10-30 03:02 · Score: 3, Interesting

From the article - "All earlier 2.4 kernels (since 2.3.12) needed at least the same amount of RAM in swap and then more to give you additional virtual memory. This meant that on an 8-GB server, you needed to put aside almost a full 9-GB disk just to be able to swap"

Is this accurate? For just about everything I've always gone with 512MB of swap, regardless of whether I had more or less RAM (not that I'm technically proficient or anything). This would also be a shortcoming of Linux since it would make it a pain in the ass upgrading RAM if you needed to allocate more swap space somewhere else each time. Well I'm all for the newer VM. Simple is good.

Re:swap space? by iabervon · 2001-10-30 07:08 · Score: 2

In the traditional VM formulation (pre-linux), every bit of VM would have a place in swap, and RAM would just keep a fast copy of the data. This greatly simplified the implementation, of course, because all of the data had a location on disk it could keep for its entire lifetime. Linux didn't do this: data would just be put on disk somewhere free, and could lose its place while in RAM. This meant that your total VM would be disk+RAM, not just disk.

As an optimization, and due to hard drive space being cheap, the first 2.4 VM used the traditional scheme, because, if a page hasn't been modified since it was last swapped out, it wouldn't have to be written at all if it was still there.
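
The difference between the two schemes comes down to simple arithmetic. The sketch below is a simplified model of the claims in this thread, not kernel code: the function name `usable_virtual_memory` and its parameters are made up for illustration, and it ignores kernel overhead, caches, and overcommit.

```python
def usable_virtual_memory(ram_mb, swap_mb, swap_backs_ram):
    """Return a rough total of usable virtual memory in megabytes.

    swap_backs_ram=True models the scheme where every page in RAM also
    keeps a slot in swap, so RAM acts as a fast copy of swap contents.
    swap_backs_ram=False models the scheme where a page lives either in
    RAM or in swap, so the two capacities add up.
    """
    if swap_backs_ram:
        # Swap only adds capacity once it exceeds the size of RAM.
        return max(ram_mb, swap_mb)
    return ram_mb + swap_mb
```

Under these assumptions, an 8 GB machine with 9 GB of swap gets `usable_virtual_memory(8192, 9216, True)`, which is 9216 MB, versus 17408 MB in the additive scheme. With only 512 MB of swap, the first scheme stays at 8192 MB while the additive one reaches 8704 MB.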

In any case, it's probably worthwhile to have at least 1.5 times the swap as RAM; if you have just a little bit of swap on a high-memory system, it's unlikely to save you from running out of memory, and will instead cause the machine to swap a lot before running out of memory anyway. You don't need to upgrade memory and swap at the same time, but you might as well upgrade swap first, or just turn it off. (That is, if you're getting more memory so you'll have more space, buy swap first. If you're getting more memory so it will be faster, replace swap with RAM)

These days, hard drives are cheap, and there are old hard drives lying around of reasonable sizes; just use a whole recently-replaced hard drive as swap. This avoids contention with filesystems and is easy to replace.
Re:swap space? by Andrewkov · 2001-10-30 07:08 · Score: 2

Actually one thing I've thought of but never tried is to make several small swap partitions, maybe one per drive (I don't think it would make sense to have multiple swap partitions on one drive). In theory, synchronized reads from the different drives should speed the system up. Anyone know if this is true?
Re:swap space? by h2odragon · 2001-10-30 08:03 · Score: 2, Informative

This is true.

There was a comment on LKML not all that long ago that dealt with the details; IIRC it was in response to someone wondering about using software RAID0 for swap.
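
The effect of several swap areas can be modeled roughly as follows. Linux gives each swap area a priority; areas with equal priority are filled round-robin, and a higher-priority area is used up before a lower one. The Python sketch below only simulates that allocation order: the function `allocate_swap_pages` and the device names are hypothetical, and it says nothing about actual I/O speed.

```python
from itertools import cycle


def allocate_swap_pages(areas, page_count):
    """Assign page_count pages to swap areas and return pages per area.

    areas maps a (hypothetical) device name to a (priority, capacity)
    pair. Higher priority is used first; equal priorities share pages
    round-robin. Pages that do not fit anywhere are simply not placed.
    """
    used = {name: 0 for name in areas}
    remaining = page_count
    for priority in sorted({p for p, _ in areas.values()}, reverse=True):
        group = [n for n, (p, _) in areas.items() if p == priority]
        free = sum(areas[n][1] for n in group)
        for name in cycle(group):
            if remaining == 0 or free == 0:
                break
            if used[name] < areas[name][1]:
                used[name] += 1
                remaining -= 1
                free -= 1
    return used
```

Two areas on different drives with the same priority, for example `{"hda2": (1, 100), "hdc2": (1, 100)}`, split 10 pages evenly, which is what lets both disks work in parallel. Giving one area a higher priority makes it fill up first instead.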

Re:It should all be configurable. by jacobito · 2001-10-30 03:05 · Score: 4, Informative

That's not going to happen in the 2.4 series. The kernel hackers think that making the VM policy configurable would be a nightmare:

Michael T. Babcock asked how ugly it would be to make Rik van Riel's and Andrea Arcangeli's Virtual Memory subsystem code into a compile-time option, so folks could try each one out as they pleased. Alan Cox replied simply, "Too ugly for words." Mike Fedyk suggested that it might be feasible in 2.5, and asked if there were a way to make it non-ugly. Marcelo Tosatti replied, "Even if its non-ugly, its non-easy. Way too much overhead. For 2.5 we'll probably be able to get people working together."

This is from Kernel Traffic #139.

Re:Make it a build option by sql*kitten · 2001-10-30 03:11 · Score: 3, Interesting

Do you think that Windows 2000 DataCenter has the same VM system as Windows 2000 Professional? I severely doubt it

It's actually probably the same algorithm, with different parameters. That's how NT4 did it, in Workstation and Server versions. The kernel would note which version it was supposed to be on startup, then initialize the VM system differently.

Against the Truth by Anonymous Coward · 2001-10-30 03:18 · Score: 2, Insightful

Moshe Bar argues two points I vehemently disagree with:

(1) Alan made a mistake in not switching to Andrea's VM. Alan is trying to maintain a stable kernel. Switching out large chunks of the VM is the last thing to do to achieve those goals. Alan will switch in due time.

(2) The preemptible kernel is unfit for certain scenarios. Everyone I know loves the preemptible kernel. It gets good reports on lkml and the kernel news sites - Hell, it even got good comments here!

I realize this is an editorial, and I understand everyone has an opinion, but if it isn't true it isn't true. An opinion can't contradict fact.

Tim

Re:Against the Truth by Flower · 2001-10-30 03:59 · Score: 3

I'll skip point 1. I agree with your assessment and as others have pointed out the switch will be made.

On point 2 however, I just don't agree with you. Moshe does more than an adequate job of explaining his stance on this issue. Between pointing out the costs of making the kernel fully preemptible, citing his experiences with using it on personal machines (good) and servers (not so good), then noting the preemptible kernel breaks Mosix and LIDS I think he's got a right to his opinion. It's based upon at least as much fact as stating everyone loves the preemptible kernel.

--
I don't want knowledge. I want certainty. - Law, David Bowie
Re:Against the Truth by SurfsUp · 2001-10-30 14:55 · Score: 2

Moshe does more than an adequate job of explaining his stance on this issue. Between pointing out the costs of making the kernel fully preemptible, citing his experiences with using it on personal machines (good) and servers (not so good), then noting the preemptible kernel breaks Mosix and LIDS I think he's got a right to his opinion. It's based upon at least as much fact as stating everyone loves the preemptible kernel.

His analysis is way wide of the mark. His main argument is that kernel preemption causes more context switches, and that's pure BS. Yes, there may be a few more quantum-expiry switches but these are infrequent enough as to be very difficult to notice. He misses the fact that for some loads, preemption makes tasks run faster because a user task can continue sooner, following a completed disk read. He also misses the true cost of the preemption patch, which is that spinlocks are slightly more expensive.

The benefits of preemption in terms of reduction in latency on the other hand are large and measurable. For desktop use it's no contest, and I see non-preemptible kernel mainly being an option for certain types of servers carrying a kind of load that exposes the slight extra cost of the spinlocks, and a percent or two extra throughput somehow matters.

--
Life's a bitch but somebody's gotta do it.

Should be a compile-time option, then by Jeppe+Salvesen · 2001-10-30 03:22 · Score: 2

If they are to truly compete, then we should be allowed to choose between the Andrea VM code and the Rik VM code when we compile our beloved kernels.

However, a kernel fork would not necessarily be a bad thing, as long as the forking doesn't break the ability to run binaries. I'd hate to have to recompile my entire system just to switch between VMs.

--

Stop the brainwash

Re:Should be a compile-time option, then by be-fan · 2001-10-30 12:14 · Score: 2

Dude, a VM is an awfully low level bit of code to make a compile option! XFS, for example, won't work with the AC kernels due to the different VM. Probably, the two separate kernels are the best idea ATM.

--
A deep unwavering belief is a sure sign you're missing something...

Re:Make it a build option by felicity · 2001-10-30 03:25 · Score: 2, Interesting

I agree with most folks that this should have waited for 2.5, 2.4 should only be bug fixes at this point. That's why it's the *stable* kernel tree. Big huge changes (and replacing the VM system is definitely in this category) are not appropriate here.

I wonder what Rik has to say about the new "blessed" VM? If he thinks it's a better all-around VM, then the debate can stop pretty quickly I would think.

I think it'll be interesting when the handoff occurs. Will people have to deal with different VMs constantly during official releases? 2.4.0-2.4.9, a change in 2.4.10-2.4.13, and a change back for 2.4.14 and beyond?

I also wonder which way major distros will go (since most people don't deviate from those kernels.) RedHat, for instance, usually bases themselves on the AC kernel tree (surprise) and then additionally patch it a whole lot more. While others take the most recent blessed kernel and go with it straight. Should be interesting.

My overall view of this is simple though: Linus is God, in relation to the kernel, until he says otherwise. (to paraphrase Eric Raymond) If someone wants to maintain a patch against the now-blessed VM to revert to the previous behavior, fine. The decision has been made for the new VM though, let's continue on with things shall we?

ok, here's the thing by Velex · 2001-10-30 03:26 · Score: 5, Interesting

I don't care if you want to swear by the Linus kernel, but it gets killed by IO. I mean, come on, I'm using 2.4.12, and I can't rip a CD and play an MP3. Under the AC series, I can rip CDs, play MP3s, watch divx movies, surf the web, untar a file, and have a compile job going at the same time. Even for more usual setups, like viewing a video without doing anything else, the Linus kernel drops frames left and right, whereas the AC series laughs at it. Don't tell me I need to use mplayer with SDL, because I do.

Because I treat my Linux box as though it were a Windows box (one of the reasons I switched over to Linux for everything is that the widgets in GTK are prettier than the widgets in Windows -- it's nice to have people ask me how to get their desktops to look like mine and tell them they have to install linux) and I expect it to run at least as well as a Windows machine, I must use the AC series. While I'm sure that the Linus kernel has its applications, it is simply unacceptable for replacing the Windows kernel.

Mod me flamebait or troll if you want, but I speak the truth. I have a Thunderbird-750 with 224 MB of ram, and I find it simply unacceptable when I can't run Quake or view movies under linux because of the Linus kernel. When mp3s skip because I'm moving some data around, it tells me that something is wrong with the Linus kernel. I'm glad that I had a friend who introduced me to the AC series, or I would have given up on linux. Plain and simple, politics aside, the end user doesn't care that he's being loyal to Linus the Great, he just cares that he can view that movie. If Windows outperforms linux in multimedia, he'll use Windows.

--
Join the Slashcott! Stay away entirely Feb 10 thru Feb 17! Close all tabs to prevent autorefresh!

Re:ok, here's the thing by choward · 2001-10-30 06:28 · Score: 5, Interesting

I use the Linus stock kernel on a _very_ similar setup (Duron 700, 384MB ram) and I don't have the problem you mention. One thing I've noticed is that with the Linus kernel, DMA is _never_ turned on by default, you must use hdparm explicitly at startup. Once you do that, skipping mp3s is a thing of the past.

Running the hdparm tests,
w/out DMA: 4.01MB/sec
with DMA: 34.96MB/sec

Quite a change.
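
To put a number on "quite a change", the ratio of the two figures quoted above can be worked out directly. This is only arithmetic on the two throughput numbers from the post; the variable names are made up for illustration.

```python
# Throughput figures quoted in the post, in MB/sec.
without_dma = 4.01
with_dma = 34.96

# How many times faster the disk reads with DMA enabled.
speedup = with_dma / without_dma
print(f"DMA speedup: {speedup:.1f}x")
```

So on that machine, turning DMA on made buffered disk reads nearly nine times faster.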

Craig Howard

--
-- Craig Howard
Re:ok, here's the thing by derF024 · 2001-10-30 06:50 · Score: 2, Informative

i don't know what you're doing to that poor machine, but i have a 366 mhz laptop with 128 megs of ram and i can do all of those things under linus' 2.4.12 just fine. I play video under avifile instead of mplayer, but I never lose a frame, even across 10 mbit ethernet.
Re:ok, here's the thing by ryanvm · 2001-10-30 09:22 · Score: 2

Hmmm - an AMD eh? Would you by chance be using a VIA chipset?

I'll have to admit that I don't keep up on the LKML like I should, but I've been afraid to use DMA on 2.4 because of the file system corruption it was experiencing with VIA chipsets early on. [I have a PIII with a VIA Apollo Pro 133A chipset.] Wasn't that the reason that Linus decided to default to "DMA disabled"?

Did that ever get resolved or should I still be wary of DMA?
Re:ok, here's the thing by Rick+the+Red · 2001-10-30 12:51 · Score: 2

OK, here's the thing: I'm not really competent to comment on kernel internals. However, I am competent to ask questions, and I fail to see that any of your complaints are directly attributable to the VM code. Couldn't some of what you're experiencing be due to other code in the AC series, not the VM code? I'd really be interested to hear what about the Linus kernel's VM code is affecting your performance, especially since you have plenty of RAM and the article clearly states the performance differences between the Linus kernel VM and the AC kernel VM appear in memory-limited situations. Am I missing something obvious, or did you miss something subtle?

--
If all this should have a reason, we would be the last to know.

Re:OSS Power by Xzzy · 2001-10-30 03:35 · Score: 5, Interesting

> so Linus used Arcangeli's new VM code. Problem solved. Stable as ever.

This is actually a wad of baloney. In normal applications (ie, running xmms, reading slashdot and maybe running gimp, with your glitzy desktop of choice), sure the VM works fine.

In any SERIOUS situation though, 2.4 simply falls apart crying because the kernel handles memory so badly. One would like to think that in a low memory situation the kernel would start hacking off whatever was causing the problem so that it could survive. Well, it doesn't. It just freezes. This has been a situation I've been forced to deal with over the past month.. so while I'm not a guru on the subject, I have pieced together some bits of the story.

Basically at my job we have a programming group that has mountains and mountains of source that they have to compile. Lazy as programmers tend to be, they also try to compile it over nfs on the machines with the biggest specs. To give a sense of scope, the resulting executable clocks in at around 500 megs. So basically, their build really stresses out the machine they're compiling on.

The machine freezes EVERY time because of memory shortages. The kernel can't allocate pages for incoming network traffic, causing a backlog, causing processes to hang, causing further backlog.. then powie an unresponsive machine. The obvious solution would be to slim down the build but if anyone's ever worked with a developer suggesting that would be as useful as suggesting Hitler was a saint.

From what I've gathered of the story, the 2.4 kernel was supposed to have this new grand VM that made dorking with the freepages file obsolete.. to the point where you can't even tweak the kernel with the freepages file anymore. The kernel was supposed to have this feature that would let it detect what processes were stealing all the memory and kill them off.

NEWS FLASH they took this feature out because it was buggy.

So what happens? The kernel just paints itself into a corner until the machine freezes. Only way to recover is to power cycle. This is why damn near every patch in the 2.4 line has the line "VM tweaks" in the changelog. Quite frankly the 2.4 VM is garbage, and only functions suitably well in non-intensive applications.

It's been getting better with each dot release but it's still nothing you'd want to bet money on.

Virtual Memory stinks in Linux by Apreche · 2001-10-30 03:37 · Score: 2

I have a large ext3 partition to store all my data and a 256MB partition of swap. I also have 384MB of RAM. Occasionally I'll hear the hard drive grinding away like it's using the swap. I check and it is using the swap, but my real RAM isn't full yet. It's actually far from full, like 100MB free. I know if I don't have a swap partition it won't use it, but then I'll run out of memory sometimes, like when I have a huge pile of applications open at once. It really needs some work. I don't care about forks or anything, just make it work better.

--
The GeekNights podcast is going strong. Listen!

question:malloc support? by CaptnMArk · 2001-10-30 03:40 · Score: 2, Interesting

Does it support malloc correctly now (returning NULL when out of memory)?

Re:question:malloc support? by spitzak · 2001-10-30 05:08 · Score: 2

IMHO this is useless, it wastes a lot of memory and makes programs fail sooner (pages that are never written to or are immediately freed due to exec would take up space). More importantly this solves nothing, as memory does not necessarily run out when *your* process calls malloc, it runs out when *some* process calls malloc. Maybe in the old days when system calls did not require memory to be allocated this scheme would have worked but not anymore. I have no idea what a real solution is, but I kind of suspect nobody does...
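
The trade-off described above can be sketched with a toy model. This is not how any real kernel accounts for memory: the ToyMemory class, its malloc/touch methods and the page counts are all made up for illustration, assuming a single pool of 100 pages shared by two processes.

```python
class ToyMemory:
    """Toy model of one pool of pages shared by all processes (hypothetical)."""

    def __init__(self, total_pages, overcommit):
        self.total_pages = total_pages
        self.overcommit = overcommit
        self.reserved = 0  # pages promised at malloc time
        self.touched = 0   # pages actually written to, so really in use

    def malloc(self, pages):
        """Return True on success, False where a real malloc would return NULL."""
        if not self.overcommit and self.reserved + pages > self.total_pages:
            return False
        self.reserved += pages
        return True

    def touch(self, pages):
        """Write to allocated pages; False means the system is truly out of memory."""
        if self.touched + pages > self.total_pages:
            return False
        self.touched += pages
        return True


strict = ToyMemory(total_pages=100, overcommit=False)
lazy = ToyMemory(total_pages=100, overcommit=True)

# Process A reserves 80 pages but only ever writes to 10 of them.
for mem in (strict, lazy):
    mem.malloc(80)
    mem.touch(10)

# Process B asks for 40 pages and writes to all of them.
strict_ok = strict.malloc(40)  # refused at once: 80 + 40 > 100
lazy_ok = lazy.malloc(40)      # accepted: nothing is checked at malloc time
lazy_fits = lazy.touch(40)     # 10 + 40 pages really in use, still fits

# Later, some process writes to 60 more pages and the pool is exhausted.
lazy_later = lazy.touch(60)    # 50 + 60 > 100, fails long after malloc returned
```

In the strict model process B fails early even though 90 pages are sitting unused. In the lazy model B runs fine, but the eventual failure shows up when pages are touched, in whichever process happens to touch them, which is the point being made about malloc returning NULL not solving much.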

Re:It should all be configurable. by Anton+Anatopopov · 2001-10-30 03:58 · Score: 2, Interesting

The design of a userspace program should not depend on what virtual memory system is in use.

Sure locality of reference matters, but any decent VM design will take this into account.

What I would like to see would be per-process VM algorithms. Like, you give an extra argument to the fork system call, and your new process has its virtual memory managed in a way which is optimal for your application.

Stack based languages exhibit different behaviour to certain other types of languages, and most VM systems seem to be optimised for this general case.

Re:Arguing over the best VM by Hiro+Antagonist · 2001-10-30 03:58 · Score: 2, Insightful

Not really. It's hard to quantify the benefits of a text editor, and although one may make a statement such as, "Editor foo enables me to edit files 50% faster," it doesn't mean much, because the reference is subjective to personal preferences.

With VM in the kernel, it's pretty obvious when things aren't working as well as they should -- mainly because it is possible to write test programs (scaffolding) that check to see if a designed system is performing to specifications -- read _The_Mythical_Man-Month_ for an excellent explanation of the merits of specification-based "scaffold" testing.

--
I Hit the Karma Cap, and All I Got Was This Lousy .sig.

Re:It should all be configurable. by DarkMan · 2001-10-30 04:01 · Score: 3, Interesting

Um, I think all the replies here I've read have missed the point. I don't think the poster was asking to be able to switch between the two VMs at compile time, but rather having one VM that was configurable.

That would allow the system to be tuned at compile time for the large servers, and for the small desktops, without having to have a 'one size fits all' solution.

I've always felt that that would be the best answer. The reality is, however, that Andrea's VM would not allow for such a range of configurability, being a very simple, and thus easy to balance, system. That's not to put it down, often the simplest solution is best.

However, Rik's VM is more complex, and can, in principle, be made more configurable at compile (or even run) time. It would be a lot of work, but I think that that's the best way to get good performance across the wide range of platforms.

For example, if I knew that my system would have to work with millions of very small files, and only read them once, then I would configure the VM to forget about caching the files, and keep anything that is used more than once in RAM. Or, if dealing with a computation, have large pages of RAM swapped in or out that match the arrays the computation uses, so that everything is pre-fetched. Yes, there are other ways of accomplishing these goals, but I think that that would be a good way to go.
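
As a rough sketch of the first policy (don't cache what is read once, keep what is used more than once), here is a toy cache that only admits a page on its second access. The SecondTouchCache class and the page names are hypothetical, and this says nothing about how either Rik's or Andrea's VM actually decides what to keep.

```python
class SecondTouchCache:
    """Toy cache that only keeps pages accessed more than once (hypothetical)."""

    def __init__(self):
        self.seen_once = set()  # pages touched exactly once so far
        self.cached = set()     # pages considered worth keeping in RAM

    def access(self, page):
        """Record an access and return True if it was a cache hit."""
        if page in self.cached:
            return True
        if page in self.seen_once:
            # Second access: promote the page into the cache.
            self.seen_once.discard(page)
            self.cached.add(page)
        else:
            # First access: remember it, but do not spend cache space on it.
            self.seen_once.add(page)
        return False


cache = SecondTouchCache()

# A large number of small files, each read exactly once.
for n in range(1000):
    cache.access(("smallfile", n))

# One page that is used repeatedly.
hits = [cache.access(("libc", 0)) for _ in range(3)]
```

After this run the only cached page is the one that was reused, and the thousand read-once files never displaced it. A real implementation would also have to bound the seen_once set, which this sketch leaves unbounded.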

If nothing else, it acknowledges that a system with 32 Meg of ram and one processor has very different VM needs from an Octuple processor system with 32 Gig of ram.

Compound errors by Salamander · 2001-10-30 04:05 · Score: 5, Interesting

IMO both Rik's code (RVM) and Andrea's (AVM) were accepted prematurely, and Linus's ADD is the root of the problem here. Everyone thought the 2.2 VM was broken, so he jumped on RVM when it really hadn't received adequate testing with various workloads. Then, when that didn't work out, he did something even worse by jumping on AVM in the middle of a "stable" kernel series when it was totally undocumented and even less thoroughly tested than RVM. That's just bad software engineering, regardless of the quality of Rik's or Andrea's work.

Ideally, an "old-fashioned" alternative to RVM would have been maintained throughout the 2.3 process, as a fallback in case RVM turned out not to be ready for 2.4 - which was in fact the case. But this wasn't done, there was no alternative, and so RVM became the basis for 2.4. Once that decision was made it should not have been unmade by replacing RVM with AVM. Andrea's work should have been in the 2.5 tree, which should have been opened a long time ago to deal with precisely this sort of situation. 2.4 is not the last Linux kernel that will ever exist. We don't need to make it perfect. It would be far better to admit its imperfections, band-aid them as best we can, and try to get a head start on creating something better for 2.6. What we have instead is error on top of error, "not ready" replaced with "even less ready".

To clarify, I have nothing but the highest regard for both Rik's and Andrea's work. Obviously they have different ideas and attitudes. Rik has drawn on many sources in his design, resulting in a system that is both very advanced and very complicated. The process of reining in the complexity is still incomplete, but I still have hope that some day Rik will be able to come up with something that's really awesome, and he has always documented his ideas thoroughly. Andrea, by contrast, is much more pragmatic; he wants something that works now even if it's somewhat more limited in scope (e.g. by being almost impossible to reconcile with NUMA). The dark side of that "pragmatism" is that Andrea has skimped on non-code activities such as documenting or explaining the basic ideas on which his system is based. Nonetheless, both have done great work and should continue to do great work...in the 2.5 tree.

--
Slashdot - News for Herds. Stuff that Splatters.

Re:Compound errors by Milican · 2001-10-30 04:33 · Score: 3, Funny

Where is the moderation fairy? Why does she only give me dust when there are lame stories to moderate. Please sprinkle thy dust into my hands so that I may bless this post.

JOhn

--
Campaign for Liberty
Re:Compound errors by puetzk · 2001-10-30 06:36 · Score: 4, Informative

FWIW, I think that Andrea's setup is modeled after the 2.2 VM (which he did a fair amount of work on tuning). So this is really more of a pragmatic revive-the-old-approach than it might initially seem.

We all know this simplistic setup had scalability problems (like much of 2.2) but at least it worked right. Hopefully given some more time, Rik can really get his to go, since it seems more sophisticated/scalable long-term.

--
The Matrix is going down for reboot now! Stopping reality: OK. The system is halted.
Re:Compound errors by Salamander · 2001-10-30 06:51 · Score: 2

FWIW, I think that Andrea's setup is modeled after the 2.2 VM

Yes, mostly. Except for the classzone stuff.

We all know this simplistic setup had scalability problems (like much of 2.2) but at least it worked right. Hopefully given some more time, Rik can really get his to go, since it seems more sophisticated/scalable long-term.

Absolutely. I'd love to see Rik and Andrea (and others?) competing on purely technical merit, in the 2.5 tree. I think that would be great for everyone.

--
Slashdot - News for Herds. Stuff that Splatters.
Re:Compound errors by slashdot_commentator · 2001-10-30 09:12 · Score: 2, Insightful

Hey, Linus f***ed up with accepting the original RVM for 2.4. And now he was between a rock and a hard place. "Marketing" considerations meant that 2.4 needs to be "non-experimental"; that the client base could go to the 2.4 version, and use it with little concern that their server would crash. That would not have been the case with RVM versions up to the last month.

So what was Linus to do? Keep dragging out 2.4 until RVM could fulfill minimum "marketing" requirements of stability? How long is that going to take? Do you want to wait and let M$ marketers talk about how amateurish Linux was; that professionals did not use Linux's current "stable" release, but a version that hasn't improved in 3 years?

So Linus decides to commit A REALLY BAD PRACTICE, and changes to a less tested VM over the initial 2.4 VM. It's another f**kup if AVM is buggier than RVM. (But Linus had reason to believe it wouldn't be so, with relatively limited testing.)

Is it still an f***kup even if AVM turns out to be more stable than RVM? If so, are you saying it's preferable for new Linux development to be shut down for another 6 months to a year? And Zdnet to opine on how the "stable" 2.4 kernel is DEMONSTRABLY unreliable? I'll take a "manager" that makes mistakes and makes decisions based on product survival over a manager that religiously follows an engineering practices manual.

--
There is no America. There is no democracy. There is only IBM and AT&T and DuPont, Dow, General Electric, and Exxon
Re:Compound errors by Salamander · 2001-10-30 09:48 · Score: 3, Insightful

are you saying it's preferable for new Linux development to be shut down for another 6 months to a year?

No, there have been quite enough delays associated with 2.3/2.4 already. More than enough. And there will continue to be delays until the processes get ironed out.

What would have been preferable, IMO, would have been if more resources had been devoted to fixing and tuning the VM we already had (RVM, for good or ill). Linus could have put his foot down. He could have said "There will be no 2.4 VM except for RVM. The price for admission to the next round of VM redesign is that you help us fix RVM." People - notably Andrea - would have listened, and contributed more constructively. They know that Linus's good will is like currency. But Linus didn't say that. Alan Cox pretty much has, and kudos to him for having the courage to do so. What Linus did was take a bad situation and act in a way that nine times out of ten would make it worse. Maybe he'll get away with it this time because AVM in its current state is more robust than RVM in its current state, but that would actually be a bad thing because it will only reinforce the bad decision-making and we'll get burned next time instead of this time.

And Zdnet to opine on how the "stable" 2.4 kernel is DEMONSTRABLY unreliable?

First off, are good reviews from places like ZDnet the goal for Linux development? Second, do you think it's better for the stable 2.4 kernel to be subtly, unpredictably unreliable? Better the devil you know, and all that.

Most importantly, what if Linus's gamble - and that's what it was - hadn't succeeded? What would the ZDnet reviews be like then? What kind of ammo would that provide for everyone who wanted to claim that open-source development processes weren't all they're cracked up to be? Yeah, it looks (so far, knock wood) like we've been lucky this time, but I don't think relying on luck is a good thing.

I'll take a "manager" that makes mistakes and makes decisions based on product survival over a manager that religiously follows an engineering practices manual.

The two aren't as diametrically opposed as you make them out to be. Good engineering practices are good because they help increase either the speed or the reliability with which product can be delivered. Slavish adherence to any dogma is a bad thing, but so is the belief that everything you're doing is OK just because you managed to win one game of chicken. My point is that this scenario is going to be repeated. I'd rather encourage responsible driving than watch what happens when Linus plays one game of chicken too many and brings everyone else along for the ride.

--
Slashdot - News for Herds. Stuff that Splatters.
Re:Compound errors by Salamander · 2001-10-30 14:05 · Score: 2

There is no development kernel branch to test things out on right now

And whose fault is that? Who has unilaterally refused to open a 2.5 branch even though there clearly should have been one months ago? The lack of a 2.5 branch is a major piece of what's wrong, and it can be laid at exactly one person's clay feet.

--
Slashdot - News for Herds. Stuff that Splatters.
Re:Compound errors by Salamander · 2001-10-30 20:59 · Score: 2

Why do you think that Andrea did not make an honest effort to help fix RVM bugs?

Obviously the only authoritative answer would have to come from Andrea, but my guess can mostly be expressed in one word: ego. He thought his fundamental ideas were better than Rik's.

I think (and I can be dead wrong here) Andrea (as did at least Linus) took a good look at Rik's RVM, and saw 2.5 being started a year from now because of time needed to fix a "complex" RVM.

No matter how long it might have taken to fix RVM in 2.4, that's no excuse for 2.5 not being open. All of the flip-flopping should have taken place in 2.5, and if AVM turned out to be the "winner" in that environment it could have been back-ported to 2.4 if necessary.

Except if more (talented) developers can understand the unknown (AVM) devil, but not the better known devil.

That gets a little harder when AVM - and in particular its features that diverge from 2.2 - is almost totally undocumented. Many of the kernel developers have complained bitterly about this fact.

Was Linus better off standing pat?

Yes. The best and average cases were comparable, and the worst case for standing pat was better as it didn't involve calling the whole open-source development model into question. Perhaps even more importantly, taking a stand on principle would have set a good precedent and example for the future.

--
Slashdot - News for Herds. Stuff that Splatters.
Re:Compound errors by Salamander · 2001-10-31 03:30 · Score: 2
What evidence can you present that supports your conclusion that Andrea did not expend effort in debugging Rik's VM component?

I never claimed that, nor is it relevant, so I don't need such evidence. Andrea did in fact spend considerable time helping debug RVM problems, but obviously at some point gave up and decided to undermine that ongoing effort instead.

don't you think it would have been more difficult to determine the more stable VM with unstable code being tested at the same time?

Odd-second-number kernel series aren't so unstable that it would be impossible to distinguish VM-related problems from other problems.

I know you think it's better to lose "by the book", but real world managers would grasp at the straw

"The book" is written the way it is because - in general - the odds of failure are lower when you do things by the book. Real managers who have a clue know that. "Grasping at straws" is bad practice. The folks at SEI will tell you that the primary hallmark of a mature software-development model is that it produces repeatable results (and, at the higher levels, information that can be used to improve the process further). Sometimes you have to pick the best of several bad alternatives, but that should still be done according to proven risk-management principles instead of "grasping at straws".

I don't believe this decision "calls the whole open-source development model into question".

Linux development is often held up as a model of how well open-source development can work. If AVM had proven to be less stable than RVM - and there was, at the time, practically no reason to believe otherwise - and the resulting instability in 2.4 were also associated with an obvious case of bad process management, that most certainly would reflect on open-source development models. I can see it now. "Open source development is fragile, it's vulnerable to the whims of untrained project leaders who make decisions based on politics rather than technical merit..." and so on. You can't seriously believe such polemics would not be written in such a situation, can you?

"Stands on principle" and "good precedents" sound rather ideological and they don't benefit you if you're "dead".

See previous about why the book is written the way it is; the whole point of the book is to reduce the chances of ending up "dead". Try to put yourself in Linus's shoes, at the time the decision was made. You have two choices:
1. You can keep going with a well documented and well tested VM system that has been making steady progress, even if some problems - including some fairly serious ones - still exist and some of its principles are well understood only by a few people.
2. You can accept a patch that implements a whole new VM system that's totally undocumented, that *nobody* except the author understands, and that's very lightly tested so you have no real idea whether it's stable.
Can you seriously suggest that #2 was the responsible choice, most likely to lead to success? I don't think so. Linus made the decision based almost entirely on his personal feelings about the people involved, a lapse of technical objectivity which any "real manager" knows often leads to disaster. Whether disaster actually occurs in the here-and-now is irrelevant; such behavior should be discouraged in the strongest terms by anyone who cares about the future of Linux.
--
Slashdot - News for Herds. Stuff that Splatters.

Re:ext3 in 2.4.x by Svenne · 2001-10-30 04:06 · Score: 2, Informative

You're looking for this.
I've been running ext3 since kernel 2.4.11 with these patches.

--

Slagborr

author's new benchmark by brer_rabbit · 2001-10-30 04:14 · Score: 2, Insightful

Upon returning home the other week after meeting with Andrea, I went to my lab and searched for the disk images of the server comparison I ran back in January of this year (of FreeBSD 4.1.1 versus Linux 2.4.0). I took the Compaq ML500 server I have been reviewing (2x 1-GHz CPUs, 2-GB RAM) and upgraded both the FreeBSD disk image to 4.4-Stable and the Linux version to 2.4.12.

Good, this would be an interesting benchmark.

Then, I changed the memory down to 192-MB RAM so as to stress the VM system more.

ok, this is fair, but you should also run with the same memory configuration you originally ran.

I also upgraded to the latest stable versions of Sendmail (8.12.1) and MySQL (version 3.23.42). Finally, I compiled everything with the latest version of gcc, 3.0.2, and tuned the two instances to the best of my knowledge (softupdates and increased maxusers for FreeBSD, and untouched default values for Linux).

NO!!!! why would you do this? Don't you want to know how the earlier linux/FreeBSD kernel compares to the later ones? Now instead of modifying one variable you've modified 3,846 variables. It's going to be impossible to see if any improvements in FreeBSD/Linux are due to an updated kernel, compiler, mysql, etc etc. Go back to your original setup and only change the kernel, since I believe that's what you want to benchmark.

Bring yourself up-to-date by marm · 2001-10-30 04:39 · Score: 5, Interesting

The machine freezes EVERY time because of memory shortages. The kernel can't allocate pages for incoming network traffic, causing a backlog, causing processes to hang, causing further backlog.. then powie an unresponsive machine.

This was a common problem with kernels from about 2.4.1 up to 2.4.9 - the machine would gradually eat into swap further and further, failing to release no-longer-used swapspace, until it would go Out Of Memory (OOM) and attempt to kill the process that was eating all the memory. Frequently it would pick the wrong process to kill (sometimes even killing init) or would end up deadlocking.

I agree with you - that is no way for a virtual memory system to behave.

However, the Linux development process moves quickly once people get annoyed enough to actually do something about it, and that's precisely what has happened. Starting with 2.4.10, a new, simpler VM system has been used in the official Linus kernels, and I can say with some confidence that it has solved all the major problems with the 2.4 VM system, and continues to get significantly faster with every release.

If you haven't actually tried a new kernel yet (and from your problems it seems that you haven't), I suggest that you do - it's made the world of difference for me.

At the same time, the old 2.4 VM has lived on in the -ac series of kernels, and has become a great deal better there - some competition has made a big difference. Almost all of the major areas where it behaved badly have been fixed. However, my own impression is that it is still somewhat slower than the new VM.

The choice is yours which you want to run - my own recommendation would be for the new VM in the official Linus kernels, but others may disagree.

[OOM Killer]
NEWS FLASH they took this feature out because it was buggy.

Umm, no they didn't - it continues to exist in both the new VM in 2.4.13 and the old VM in the most recent 2.4.13-ac kernels. It does, however, now work correctly in both VMs. There are some philosophical arguments over whether killing processes is the best way of handling an Out Of Memory situation, but it is surely better than deadlocking the box, which is what most VM systems (including the famed FreeBSD's) do when OOM occurs.

It's been getting better with each dot release but it's still nothing you'd want to bet money on.

All I can say is that the new VM works great for me and lots of other people, even under extreme load. I can certainly understand your pain if you're using an older 2.4 kernel, but please try a recent one - the difference is astounding.

If you're still having problems with recent kernels, then I'm sure linux-kernel@vger.kernel.org would love to hear from you - and would certainly be a lot more useful to you than ranting on Slashdot. Getting the VM right is now priority number 1 for the kernel hackers.

Re:Bring yourself up-to-date by Laplace · 2001-10-30 04:59 · Score: 3, Interesting

I will preface this by saying that I am not a kernel developer.
Wouldn't it be possible to label some processes as OOM immune? For example, init could have this flag and would never be killed by the OOM algorithm. Similarly, users could designate some processes more important than others. For example, my PDE solver which is crunching away at data for my thesis could be immune, but X could die if I ran out of memory.
This whole situation has had an impact on my work. With all of the debate and argument flying around, I'm not sure which kernel to use, if I should upgrade, or if I should revert back to the 2.2 series. Oh well.
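
A minimal sketch of what such a flag could look like in a victim-selection routine. Everything here is hypothetical: the Proc record, the oom_immune field and the "kill the biggest non-immune process" rule are assumptions for illustration, not the heuristic the 2.4 OOM killer actually uses.

```python
from dataclasses import dataclass


@dataclass
class Proc:
    """Hypothetical process record with an OOM-immunity flag."""
    pid: int
    name: str
    rss_pages: int           # memory currently held by the process
    oom_immune: bool = False


def pick_victim(procs):
    """Pick the largest process that is not marked immune, or None if all are."""
    candidates = [p for p in procs if not p.oom_immune]
    if not candidates:
        return None
    return max(candidates, key=lambda p: p.rss_pages)


procs = [
    Proc(1, "init", 50, oom_immune=True),
    Proc(200, "X", 9000),
    Proc(300, "pde_solver", 40000, oom_immune=True),
    Proc(400, "xmms", 3000),
]

victim = pick_victim(procs)
print(victim.name)
```

With these made-up numbers the solver is by far the biggest process, but the flag steers the choice to X instead. The sketch also shows the obvious catch: if every process is marked immune, there is nothing left to kill.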

--
The middle mind speaks!
Re:Bring yourself up-to-date by Xzzy · 2001-10-30 05:34 · Score: 2

> If you haven't actually tried a new kernel yet
> (and from your problems it seems that you
> haven't), I suggest that you do - it's made the
> world of difference for me.

We've been trying new kernels as they arrive and we get the time; currently up to 2.4.10 on a brand spanking new build machine and it croaked the other day with the same symptoms. Testing new stuff takes time around here.. lots to do and sometimes trying the latest and greatest takes second priority.

> Umm, no they didn't - it continues to exist in
> both the new VM in 2.4.13

Stand corrected, like I said I'm not a guru on the subject I just read a lot of text trying to figure out what the issues were. ;)

> If you're still having problems with recent
> kernels, then I'm sure linux-kernel@vger.kernel.org
> would love to hear from you

If I had any new information to provide, I would. As it is, "the machine freezes" isn't too helpful and I'm the sort that keeps his mouth shut when other people are already saying pretty much the same thing. ;)

My main point in posting was to try and let people know that the current VM definitely "doesn't work", at least not as well as the original poster appeared to be claiming. :)
Re:Bring yourself up-to-date by pi_rules · 2001-10-30 06:35 · Score: 2

Not a kernel developer here... but I've got a -tiny- bit of education on the stuff.

Thing is, what kind of mechanism would you use to label stuff OOM immune? You'd certainly have to modify the PCB structure for each process, unless there's some unused bits in there already. If there -is- an unused bit you really wouldn't have any hit at process switch time because you were already copying that unused bit. If you -do- have to add one there's 32 more bits that need to be copied. I dunno how much of a hit this would cause really but I'd imagine keeping process switch time down to an absolute minimum is essential.

Now, you'd need another user-land tool to set these flags which would presumably be run as root only, and make a decision as to whether a process starts as OOM immune or not.

Then again, maybe OOM immune status could be determined simply by the 'nice' value of a process.
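
A rough sketch of how an immunity flag and the 'nice' value could both feed into victim selection. This is not how the 2.4 OOM killer actually works; OOM_IMMUNE, Proc, badness and pick_victim are hypothetical names, and the weighting is invented purely for illustration.

```python
from dataclasses import dataclass

OOM_IMMUNE = 0x1  # hypothetical per-process flag bit, not a real kernel flag


@dataclass
class Proc:
    pid: int
    name: str
    rss_mb: int     # resident memory in MB
    nice: int = 0   # -20 (important) .. 19 (unimportant)
    flags: int = 0  # bitmask; only OOM_IMMUNE is used here


def badness(p: Proc) -> float:
    """Higher score means a better candidate to kill (illustrative weighting)."""
    if p.flags & OOM_IMMUNE:
        return 0.0
    # Scale memory use by niceness: nice 19 doubles the score,
    # nice -20 roughly halves it.
    return p.rss_mb * 2 ** (p.nice / 19)


def pick_victim(procs):
    """Return the non-immune process with the highest score, or None."""
    candidates = [p for p in procs if badness(p) > 0]
    return max(candidates, key=badness) if candidates else None


procs = [
    Proc(1, "init", 2, flags=OOM_IMMUNE),
    Proc(100, "pde_solver", 900, flags=OOM_IMMUNE),
    Proc(200, "X", 120),
    Proc(300, "netscape", 80, nice=10),
]
print(pick_victim(procs).name)  # X
```

With the flag stored in an existing bitmask there is no extra word to copy; whether the real process structure has a spare bit is a separate question this sketch does not answer.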

I'd imagine the code to do this would be fairly trivial, but the impact would be rather large. Man, thinking about stuff like this really makes me want to get down and dirty and start doing some -real- programming again.
Re:Bring yourself up-to-date by Salamander · 2001-10-30 20:20 · Score: 2

There are some philosophical arguments over whether killing processes is the best way of handling an Out Of Memory situation, but it is surely better than deadlocking the box, which is what most VM systems (including the famed FreeBSD's) do when OOM occurs.

Untrue. Simply untrue. Most VM systems will slow down, some gracefully and some not, but they will not deadlock.

It's also more than a philosophical issue, BTW; it's a very real-life practical issue. The whole idea of killing a process - any process - completely and irretrievably because the system is low on memory is simply brain-dead. Better solutions might involve working-set limits, emergency reserves (similar to what's already done for disk space in some filesystems), disallowing memory overcommit, long-term suspension of processes so their pages can be stolen, etc. - the possibilities are nearly endless. The OOM-killer approach always carries the possibility of leaving the system truly deadlocked as one member of a temporal-dependency chain - perhaps spanning multiple machines - gets shot in the head. You'll have plenty of memory, but it won't be doing you any good as you're deadlocked nonetheless. There is no problem that the OOM killer solves that is not solved better by a different approach.
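
Two of the alternatives listed above, disallowing overcommit and keeping an emergency reserve, can be sketched as simple bookkeeping. This is a toy model under stated assumptions, not any kernel's actual accounting; CommitAccount and its methods are hypothetical names.

```python
class CommitAccount:
    """Toy strict-commit accounting: promises never exceed RAM + swap.

    Ordinary requests must also leave reserve_mb untouched, so a
    privileged caller still has room to recover the system.
    """

    def __init__(self, ram_mb, swap_mb, reserve_mb):
        self.limit_mb = ram_mb + swap_mb
        self.reserve_mb = reserve_mb
        self.committed_mb = 0

    def reserve(self, mb, privileged=False):
        """Record the commitment and return True, or refuse it up front."""
        ceiling = self.limit_mb if privileged else self.limit_mb - self.reserve_mb
        if self.committed_mb + mb > ceiling:
            return False  # the caller sees an allocation failure; nothing is killed
        self.committed_mb += mb
        return True

    def release(self, mb):
        """Give back a previous commitment."""
        self.committed_mb = max(0, self.committed_mb - mb)


acct = CommitAccount(ram_mb=256, swap_mb=256, reserve_mb=32)
print(acct.reserve(400))                   # True
print(acct.reserve(100))                   # False: would eat into the reserve
print(acct.reserve(100, privileged=True))  # True: the reserve is for this case
```

The trade-off is that refusals happen at allocation time, where the application can handle them, instead of at page-fault time, where it cannot.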

--
Slashdot - News for Herds. Stuff that Splatters.

How can you get around the VM problems in Linux? by cbwsdot · 2001-10-30 04:51 · Score: 2, Interesting

Good article, but what can I do, the end user, to get around some memory management and scheduling issues? I don't know nearly enough to do any kernel tweaking and tailoring for my processor architecture and hardware configuration. Is it unwise to run a desktop system without swap space? I was thinking of giving the performance of my system a boost by leveraging the low RAM prices and eliminating the swap partition. Other /.'s and I have experienced swap space usage during times when some physical RAM was free. Is the kernel thinking "Well, these pages are *so* old and inactive enough that I'll just stick them in swap, regardless of the current system state"? It sounds like paragraphs 16 and 17 attempt to explain this but it flies over my head. Since you can buy 1GB of ECC PC133 RAM for about $150, I was thinking of buying 3GB of RAM, investing in some type of power backup and running my entire system in RAM. Can you do this?

Despite this major issue, a Linux based system is still more stable and in most cases faster than Windows 2000. Also, like the article mentions, take into account that Linux runs on many different types of processors. Linux on SPARC is good and 21264 Alpha performance is mind-blowing. Keep up the good work.

Re:OSS Power by Laplace · 2001-10-30 05:03 · Score: 2

~500MB data sets are easy. I'm working on a real time project. Data streams in, I have to do some very specific transformations on it, analyze the transforms, and spit out the results. The amount of data is huge, and it comes in fast. Think real time scientific computing, where every point of data could contain valuable information (well, my software does try to throw away the unimportant stuff, so that part isn't so true).

--
The middle mind speaks!

Re:Make it a build option by Rik+van+Riel · 2001-10-30 05:14 · Score: 5, Informative

I wonder what Rik has to say about the new "blessed" VM? If he thinks it's a better all-around VM, then the debate can stop pretty quickly I would think.

Well, since you wanted to know ;)

First let me explain that most of the time in the beginning of 2.4 was spent making the VM stable, stopping it from crashing on highmem machines, etc... Speed improvements were a secondary thing, to do later on. Secondly, Linus is a very busy man and didn't seem to have the time even to apply critical bugfixes at times, so his kernel has had a big disadvantage over Alan's kernel.

Around the time the VM in Alan's kernel got stable, when I was finally getting the time to work on speed improvements and Linus still lagged a few patches, Andrea suddenly surprised us all by posting the first version of his new VM online. An even bigger surprise was that Linus integrated this into the kernel within 24 hours, without even asking Andrea!

As to why Andrea's VM is faster for desktop use ... it was optimised for speed on low to medium loads in exactly the same way the 2.2 kernel was. Note that this also means the server falls over quicker under high load and it is basically impossible to tune the system to run decently under all loads ... just like 2.2.

My VM was slower for desktop loads, but since the thing stabilised I put in some time to make things faster and I seem to have mostly caught up with Andrea on the speed front now. The benchmark results posted on the linux-kernel mailing list seem to indicate that Andrea's VM is faster for some things, while my VM is faster for some other things.

Personally, I think it is easier to make a solid VM fast than it would be to make a fast VM solid. This opinion was formed because of the living hell of the Linux 2.2 VM, which was undocumented and horribly subtle.

In the future, I know I'll always be optimising for (1) maintainability, (2) correctness/stability and (3) performance, in that order...

Microsoft by fcd · 2001-10-30 05:33 · Score: 2, Insightful

It's certainly good to have competition to bring out the best in each system, but it would be horribly inefficient to keep it going in the long run.

Isn't that basically Microsoft's argument as to why it's ok for them to be a monopoly? That competition is not efficient in the software industry?

Why VM is bad by Animats · 2001-10-30 05:36 · Score: 4, Interesting

Virtual memory is way overrated, and probably should be phased out, both on servers and desktops.

In Peter Denning's classic paper, The Working Set Model of Program Behavior, Denning concluded that paged virtual memory was, at best, good for an effective 2X increase in memory size. When he wrote that paper in 1968, memory cost about a million dollars a megabyte, so a 2X increase was worth the headaches of a VM system. Today, with memory at a few hundred dollars a gigabyte, it looks less attractive. It's not that expensive to double the size of RAM today. It can be cheaper than adding a fast disk drive just for paging. Uses less power, too.

Disk as backing store gets worse as RAM gets faster. When Denning wrote that paper, the fastest backing devices (drums) rotated at around 10,000 RPM, for a 6,000 microsecond access time, and core memory cycle times were around 4us. So main memory was 1,500 times faster than backing store. Today, RAM cycle times have dropped to around 0.020us, but disks still top out around 10,000 RPM, making main memory 300,000 times faster than backing store. Thus, the relative cost of a page fault has increased by a factor of 200. This makes VM far less attractive today than it used to be. It's not getting any better, either.
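
The arithmetic in the paragraph above can be checked directly. This sketch uses only the figures quoted there (10,000 RPM, 4us core, 0.020us RAM) and works in integer nanoseconds to avoid rounding; the variable names are just for illustration.

```python
NS_PER_MINUTE = 60 * 10**9


def rotation_time_ns(rpm):
    """Time for one full revolution, in nanoseconds."""
    return NS_PER_MINUTE // rpm


backing_ns = rotation_time_ns(10_000)  # 6,000,000 ns = 6,000 us
core_1968_ns = 4_000                   # 4 us core memory cycle
ram_2001_ns = 20                       # 0.020 us RAM cycle

ratio_1968 = backing_ns // core_1968_ns
ratio_2001 = backing_ns // ram_2001_ns
growth = ratio_2001 // ratio_1968

print(ratio_1968, ratio_2001, growth)  # 1500 300000 200
```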

The price of having virtual memory is terrible performance once paging between active processes starts. That's called "thrashing". On a server which is processing short transactions, you're much better off throttling at the transaction launch point (as, for example, where CGI programs launch) than going into thrashing. This requires some coordination between applications and memory allocation, but where most of the memory is used by Apache and its child processes, that's a viable option.
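
A minimal sketch of throttling at the launch point, assuming each transaction can supply a rough memory estimate before it starts. MemoryGate, try_admit and finish are hypothetical names; this is not an Apache feature.

```python
import threading


class MemoryGate:
    """Admit a transaction only if its estimated memory fits in the budget."""

    def __init__(self, budget_mb):
        self._budget_mb = budget_mb
        self._in_use_mb = 0
        self._lock = threading.Lock()

    def try_admit(self, estimate_mb):
        """Return True and charge the budget, or False to reject or queue."""
        with self._lock:
            if self._in_use_mb + estimate_mb > self._budget_mb:
                return False
            self._in_use_mb += estimate_mb
            return True

    def finish(self, estimate_mb):
        """Return a finished transaction's share to the budget."""
        with self._lock:
            self._in_use_mb -= estimate_mb


gate = MemoryGate(budget_mb=100)
print(gate.try_admit(60))  # True
print(gate.try_admit(60))  # False: would exceed the budget, so do not launch
gate.finish(60)
print(gate.try_admit(60))  # True
```

The rejected request costs a cheap check at the front door instead of a page-fault storm once everything is already running.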

The main value of VM today is getting rid of dead code at run-time. A basic problem with shared libraries is that you load in the whole library, needed or not, when you need any function from it. This wastes memory, but after a while, the VM system will notice the unused pages and quietly release them. On a larger scale, the same problem is seen with dormant applications, a problem which has gotten totally out of hand in the Windows world, where far too much unwanted stuff launches at startup. VM ejects them from memory. That's what VM is really used for today.

So if you're actually page-faulting, VM is hurting, not helping.

I'd argue that it's time to go back to a swapping model - all of an app has to be in before it runs. That's where UNIX started; virtual memory didn't come in until 4.1BSD. But in support of this, apps need more information about the current memory situation. And they should be able to designate parts of their space as pageable, at least at the shared object/DLL level. Only a few apps (web servers, window managers) need much memory awareness, so that's feasible. Throttling needs to occur at a smart place, just before allocating substantial resources, such as CGI process launch or connection opening. By the time the VM system becomes involved, it's too late; resources are already overcommitted.

The big win from this is repeatable latency at the memory level. With all the interest in reducing kernel latency at the CPU level, it's time to address it at the memory level too.

QNX, the real-time OS, is worth looking at in this regard.

Re:Why VM is bad by jovlinger · 2001-10-30 07:02 · Score: 2

but an unused shared library page will be dropped, not swapped (as it will be clean). In fact, it will likely never be read into memory at all, just mapped.

So it seems that you use the two concepts of swapping and mapping as if they were one, when they are not. It is perfectly feasible to have a system with VM but no swap. This system will of course die at unpredictable times because of [temporary] overcommit that you could have survived with a swap file.

I can't see how you could have a system with swap but no VM, tho. Arguably, this is what overlays and banks do for you.
Re:Why VM is bad by DaveWood · 2001-10-30 07:19 · Score: 4, Informative

Alright, I'll bite. What you say is interesting, and I believe your comments regarding the changing relative costs of traditional VM paging algorithms make sense. The problem is that I suppose I don't understand the alternative you are proposing. I am certain this is due to my own ignorance; please give tolerance to my questions, and don't let my inquisitiveness be mistaken for criticism.

You say, "The price of having virtual memory is terrible performance once paging between active processes starts." Assuming the VM algorithm is working correctly (big assumption lately), this means basically that you are trying to run more than your memory can handle, and have reached a load-shearing point with respect to RAM. From this I surmise that we might be talking about a "smarter" VM system that would shear better, perhaps by identifying the condition, and perhaps by better communication with higher levels - in other words, a different/better application-level interface to the VM system.

And, indeed you say, "On a server which is processing short transactions, you're much better off throttling at the transaction launch point [than thrashing]... This requires some coordination between applications and memory allocation." So I think I understand so far.

Then you say: "A basic problem with shared libraries is that you load in the whole library, needed or not, when you need any function from it." This is where I perhaps display my ignorance of the kernel, but that's not what I have understood was going on. My impression of things was that an application was loaded into memory by mapping its data on the disk into "virtual" memory, and that the VM subsystem arbitrated between real and virtual memory by retrieving from the disk only what blocks were "necessary" (i.e. being referenced by the executing code), and that this process naturally extended to libraries, and especially shared libraries (which need only exist in "real" memory in one location, despite being mapped into multiple "virtual" memory environments). Then again, perhaps it is a minor point - if the whole SO image is loaded and then unused pieces are unloaded or vice versa, it seems less important than the contention problem already on my mind...

You say "VM ejects [unused bits of libraries and applications] from memory. That's what VM is really used for today." Absolutely! But regardless of the relative differences, isn't this process of migrating data between different "tiers" of data storage in the computer (each with a different latency, throughput, and cost/availability) always going to be necessary? While I can certainly see a major advantage in creating/improving ways for the application to communicate with the memory management system, is there really some fundamental alternative to the block-based VM "guesswork" that takes place in absence of directives set at compile time?

You say: "So if you're actually page-faulting, VM is hurting, not helping." I am wondering if the VM is either hurting or helping per se, since the real problem is that you don't have enough RAM even for the "active" blocks you want to run. Of course, the quality of your VM will determine how close you can get to "perfect" utilization of your RAM.

Then you say, "I'd argue that it's time to go back to a swapping model - all of an app has to be in before it runs." This is where you lose me, I suspect because I do not understand what you are really proposing. You go on to say "in support of this, apps need more information about the current memory situation. And they should be able to designate parts of their space as pageable, at least at the shared object/DLL level. Only a few apps (web servers, window managers) need much memory awareness, so that's feasible. Throttling needs to occur at a smart place, just before allocating substantial resources, such as CGI process launch or connection opening. By the time the VM system becomes involved, it's too late; resources are already overcommitted."

At first it sounds as though you are saying that you want to eliminate swap altogether. I do not doubt that for some situations this is preferable - you want to have consistent performance and a sharp failure rather than the long thrash in the case where you use up your resources (and you mention QNX). However for general-purpose computing, I'm not so sure this is a good idea, even with RAM as cheap as it is. Depending on what you're trying to do, the slight loss in predictability and overall performance is vastly preferable to sharp failures for many, I would even say, "most" applications, even on the server.

But moving on, it seems you are saying that what you dislike about the VM is that data is broken into arbitrary blocks - and so we should rely on application programmers to designate what it would be a good idea to swap out in case of memory contention ("designat[ing] parts of their space as pageable"). The problem I see with this is that you are relying on the programmer to do something that, if they do not do it, their program will appear to run anyway.

This is therefore automatically classified a frivolous expense by commercial software developers, and even OS people working for the love of the game may be tempted into the same pitfall. This is superficially similar to the argument between malloc/free proponents and garbage collector advocates. Giving the programmer another "lower-level" thing to worry about gives them an opportunity to optimize it, but in practice we often find that on the balance we get more mistakes and the quality of the user experience suffers.

The compiler probably could be coaxed to do it for you. But the various tradeoffs between compile time "pre-blocking" and runtime blocking might leave compile-time computations, whether in the compiler or even in the developer's head, looking inferior to what a good VM system can do while observing actual behavior in real-time.

Your point about throttling occurring "at a smart place" is not lost - obviously many applications could benefit from more transparency by the memory management system in managing their affairs - apache users really don't want to have to guess how many processes/concurrent users should be allowed, they want apache to determine it for them based on what the system can handle. But most application programmers are not going to do this extra work or do it right, and a VM seems like what you need as a "default behavior," even if its benefits (and its audience - those who have enough RAM that they never need fear swap) are lessening over time.

--
We're on the road to Tycho.
Re:Why VM is bad by RelliK · 2001-10-30 07:46 · Score: 5, Insightful

huh? what? This is the most uninformed garbage I have ever read. I don't have time to refute all of the nonsense, so I'll just take on the biggies.
The price of having virtual memory is terrible performance once paging between active processes starts.
When that happens, you are running a lot more processes than can fit into memory. Without VM you would not be able to do that at all.
A basic problem with shared libraries is that you load in the whole library, needed or not, when you need any function from it.
False. Any decent VM does demand paging. Only the pages that are needed are loaded from the executable. The parts of the program that are never executed are never loaded from disk, notwithstanding read-ahead optimization. A shared library is just an extension of the executable so the same rules apply. Further, a shared library can be used by multiple processes and only *one* copy of it is loaded into memory.
I'd argue that it's time to go back to a swapping model - all of an app has to be in before it runs.
That would be absolutely stupid. It would slow down the system tremendously. See above about demand paging.
Without VM, you would need to increase the memory requirements by a factor of N, where N is the number of processes running concurrently. Further, the startup time of each process would always be slower since all of the code would have to be read in memory. With VM part of it is already there (shared libraries), and the code is loaded on demand.
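
A toy simulation of the demand-paging and sharing behaviour described above. It is a sketch, not a model of any real kernel: MappedFile is a hypothetical class, the 4096-byte page size is an assumption, and read-ahead is ignored.

```python
PAGE_SIZE = 4096  # bytes; a common page size, assumed here


class MappedFile:
    """A file image (executable or shared library) shared by all its mappers."""

    def __init__(self, name, size_bytes):
        self.name = name
        self.num_pages = -(-size_bytes // PAGE_SIZE)  # ceiling division
        self.resident = set()  # page numbers currently in physical memory
        self.disk_reads = 0

    def touch(self, offset):
        """Simulate an access; read the page from disk only on first use."""
        page = offset // PAGE_SIZE
        if page not in self.resident:
            self.disk_reads += 1  # page fault: one page read, not the whole file
            self.resident.add(page)
        return page


libc = MappedFile("libc.so", 1_200_000)
for process in range(3):  # three processes map the same library
    for offset in (0, 100, 8192, 500_000):
        libc.touch(offset)

print(libc.num_pages, len(libc.resident), libc.disk_reads)  # 293 3 3
```

Three processes touching four offsets each end up with three resident pages out of 293, read from disk once in total.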
In short, this is the biggest pile of uninformed garbage. You *really* need to take an OS course before you can talk about OS design.

--
___
If you think big enough, you'll never have to do it.
Re:Why VM is bad by mickwd · 2001-10-30 07:59 · Score: 2

I don't agree with your point about removing virtual memory from the operating system, but I am surprised that more people using Linux in a server role just don't buy enough RAM to ensure that the minimum amount of swap is actually required.
Desktop use is a different matter - many people who use Linux on the desktop don't have a lot of money to spend on extra hardware (including RAM), and Linux should be able to run decently on as wide a range of hardware as possible (including cheap, old machines).
But for server use, if you can afford the price differential of SCSI over IDE (which you should, for any server supporting any serious load) you should also be able to afford enough RAM to deal with most loading situations.
Re:Why VM is bad by Johnno74 · 2001-10-30 08:28 · Score: 2, Insightful

No, sorry but I disagree.

A clever VM system is more important than ever, when you combine it with an effective disk cache.

Yes, RAM is cheaper and faster than ever, but stopping your OS from using a swap file/partition is gonna stop your OS from efficiently using your RAM.

Your machine should be allocating as much memory as possible to a disk cache, even if this reduces the available memory to the extent that active processes start paging, because the swap file paging is optimised by the disk cache too.

It is usually better to swap out pages and keep your cache large than to keep rarely used pages in memory at the expense of cache, because even if you need those pages they will quite likely be in the cache.

Even if you have a shedload of memory (especially so!) you will get better use of your memory if you use some of it as a disk cache than if you don't page rarely-used pages to disk.
Re:Why VM is bad by puetzk · 2001-10-30 11:12 · Score: 2

This was a rule of thumb, because under most ordinary loads it was all the swap the system could ever make any reasonable use of. More RAM (for the same workload) means less swap needed - but if you're going for maximum peak workload you want as much VM space as possible. However, anything much beyond 2xRAM makes no sense - the system will pretty much totally disintegrate if you've got that much more data than RAM. Hence the recommendation - that's about as much swap as 2.2's VM could really make use of.

--
The Matrix is going down for reboot now! Stopping reality: OK. The system is halted.
Re:Why VM is bad by Snowfox · 2001-10-30 12:26 · Score: 2

The main value of VM today is getting rid of dead code at run-time. A basic problem with shared libraries is that you load in the whole library, needed or not, when you need any function from it. This wastes memory, but after a while, the VM system will notice the unused pages and quietly release them. On a larger scale, the same problem is seen with dormant applications, a problem which has gotten totally out of hand in the Windows world, where far too much unwanted stuff launches at startup. VM ejects them from memory. That's what VM is really used for today.
So if you're actually page-faulting, VM is hurting, not helping.

That one-liner made no sense on the tail of the previous paragraph.
Apps are getting larger. A fat chunk of most modern GUI software is dead code, error checking code, or code that only gets used during startup and teardown.
Apart from application code, handling huge datasets also benefits from VM, leaving the most often used parts in memory. Do you really want 2G of memory just to lay out a multi-layer magazine-resolution image in gimp?
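
For what it's worth, this is roughly what working on part of a large file through the VM looks like from user space. A small sketch using Python's standard mmap module; the 64 MB size and the "layer-data" marker are made up for the example.

```python
import mmap
import tempfile

SIZE = 64 * 1024 * 1024  # 64 MB stand-in for a much larger image file

with tempfile.TemporaryFile() as f:
    f.truncate(SIZE)        # file of zeros; sparse on most filesystems
    f.seek(SIZE // 2)
    f.write(b"layer-data")  # pretend this is the part being edited
    f.flush()
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as view:
        # The whole file is mapped, but slicing touches only the pages
        # that back these ten bytes.
        chunk = view[SIZE // 2 : SIZE // 2 + 10]
        mapped_len = len(view)

print(mapped_len == SIZE, chunk)  # True b'layer-data'
```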
That said, if you really want your wish - just set VM page sizes to be larger than your largest application and have a look-see at how performance is affected.
Re:Why VM is bad by be-fan · 2001-10-30 12:29 · Score: 2

QNX sucks as a desktop OS precisely because it lacks a decent VM (and a decent FS, but that's another can of worms...)
A VM doesn't make the system any slower. The cost of maintaining mapping data and such is negligible compared to the other startup costs associated with applications. Also, most VMs do very little until the system memory becomes stressed. FreeBSD, for example, won't really bother keeping page ages up to date unless there is a lack of available pages. As long as you're not swapping, a VM-based system is no slower than a non-VM system. By the time you're swapping, however, a VM system has a lot more active processes (since the working set of each process is smaller than the total image) than a non VM system. Also, a VM system can deal with temporary demands for a huge amount of memory (such as compiling, in which the disk is getting trashed anyway), while a non-VM system can't. In fact, the QNX developers had to hack GCC to use their swap extensions because it wouldn't run on any sensible RAM config without VM.
Your comments about RAM are also questionable. While RAM might be getting cheaper, the uses for RAM are increasing. The more RAM you add, the more uses you can find for it, such as rendering bigger images, etc. There is no point in wasting a significant amount of it when you don't have to.
Lastly, the comments about process swapping are completely off the wall. App developers are inherently lazy, which is why modern systems take as much responsibility away from them as possible. If you left it up to the application developers to manage memory, then you'd end up with gigantic (more so than now, if that's possible) runtime footprints.

--
A deep unwavering belief is a sure sign you're missing something...
Re:Why VM is bad by SurfsUp · 2001-10-30 14:16 · Score: 2

Thus, the relative cost of a page fault has increased by a factor of 200. This makes VM far less attractive today than it used to be. It's not getting any better, either.

You completely failed to think about disk caching and memory-mapped files, your view of VM is way too limited, and that's how you argued yourself to an incorrect conclusion.

--
Life's a bitch but somebody's gotta do it.
Re:Why VM is bad by Salamander · 2001-10-30 20:48 · Score: 2

The saddest thing here is that such garbage got modded up as "informative" instead of being modded down into oblivion for being "uninformed".

It's not that expensive to double the size of RAM today. It can be cheaper than adding a fast disk drive just for paging.

Even with RAM at US$0.25/MB or less, that's still untrue. A top of the line hard disk is under a penny per megabyte, which represents a pretty significant difference.

disks still top out around 10,000 RPM, making main memory 300,000 times faster than backing store

Where the hell do you get 300K from? Picking the first drive I see at an online store I see an average seek time of 4.9ms (RPM is totally irrelevant). 4.9ms/0.02ms = 24.5, not 300K. Yeah, yeah, you have to factor in transfer times as well, but 300K is still way out of line.

On a server which is processing short transactions, you're much better off throttling at the transaction launch point

Not all workloads are characterized as short, independent, stateless operations. Yes, there are applications out there that aren't webservers, and they deserve to be well supported by the OS as well.

The main value of VM today is getting rid of dead code at run-time.

Bullshit. But I see others have already addressed that.

I'd argue that it's time to go back to a swapping model - all of an app has to be in before it runs.

So, because the mismatch between RAM and disk access times is so great, it's better to move all of a process's address space from RAM to disk and back again instead of just part? Riiiiight.

As someone else said, please take an OS class. While you're at it, read Hennessy and Patterson. You seem to be missing a lot of information about how computers actually work at both the hardware and software levels.

--
Slashdot - News for Herds. Stuff that Splatters.
Re:Why VM is bad by Salamander · 2001-10-31 01:37 · Score: 2

4.9ms/0.02ms = 24.5

Before anyone else points it out, that should be 245. I don't know how that period got in there; I guess it's something I should watch out for when I post at 3am. In any case, this just means that the previous poster was off by three orders of magnitude rather than four. Congratulations. :-P
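
One way to sanity-check the division is to put both figures in the same unit first. This sketch assumes the 0.02 figure is the 0.020us RAM cycle time from the quoted post, i.e. microseconds and not milliseconds; under that assumption the ratio lands in the same range as the 300K being disputed.

```python
SEEK_NS = 4_900_000  # 4.9 ms average seek time, in nanoseconds
RAM_CYCLE_NS = 20    # 0.020 us RAM cycle time, in nanoseconds

ratio = SEEK_NS // RAM_CYCLE_NS
print(ratio)  # 245000
```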

--
Slashdot - News for Herds. Stuff that Splatters.

Linus's choice by RichiP · 2001-10-30 05:40 · Score: 2, Insightful

While I normally take issue with the way Linus bullies a kernel issue based on what he perceives is technical merit, I have to agree with him on this one. First, if a better way of doing things has been found, even though it's in the middle of a stable series, it should be changed in order not to propagate wrong coding. I've been coding for a long time now and I still believe that if an error, bug or better coding scheme is found, it should be implemented as soon as possible.

The problem with leaving the change till the 2.5 series is that the 2.5 series is nowhere in sight and development kernels usually take more than a year to cycle through (no matter what the kernel hackers say). The fact that 2.5 hasn't even begun may be an indicator of how long 2.5 will take to finish.

it's been ac kernels by MenTaLguY · 2001-10-30 05:40 · Score: 3, Informative

RedHat ships an -ac kernel with RH 7.2, I think 7.1's was also an -ac kernel.

Not pure -ac kernel, though, like most major distributions they also pull stuff from Linus and other kernel trees (there are others) so what they actually ship is really the "RedHat" tree.

--

DNA just wants to be free...

unstable kernel, sigh.. by TheGratefulNet · 2001-10-30 05:46 · Score: 4, Insightful

as a 5+yr linux vet, I'm horrified at this turn of events. I've always counted on linux to be rock stable, yet the last few months have been anything BUT stable.

I really hate to say this, but I'm wondering if jumping ship to freebsd (etc) makes sense. I've been a major linux supporter for quite a long time, but I know that the *bsd guys have had their act together (good smp, good networking under load, etc) for a long time.

would it be all that crazy to adopt the VM system from the 'establishment' (bsd)? frequently the linux codebase DOES borrow from bsd. why is the VM system all that different?

--
"It is now safe to switch off your computer."

Re:unstable kernel, sigh.. by nathanh · 2001-10-30 08:17 · Score: 2

I really hate to say this, but I'm wondering if jumping ship to freebsd (etc) makes sense. I've been a major linux supporter for quite a long time, but I know that the *bsd guys have had their act together (good smp, good networking under load, etc) for a long time.

*BSD has poorer SMP than Linux. Networking is arguable (some benchmarks show Linux is faster, some don't).
*BSD has also had more than its own fair share of infighting. Do people forget so quickly why we have *BSD instead of 386BSD?
Go use *BSD if you want to, but do so for the right reasons, not because of rumor and myth.
Re:unstable kernel, sigh.. by TheGratefulNet · 2001-10-30 08:21 · Score: 2

BSD has poorer SMP than Linux

not to start a flame war, but things like netbsd have had to be VERY smart about smp (and multiprocessing, in general) for a long time, simply due to the fact that they deal with many more hardware types than we (linux) do.

freebsd is known by most to be more stable at high network load. hell, it's got some 15yrs or so on linux, it has to be more mature simply due to all the time-tested code.

and don't even get me started on nfs. nfs on *bsd has a hope of working. I'd not try nfs on linux unless I like hangs, stale handles and general unpleasantness.

--
"It is now safe to switch off your computer."
Re:unstable kernel, sigh.. by AntiBasic · 2001-10-30 08:37 · Score: 2

I've noticed a recent trend towards trashing FreeBSD's SMP because of "the giant spinlock." What people don't realize is that one large spinlock can be a viable method of locking for the purposes of threading (that is, multiprocessing). It would seem that someone who has a moderate clue about threading and writing SMP-capable operating systems has commented on this, and feels it's bogus, and one or more of the general breed of "BSD is ubersux" trolls has gotten a hold of this and thinks it's the ultimate death knell for FreeBSD/smp. Obviously, you don't really know much about locking at all. It should at least be pointed out that no matter how many locks you have, it is more important to keep the system OUT of a locked state as much as possible, and FreeBSD does this well enough. It's not as if the system is constantly locked and able to use only one CPU. Most processing occurs in userland, far away from kernel locks, so it doesn't tend to matter all that much. Now, granted, using one spinlock isn't necessarily the best way to do things, at least not in an OS. However, it's not the worst either. Combined with the fact that it allowed fairly rapid updating and deployment of FreeBSD/SMP, I think the choice to use that 'giant spinlock' was valid. It allowed SMP code that by all accounts worked better at least than the 2.0 Linux kernel's (if not 2.2 as well) to be deployed until a better solution could be created.
A better solution will be deployed in FreeBSD 5.0 with the introduction of SMPng. I do not doubt that the 2.4 Linux kernel does a better job at SMP than FreeBSD (release/stable) does, but I think it's worth noting that Linux's SMP has been now five or six years in the making to get to this point, and that the Linux and FreeBSD development and advancement models are significantly different. Where Linux takes gradual steps, FreeBSD (and BSDs in general) tend to take large leaps. That's just a difference in implementation timing. Furthermore, it's perfectly reasonable to expect two open-source systems to leapfrog each other in terms of capability as ideas and code move from one to the other, and it's really not something to gloat over. What one does better today, the other will do better tomorrow. It doesn't really matter. To those of you babbling on and on about 'the giant spinlock', you might want to go do some research into the theory, and practice, of implementing locks in threaded systems. Until then, shut up, please.
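
For anyone who wants numbers behind the "most processing occurs in userland" argument, here is a rough back-of-the-envelope sketch. It is my own toy model, not anything from FreeBSD: `amdahl_estimate` and `lock_ceiling` are made-up names, and the model assumes every bit of kernel time sits under one lock and that user code scales perfectly. The first function is the classic Amdahl-style estimate (other CPUs idle while one is in the kernel); the second is the ceiling the lock imposes even if the other CPUs keep running user code in the meantime.

```python
def amdahl_estimate(kernel_fraction: float, cpus: int) -> float:
    """Amdahl-style speedup estimate with all kernel time treated as serial.

    Assumption (toy model): while one CPU is in the kernel, the others idle.
    """
    if not 0.0 <= kernel_fraction <= 1.0:
        raise ValueError("kernel_fraction must be between 0 and 1")
    if cpus < 1:
        raise ValueError("cpus must be at least 1")
    user_fraction = 1.0 - kernel_fraction
    return 1.0 / (kernel_fraction + user_fraction / cpus)


def lock_ceiling(kernel_fraction: float, cpus: int) -> float:
    """Best case under one big lock: kernel work can never overlap itself.

    Assumption (toy model): user code on other CPUs overlaps freely with
    kernel code, so throughput is capped by the CPU count or by
    1 / kernel_fraction, whichever is smaller.
    """
    if not 0.0 <= kernel_fraction <= 1.0:
        raise ValueError("kernel_fraction must be between 0 and 1")
    if cpus < 1:
        raise ValueError("cpus must be at least 1")
    if kernel_fraction == 0.0:
        return float(cpus)
    return float(min(cpus, 1.0 / kernel_fraction))


if __name__ == "__main__":
    for k in (0.05, 0.25, 0.50):
        print(f"kernel share {k:.0%}, 4 CPUs: "
              f"estimate {amdahl_estimate(k, 4):.2f}x, "
              f"ceiling {lock_ceiling(k, 4):.2f}x")
```

Under this model a workload that spends 5% of its time in the kernel barely notices the big lock on a 2-way box, while one that spends half its time in the kernel cannot get past 2x no matter how many CPUs are added. Which of those describes a given server is exactly what the two sides here are arguing about.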
Re:unstable kernel, sigh.. by nathanh · 2001-10-30 12:42 · Score: 2

... netbsd have had to be VERY smart about smp (and multiprocessing, in general) for a long time ...

Read the NetBSD feature list by platform. Only 5 platforms have any NetBSD-support for SMP at all and not one of them is stable or usable. Four of the five have "spinup" support only.
http://www.netbsd.org/developers/features/

Myths and rumors.
Re:unstable kernel, sigh.. by nathanh · 2001-10-30 13:11 · Score: 2

I do not doubt that the 2.4 Linux kernel does a better job at SMP than FreeBSD (release/stable) does...

How is this different to my comment of "BSD has poorer SMP than Linux"?

Obviously, you don't really know much about locking at all.

I didn't make any comment about locking.

Most processing occurs in userland, far away from kernel locks, so it doesn't tend to matter all that much.

Really?

To those of you babbling on and on about 'the giant spinlock', you might want to go do some research into the theory, and practice, of implementing locks in threaded systems. Until then, shut up, please.

I suggest you practise what you preach.
Re:unstable kernel, sigh.. by SurfsUp · 2001-10-30 14:30 · Score: 2

would it be all that crazy to adopt the VM system from the 'establishment' (bsd)? frequently the linux codebase DOES borrow from bsd. why is the VM system all that different?

BSD doesn't support >4 Gig memory on IA32, for one thing. Yes, Linux's VM could be more similar to BSD's and it would probably be better that way, but there is no way it can be identical.

--
Life's a bitch but somebody's gotta do it.
Re:unstable kernel, sigh.. by nathanh · 2001-10-30 20:49 · Score: 2

Uh, 386bsd developed into Free because it was too large to just be a patch.

The name change to FreeBSD was because Bill Jolitz refused to hand over the project and would not apply patches. The three developers of Free were going to call it 386BSD 0.5 until Bill went back on his earlier approval of the patchkit.
NetBSD was sprung from 386BSD after a bunch of developers expressed frustration with the slow pace of 386BSD. The first NetBSD release was a combination of Net/2 and 386BSD as a port to the Macintosh.
And OpenBSD forked from NetBSD because of disagreements between Theo and pretty much everyone else. Theo is still disagreeing with everyone else. He's famous for it.
I think that pretty much proves my point that *BSD aren't exempt from infighting.

Net evolved from 4.4bsd.

It evolved from Net/2 and 386BSD. If you'd bothered to read the netbsd.org site you'd have learnt this too.

You're a moron.

You're ignorant.

Re:This Explains a LOT by DarkMan · 2001-10-30 05:50 · Score: 2, Insightful

Um, not meaning to tell you how to do your job, but if you're admining Linux machines with new kernels, and _not_ following the lists, or Kernel Traffic at least, then I don't think you can really complain about not hearing about it. It's been well known since 2.4.[012] that there was a VM issue at heavy load.

2.4.13 by josepha48 · 2001-10-30 05:51 · Score: 2

2.4.12 has bugs that prevent certain compile options. 2.4.13 is out and it works really sweet.

Oh, and just because there are two different VMs (Alan vs Linus) does not mean that there is an official fork. Besides, there are already companies like Redhat and Suse that release their own patches to the linux kernel that make them unpatchable against the main tree.

--

Only 'flamers' flame!

What about the AIX VM? by Sara+Chan · 2001-10-30 06:16 · Score: 5, Interesting

The discussion so far has focussed mainly on Rik's and Andrea's VMs. For the 2.4.x series, that's fair. For 2.5, though, what about considering the AIX VM?

IBM has said that they will open source any part of AIX that we would like. The AIX VM works well under high stress. Obviously it could not just be put as-is into Linux, but there must be a lot of good ideas/algorithms in it that could--arguably should--be moved to Linux. Why isn't anyone looking at doing this?

Re:What about the AIX VM? by Anonymous Coward · 2001-10-30 07:28 · Score: 2, Informative

The AIX VM is bizarre and different - it is almost entirely unlike any other UNIX VM out there. It is hideously ugly, MP support was added as an afterthought and looks like it was written by a pascal programmer.

It also relies heavily on a segment-register-based architecture, i.e. Power/PowerPC (each segment describing a 256MB chunk of virtual address space). You start getting into lots of fun when hitting/crossing segment boundaries.
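
To make the 256MB segment arithmetic concrete, here is a small sketch. The helper names (`segment_index`, `crosses_segment`) are made up for illustration and have nothing to do with actual AIX code; the only fact used is that 256MB is 2**28 bytes, so the top 4 bits of a 32-bit address pick one of 16 segments.

```python
SEGMENT_SHIFT = 28                    # 2**28 bytes = 256 MB per segment
SEGMENT_SIZE = 1 << SEGMENT_SHIFT


def segment_index(addr: int) -> int:
    """Return which 256 MB segment a virtual address falls in."""
    if addr < 0:
        raise ValueError("address must be non-negative")
    return addr >> SEGMENT_SHIFT


def crosses_segment(start: int, length: int) -> bool:
    """True if the byte range [start, start + length) spans two or more segments."""
    if length <= 0:
        raise ValueError("length must be positive")
    return segment_index(start) != segment_index(start + length - 1)


if __name__ == "__main__":
    # An 8 KB buffer that starts 4 KB below the first segment boundary.
    print(crosses_segment(0x0FFFF000, 0x2000))
```

Any mapping for which `crosses_segment` is true needs more than one segment to describe it, which is presumably where the "fun" starts.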

I have some doubts how well this maps to/performs on non-segment based architectures. IBM's inability/unwillingness to put an AIX product out on the Itanium after some heavy investment *may* be related.

Linux Kernel list link by commanderfoxtrot · 2001-10-30 06:18 · Score: 2, Informative

This is a link to the kernel-traffic discussion with details and basic benchmarks: here!.

--
http://blog.grcm.net/

Re:Make it a build option by jovlinger · 2001-10-30 06:24 · Score: 2

ah!

so if I read you correctly, the post-not-really-a-fork-but-almost -ac kernels (2.4.5?) have gotten progressively better at dealing with high memory commitments, and are now back into the regions of usability? (where usability is somewhat arbitrarily defined as being able to have working sets that are very close to physical RAM without having to wait several minutes for top to fire up to tell me which process is the memory hog (pan, if you must know)).

Core design issues. by bored · 2001-10-30 06:42 · Score: 2, Interesting

Part of the problem with the design and redesign of the linux VM is an insistence on sticking with a few core design points that make it 100x harder to write. For instance, virtual memory overcommit spawns a whole bunch of ugly problems that must be solved in order to create a stable and fast system. If the core development team spent some time looking at past OS research then they would completely change their design criteria and a bunch of these problems would go away.

Another perfect example is the OOM killer. If the VMM could properly balance the workload (and it didn't overcommit) then there wouldn't be a need for code to select the 'correct' process to kill. Since the VM cannot balance correctly, the kernel developers spend massive amounts of time trying to write an OOM that functions correctly in the case where the VMM is wedged. This time would be better spent fixing the VMM so it never got into these states.
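
A toy model of the difference, for readers who have not met overcommit before. This is only a sketch under my own assumptions: `MemoryModel` and `OutOfMemory` are hypothetical names, memory is counted in whole pages, and nothing here is how the Linux kernel actually does its accounting. With strict accounting the failure shows up at allocation time, where a program can handle it; with overcommit the allocation succeeds and the failure shows up later, when a page is first touched and there is nothing left to back it.

```python
class OutOfMemory(Exception):
    """Raised when a touched page cannot be backed by RAM or swap."""


class MemoryModel:
    """Toy page accounting; purely illustrative, not kernel behaviour."""

    def __init__(self, backing_pages: int, overcommit: bool):
        self.backing_pages = backing_pages  # RAM + swap, in pages
        self.overcommit = overcommit
        self.promised = 0                   # pages handed out by allocate()
        self.touched = 0                    # pages that actually need backing

    def allocate(self, pages: int) -> bool:
        """Promise pages to a process; a strict model refuses what it cannot back."""
        if not self.overcommit and self.promised + pages > self.backing_pages:
            return False
        self.promised += pages
        return True

    def touch(self, pages: int) -> None:
        """First use of promised pages; this is where overcommit can blow up."""
        if self.touched + pages > self.promised:
            raise ValueError("touching more pages than were allocated")
        if self.touched + pages > self.backing_pages:
            raise OutOfMemory("no backing store left; something has to be killed")
        self.touched += pages


if __name__ == "__main__":
    strict = MemoryModel(backing_pages=100, overcommit=False)
    print(strict.allocate(80), strict.allocate(80))   # second request is refused

    loose = MemoryModel(backing_pages=100, overcommit=True)
    print(loose.allocate(80), loose.allocate(80))     # both requests succeed
    loose.touch(100)                                  # fine so far
    try:
        loose.touch(1)
    except OutOfMemory as exc:
        print("OOM:", exc)
```

In the overcommit case nobody is left to return an error to, which is why some kind of OOM killer has to exist at all.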

Slightly OT question by mindstrm · 2001-10-30 08:16 · Score: 2

They mention LIDS (Linux Intrusion Detection System).

My question is..
does the LIDS actually do *any* intrusion detection, or does it just prevent modification of certain files?

And what about BSD? by Ian+Bicking · 2001-10-30 10:32 · Score: 2

Moreso than AIX, people have been reading the BSD VM code for a long time. It seems to be regarded very highly, and its design has been stable for quite some time.

So why doesn't Linux just copy BSD?

The code here seems rather incidental, it's the design that is more important. But why not copy a good design? Or do one (or both) of the contending VMs do so?

Re:And what about BSD? by puetzk · 2001-10-30 11:17 · Score: 2

Rik van Riel's (-ac series kernels) is somewhat modeled off the FreeBSD VM. How much so I don't know.

--
The Matrix is going down for reboot now! Stopping reality: OK. The system is halted.
Re:And what about BSD? by SurfsUp · 2001-10-30 14:11 · Score: 2

Rik van Riel's (-ac series kernels) is somewhat modeled off the FreeBSD VM. How much so I don't know.

Loosely

--
Life's a bitch but somebody's gotta do it.

Re:OSS Power by SurfsUp · 2001-10-30 15:04 · Score: 2

So what happens? The kernel just paints itself into a corner until the machine freezes. Only way to recover is to power cycle.

No, there's a task killing option available under the SysRq key now. And you're wrong, the oom_kill is still in there, it hasn't been removed.

--
Life's a bitch but somebody's gotta do it.

Answers to the above by Animats · 2001-10-30 20:40 · Score: 3, Interesting

False. Any decent VM does demand paging. Only the pages that are needed are loaded from the executable.

If you implement a VM that way, launching a program takes a very long time. You could, in theory, start out with nothing in memory and page-fault the program in. This requires one disk access per active memory page until enough is loaded for the program to run. The very first virtual memory system, for the Burroughs 5500, worked that way. It worked OK for batch programs, in an era when batch programs ran for minutes or hours, but was terrible for interactive work.

Most operating systems today load most or all of a program at startup, let the app run for a while, then release the unreferenced pages. Deciding how much to load at startup is an interesting question. The BSD UNIX guess was the first N bytes of the executable, where N is a system tuning parameter. (What, exactly, does Linux do about this?) This is a mediocre guess, but an easy one to make. It's OK for long-running programs, but terrible for short-lived ones. Short-lived programs don't run long enough for the least-recently-used page info to become useful. If paging occurs in this situation, the pages removed are ill-chosen, since the LRU info isn't useful until the program has run for a while.
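
The tradeoff between faulting everything in and preloading the first N pages can be sketched with a small counting model. Everything here is an assumption made for illustration: `startup_cost` is a made-up helper, pages are just integers, nothing is ever evicted, and the reference string is invented rather than measured.

```python
def startup_cost(references, preload_pages=0):
    """Count faults and wasted reads for one program start.

    references    -- page numbers in the order the program touches them
    preload_pages -- pages 0..preload_pages-1 are read in before it runs

    Returns (page_faults, pages_read, pages_wasted). Toy model: no eviction.
    """
    if preload_pages < 0:
        raise ValueError("preload_pages must be non-negative")
    resident = set(range(preload_pages))
    used = set()
    faults = 0
    for page in references:
        if page not in resident:
            faults += 1            # one trip to disk for this page
            resident.add(page)
        used.add(page)
    pages_read = len(resident)     # preloaded pages plus faulted-in pages
    pages_wasted = len(resident - used)
    return faults, pages_read, pages_wasted


if __name__ == "__main__":
    refs = [0, 1, 2, 0, 1, 7, 8, 2]   # invented reference string
    for n in (0, 4, 16):
        print(n, startup_cost(refs, preload_pages=n))
```

With this reference string, pure demand paging takes 5 faults and reads nothing it does not need; preloading 4 pages cuts that to 2 faults at the cost of 1 unused page; preloading 16 pages removes the faults entirely but reads 11 pages that are never touched. Whether fewer faults or fewer wasted reads matters more depends on the disk and on how tight memory is.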

Much of the memory-demanding things servers do look like short-lived programs. CGI programs and Java servlets are short-lived programs. So they're a bad case for a VM environment. If memory gets tight enough that short-lived programs get paged out, thrashing is almost inevitable.

You don't want to page out at all on a server, except (maybe) under transient overload. As soon as paging activity starts, it's time to throttle back the amount of server concurrency until paging stops. This requires coordination between OS and application of a kind not usually seen in the UNIX world, though mainframe transaction systems have had it for decades, all the way back to CICS.
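
As a sketch of what "throttle back until paging stops" could look like from the application side: the policy below halves the number of concurrent workers whenever any swap traffic was seen since the last check, and creeps back up by one otherwise. The function name, the halve/add-one policy and the limits are all my own assumptions, not something any real server does as far as I know. Where the swap counter comes from is left open; on a current Linux box the `pswpin`/`pswpout` fields in /proc/vmstat would be one candidate.

```python
def next_concurrency(current: int, pages_swapped: int,
                     floor: int = 1, ceiling: int = 64) -> int:
    """Pick the worker limit for the next interval (hypothetical policy).

    current       -- worker limit used during the last interval
    pages_swapped -- pages swapped in or out during the last interval
    """
    if current < 1:
        raise ValueError("current must be at least 1")
    if pages_swapped < 0:
        raise ValueError("pages_swapped must be non-negative")
    if pages_swapped > 0:
        # Paging has started: back off hard so the working sets fit again.
        return max(floor, current // 2)
    # No paging: probe gently for more concurrency.
    return min(ceiling, current + 1)


if __name__ == "__main__":
    limit = 16
    for swapped in (0, 0, 120, 40, 0, 0):   # invented samples
        limit = next_concurrency(limit, swapped)
        print(swapped, limit)
```

The missing piece in the UNIX world is the plumbing: the server has to poll for this information itself, rather than being told by the OS the way a mainframe transaction monitor is.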

Desktop systems have a different set of issues, but they don't look like classic time-sharing systems either. My main point here is that in the last decade, the memory usage behavior for most programs has changed considerably, but we're still using virtual memory concepts that were developed in the 1960s and mature by 1980.

And remember, even when everything works right, you get the effect of at best 2X the memory.

Here's a basic tutorial on VM, with emphasis on Linux.

Re:Answers to the above by Salamander · 2001-10-31 01:46 · Score: 2

If you implement a VM that way, launching a program takes a very long time.

And it's better to load the whole thing? Um...hello? Anybody in there? Sure, you can save some latency by transferring the data in large chunks instead of individual pages, but you make up for that in wasted transfer time. Then you have to consider the effect of filling up the cache with pages you actually turn out not to need, evicting other pages you do need to make room for them. Overall, it's a big loss.

Most operating systems today load most or all of a program at startup

Bullshit.

Much of the memory-demanding things servers do look like short-lived programs.

Bullshit again. Maybe it's true for web servers, though with mod_perl and such out there I have my doubts. It's definitely not true for most other common/interesting workloads.

the memory usage behavior for most programs has changed considerably, but we're still using virtual memory concepts that were developed in the 1960s and mature by 1980.

Three strikes, you're out. I was there in 1980, in fact I was working on one of the earliest SMP VM systems (UMAX, at Encore). Things have changed a lot since then.

And remember, even when everything works right, you get the effect of at best 2X the memory.

Hm. Four strikes? The 2X figure is a Linux 2.2-specific red herring. Many systems can make effective use of more under many workloads.

--
Slashdot - News for Herds. Stuff that Splatters.
Re:Answers to the above by Animats · 2001-10-31 05:43 · Score: 2

Apparently Linux doesn't preload applications, though BSD does. See this little note about a Linux preloading experiment, which reduced the launch time for Netscape Communicator from 14 to 4 seconds by reading the whole executable in at startup. On the desktop, that's probably a win when the user launches a program, since it gives the most resources to the program the user wants right now.
But that's a brute-force approach. A better solution would be to profile applications, feed the profile info into the linker, have it put the most-used stuff first, and put a hint in the executable that indicates how much needs to be preloaded to get through the first few seconds of running without a page fault. That would lead to big improvements in application launch times.
The Wind River DIAB System does things like that. So does Sun Workshop. But I don't see the Linux world doing anything like that yet.
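
Roughly what the profile-and-reorder step could compute, as a sketch. This is not how DIAB or Sun Workshop do it; `plan_layout`, the tuple format and the 4096-byte page size are assumptions for illustration, and the profile numbers are invented.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


def plan_layout(profile):
    """Order symbols hottest-first and size the preload hint.

    profile -- list of (name, size_bytes, startup_hits) tuples, where
               startup_hits is how often the symbol ran during a profiled launch

    Returns (ordered_names, preload_bytes). preload_bytes covers every symbol
    that was hit at least once, rounded up to a whole number of pages.
    """
    ordered = sorted(profile, key=lambda entry: entry[2], reverse=True)
    hot_bytes = sum(size for _, size, hits in ordered if hits > 0)
    preload_bytes = -(-hot_bytes // PAGE_SIZE) * PAGE_SIZE  # ceiling to page size
    return [name for name, _, _ in ordered], preload_bytes


if __name__ == "__main__":
    profile = [
        ("help_dialog", 30000, 0),    # never touched during launch
        ("main", 2000, 1),
        ("parse_config", 5000, 40),
        ("render", 9000, 12),
    ]
    print(plan_layout(profile))
```

For the invented profile above the hot code adds up to 16000 bytes, so the hint would say "preload the first 16384 bytes" and the 30000-byte help dialog stays on disk until somebody asks for it.
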
I see the point about mod_perl. mod_perl is essentially a way to move some loading and memory management from the OS to the web server. If the OS were good enough at transaction processing and program initial load, the need for mod_perl would be much less. But mod_perl is a good tradeoff, because most CGI scripts are tiny. The poor program load performance inherent in a load-by-page-fault approach may have driven the server world into that solution. mod_perl, after all, was developed as a reaction to slow CGI program load times.
Re:Answers to the above by DaveWood · 2001-10-31 08:37 · Score: 2

"A better solution would be to profile applications, feed the profile info into the linker, have it put the most-used stuff first, and put a hint in the executable that indicates how much needs to be preloaded to get through the first few seconds of running without a page fault..."

I was just thinking the same thing. This is an interesting idea. I wonder how difficult a patch would be...

It seems, still, though, to come down to better tuning the system we have (and helping it tune itself!)... if you are proposing a fundamentally different alternative I still do not understand it.

--
We're on the road to Tycho.

Doh. by TheLink · 2001-10-31 15:00 · Score: 2

Why don't you just boot different kernels if you want to try stuff out?

Especially if you are just testing the VMs out, the other stuff shouldn't affect your tests much.

Why I need it. by TheLink · 2001-10-31 16:05 · Score: 2

Because Linux and some other O/Ses don't handle out of memory situations well.

To me it's better to have a gradual degradation of performance as memory runs out, than to run straight into a brickwall, wheels spinning at 133MHz and have Linux kill the wrong process or deadlock.

Another thing about your proposal: if all of the app has to be in then how do you propose to handle process forking?

Cheerio,
Link.

Ahh! Vindication. by mindstrm · 2001-11-03 16:46 · Score: 2

Good. I didn't realize it could send mail when these things are attempted. That makes it okay in my books then.

I just thought it was funny to always hear about this 'intrusion detection system' that didn't actually detect anything.
