surriel.com · Domains · Slashdot Mirror

Preliminary split-lru patches might help by Sits · 2008-10-06 05:21 · Score: 1 · on How Big Should My Swap Partition Be?

Hmm. That's a real problem alright (but probably only noticeable in longer-running but partially idle workloads). There are patches floating around that might help solve it, in the form of Rik van Riel's split LRU patches. Note that the patch is not yet in the mainline kernel...

Re:Request by bigberk · 2007-01-26 11:06 · Score: 5, Informative · on 25 Percent of All Computers in a Botnet?

One interesting method is to query an anti-spam database using your IP address, and see if you are listed as a spam source. Quick checks can be done at robtex or dnsstuff.

If your IP address shows up on PSBL, CBL, SpamCop, or WPBL your host is probably infected and a source of spam or other abuse.
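Under the hood, a DNSBL check is just a DNS lookup. The octets of the IPv4 address are reversed and prepended to the list's zone, and any answer means the address is listed. The sketch below only builds that query name. The zone `psbl.surriel.com` is used as an assumed example, so check each list's own documentation for its real zone. The helper name `dnsbl_query_name` is hypothetical.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/* Build the DNSBL query name for an IPv4 address by reversing its
 * octets and appending the list's zone, e.g.
 * "192.0.2.1" + "psbl.surriel.com" -> "1.2.0.192.psbl.surriel.com".
 * Returns 0 on success, -1 on a bad address, empty zone or short buffer. */
int dnsbl_query_name(const char *ip, const char *zone,
                     char *out, size_t outlen)
{
    struct in_addr addr;
    if (zone == NULL || strlen(zone) == 0)
        return -1;
    if (inet_pton(AF_INET, ip, &addr) != 1)
        return -1;
    /* s_addr is in network byte order, so b[0] is the first octet. */
    const unsigned char *b = (const unsigned char *)&addr.s_addr;
    int n = snprintf(out, outlen, "%u.%u.%u.%u.%s",
                     (unsigned)b[3], (unsigned)b[2],
                     (unsigned)b[1], (unsigned)b[0], zone);
    if (n < 0 || (size_t)n >= outlen)
        return -1;
    return 0;
}
```

To do the actual check, resolve the resulting name, for example with getaddrinfo(). A successful answer means the address is listed. A "name not found" failure typically means it is not.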

Re:SORBS!!! I'd like to ABsorb the so-and-so's!!! by Brightest+Light · 2007-01-05 05:55 · Score: 1 · on SORBS - Is There a Better Spam Blacklist?

I have threatened SORBS with legal action.

Well, there's your problem right there! Most people don't really like legal threats, and amongst the more fanatical anti-spammers, they're quite the source of amusement. I submit for your consideration the cart00ney.org blacklist, which is an RBL specifically for listing people that send legal threats to blacklist operators. I also suggest that you search Google Groups' archive of NANAE for 'Matthew Sullivan' and 'cart00ney', because I'm sure your threat got a good laugh out of everybody there. I'm sure that was your last resort after trying to do all the things a civil and reasonable person would and failing to see any results, but it was definitely not the wisest thing to do.

Re:Make the "People Who Sued Us" list by Kadmos · 2006-10-11 20:23 · Score: 2, Informative · on ICANN Grants Temporary Reprieve to Spamhaus

Indeed one could make a list like that and it would be nice. They could call it http://cart00ney.org/. Other people could then take the list and make it into a DNSBL/RHSBL list and host it at http://cart00ney.surriel.com/

Re:DBAN. Learn it, Live it, Love it. by AKAImBatman · 2005-09-15 03:28 · Score: 1 · on Data Still Left on Storage Devices for Sale

Go learn the difference between blocks and sectors before you comment.

No, I'm well aware of the difference. I was having one of my temporary memory lapses and couldn't remember the term block. Since I was in a hurry, I used the word "Sector" and hoped no one would notice. Ah well. :-)

Blocks default to 4096 bytes, because this is convenient for the page cache;

That's what I said.

nothing stops you using a different size.

Bzzt. You need to use multiples of 512, otherwise the blocks and sectors won't line up properly.

For example, I have ext3 filesystems at work using 512 byte blocks (so the allocation unit is 512 bytes) on a system with 4096 byte pages.

And? I did say that 4K was normally used because it lines up nicely with the page sizes. If you use a different size for blocks, it will still run through the paging system, regardless.

ReiserFS tail-packing uses left-over space in blocks, and space that cannot be used as blocks.

What is "space that cannot be used as blocks"? Blocks are managed by the Linux kernel. You can't muck with the block size you chose. (Though there have been some mutterings about making the last few blocks of an odd-sized device accessible in Linux as a partial block.)

4K blocks are less efficient but still work fine (which would not be true if the OS paged the memory to disk for writes, since the block would be accompanied by 12K of garbage).

You forget about read-ahead caching. For sequential I/O, the reading is run through the paging system to make read ahead more efficient. So the OS is *designed* to read (and potentially write depending on your kernel version) more than it needs. So it fills pages as necessary. Pages used for disk I/O are not the same pages used for Swap I/O, as that would create something of a mess.

Since the page is filled with the complete data from that portion of the disk drive, it can page out the correct data to disk, i.e., no 12K of garbage as you propose.

I *don't* know if the page sizes used between the file and swap systems are required to match up.

[info]
[more info]
[more, but older, info]

Reads and writes at the OS level can still be done on a single-sector basis; it's just inefficient, as each sector ends up filling a page in the cache, either with the other 7 sectors needed to make up one page, or with dummy data.
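The following is a rough sketch of the arithmetic behind "the other 7 sectors". It assumes the usual sizes of 512-byte sectors and 4096-byte pages, which vary on other architectures and disks. The helper names are hypothetical. `block_size_ok` encodes the rule from this thread: a block size must be a multiple of 512 and, in this sketch, must also divide the page size evenly.

```c
#include <stdint.h>

#define SECTOR_BYTES 512u   /* bytes per disk sector (assumed) */
#define PAGE_BYTES   4096u  /* bytes per page-cache page (assumed, x86) */

/* Sectors that fit in one page: 4096 / 512 = 8. */
static inline unsigned sectors_per_page(void)
{
    return PAGE_BYTES / SECTOR_BYTES;
}

/* Index of the page-cache page that holds the given sector. */
static inline uint64_t page_of_sector(uint64_t sector)
{
    return sector / sectors_per_page();
}

/* Byte offset of the sector's data inside that page. */
static inline unsigned offset_in_page(uint64_t sector)
{
    return (unsigned)(sector % sectors_per_page()) * SECTOR_BYTES;
}

/* Block sizes that line up with both sectors and pages:
 * a multiple of the sector size that divides the page size. */
static inline int block_size_ok(unsigned block)
{
    return block >= SECTOR_BYTES && block % SECTOR_BYTES == 0
           && PAGE_BYTES % block == 0;
}
```

For example, sector 17 lives in page 2 at offset 512. A single-sector read of it still brings sectors 16 to 23 (or padding) into that page.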

I'm not aware of any APIs that allow you to address a block device in a unit smaller than a block, but it's always possible that such an API has been added to recent kernels. I sincerely doubt you'd want to use direct sector addressing, though, since it would probably screw up the OS's attempts at block level locking.

Re:Cashing in on ... by Rik+van+Riel · 2005-04-28 00:14 · Score: 2, Insightful · on Gates Calls for Increase in Tech Labor Supply

You're going to end up competing with workers from all over the world no matter what. You can't wish away free market economics just because they're inconvenient.

The only question is, do you want to compete with foreign workers inside the US, or would you prefer to compete with them in India? Surely competing with them inside the US should be a lot easier, since this is your home country...

Re:spamtraps... by Rik+van+Riel · 2004-11-21 09:56 · Score: 1 · on Tech Reporter Pursues Spammer

Most of the spamtrap domains (for PSBL, at least) do have SPF records. However, they get ignored by a lot of Challenge/Response Authentication Protocol (hey, that spells CRAP - coincidence?) software...

Whenever a false positive is pointed out to me, I add a regular expression to the software to make sure that challenge/response software, mailing list manager or MTA bounce type will not result in future listings. It doesn't help that many MTAs appear to be sending out bounces that aren't RFC compliant.
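The following is a minimal sketch of that kind of filter using POSIX `regcomp`/`regexec`. Before an IP is listed, the message is checked against a table of patterns for known challenge/response systems and bounce formats, and a match means "don't list". The two patterns and the name `is_exempt` are made-up examples, not Spamikaze's actual expressions.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical exemption patterns; in practice the list grows as new
 * false positives are reported. */
static const char *const exempt_patterns[] = {
    "^Subject:.*(please confirm|verify) your (e-?mail|message)",
    "^From:.*MAILER-DAEMON",
};

/* Return 1 if any pattern matches a line of the message (so the
 * sender should NOT be listed), 0 if none match, -1 on a bad pattern. */
int is_exempt(const char *message)
{
    size_t n = sizeof exempt_patterns / sizeof exempt_patterns[0];
    for (size_t i = 0; i < n; i++) {
        regex_t re;
        /* REG_NEWLINE lets ^ match at the start of every header line. */
        if (regcomp(&re, exempt_patterns[i],
                    REG_EXTENDED | REG_ICASE | REG_NEWLINE | REG_NOSUB) != 0)
            return -1;
        int rc = regexec(&re, message, 0, NULL, 0);
        regfree(&re);
        if (rc == 0)
            return 1;
    }
    return 0;
}
```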

Note that I cannot control what other Spamikaze lists do - but they do tend to get most of my regular expressions whenever they update from CVS ;)

Re:spamtraps... by Rik+van+Riel · 2004-11-21 09:50 · Score: 1 · on Tech Reporter Pursues Spammer

Note that while Spamikaze is still pretty early in its development (we've got some fancy ideas on how to make it really fly), PSBL already seems reasonably popular.

I hope that means Spamikaze is going in the right direction... ;)

spamtraps... by mmThe1 · 2004-11-20 20:15 · Score: 4, Informative · on Tech Reporter Pursues Spammer

A relevant note here would be to mention the Spamikaze system (intro here).

In a nutshell, it sets up spamtrap e-mail addresses, and any IP that sends mail to one of those addresses is automatically added to the blacklist, and further mail from it is rejected at the SMTP level. A false positive can easily be removed from the blacklist manually (for example, PSBL).
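The following is a toy model of the spamtrap loop as described. Mail to a trap address lists the sending IP, listed IPs are rejected at SMTP time, and an operator can delist a false positive by hand. The in-memory array, trap addresses and function names are all hypothetical. A real list keeps its data in a database and publishes it as a DNS zone.

```c
#include <stddef.h>
#include <stdint.h>
#include <strings.h>

#define MAX_LISTED 1024

/* Hypothetical spamtrap addresses: never used for legitimate mail,
 * so anything arriving there is unsolicited. */
static const char *const spamtraps[] = {
    "trap1@example.org",
    "trap2@example.org",
};

/* Hypothetical in-memory blacklist of IPv4 addresses (host order). */
static uint32_t listed[MAX_LISTED];
static size_t n_listed;

static int is_spamtrap(const char *rcpt)
{
    for (size_t i = 0; i < sizeof spamtraps / sizeof spamtraps[0]; i++)
        if (strcasecmp(rcpt, spamtraps[i]) == 0)
            return 1;
    return 0;
}

/* SMTP-time check: nonzero means reject mail from this IP. */
int is_listed(uint32_t ip)
{
    for (size_t i = 0; i < n_listed; i++)
        if (listed[i] == ip)
            return 1;
    return 0;
}

/* Called for each received message: mail to a spamtrap lists the
 * sending IP automatically. */
void record_delivery(uint32_t ip, const char *rcpt)
{
    if (is_spamtrap(rcpt) && !is_listed(ip) && n_listed < MAX_LISTED)
        listed[n_listed++] = ip;
}

/* Manual removal of a false positive. */
void delist(uint32_t ip)
{
    for (size_t i = 0; i < n_listed; i++)
        if (listed[i] == ip) {
            listed[i] = listed[--n_listed]; /* move last entry into the gap */
            return;
        }
}
```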

Re:Messaging layer by CraigParticle · 2003-07-16 13:01 · Score: 1 · on DragonFly BSD Announced

In fact, Dillon helped fix the VM bug in Linux 2.4.

Huh? FreeBSD's current VM (which Dillon played a very significant role in developing) was indeed the inspiration for the (Rik van Riel) VM in 2.4.0 through 2.4.9. But this VM was largely replaced in the 2.4 mainline kernels starting with 2.4.10. The replacement was principally a balance between Rik's multiqueue approach and Andrea Arcangeli's classzone-based patches. AFAIK, this "fix" had nothing to do with Dillon; besides, it was a departure from the existing FreeBSD approach.

Rik's VM has continued to develop -- it was maintained in Alan Cox's -ac tree for a long time and today it is very successful in the form of Rik's reverse mapping patches, a feature that (in a somewhat different form) is also in FreeBSD.

The minimal reverse mapping feature itself has been integrated into kernel 2.5/2.6-test, though the page replacement algorithms in 2.5/2.6 are much closer to the original Arcangeli classzone approach than Rik's. We'll see if new page replacement schemes, and object-based rmap (akin to FreeBSD) show up in kernel 2.7.

Point being, I know Dillon has admitted interest in Linux VM development, but I'm unaware of any direct involvement (i.e. patches).

Programming and Dvorak by Kourino · 2003-02-15 14:16 · Score: 1 · on Keyboard Layouts for the 21st Century?

Actually, yes. I've been using Dvorak for about a year after having a couple of random RSI flare-ups, and did a bit of kernel hacking for school last semester, not to mention other classes and my own programming projects. I got used to it. My brace/bracket keys are the two to the right of zero. I like the underscore placement (on the key marked ' " on QWERTY keyboards), since I use lots of underscores in variable names.

Random fact: Rik van Riel uses Dvorak.

Re:Wait for it... by forming · 2002-10-07 17:56 · Score: 0 · on LFS 4.0 Released

User Mode Linux is not the "new VM". Rmap is the new VM, which stands for "Virtual Memory" by the way. User Mode Linux is something very cool though, and yes, you don't have to wait until 2.6/3.0 comes out to try it. Look here or here.

Linux 2.4.x VM by Trevelyan · 2002-03-12 09:20 · Score: 3, Insightful · on Swap Performance in Linux

Did you miss all the 2.4 Linux VM Stories?

I suggest building and installing the latest kernel with the aa VM (the default VM since 2.4.10). If you still have VM (swap) problems, then go get the latest rmap VM patch and try that.

The kernel VM (Virtual Memory) is what manages memory and swap, by the way.

And if you did miss all the VM stories, a summary:
At the start of 2.4 a new, fancy VM was put into action, using something known as reverse mapping. This was very clever, but it wasn't quite ready and there were teething troubles. Then suddenly (2.4.10) Linus switched the VM to one similar to that of 2.3 (with some updates and a few features from the previous 2.4 VM). This started a big fight, which caused concerns (such as that it might split the Linux community).

Which is better, I don't know; some swear by one, others swear by the other. But unless you're using the RH 2.4.9 kernel, I would not recommend a pre-2.4.10 kernel.

However, you may need to experiment to find out which is best, the VM now in 2.4 (there to stay) or rmap. You should try both and see.

Steps:
Install 2.4.17, 2.4.18 or 2.4.19
Try it
If it fails, try the rmap patch

Re:If you support forks so much... by Rik+van+Riel · 2002-02-04 10:30 · Score: 5, Informative · on Byte Benchmarks Various Linux Trees

Nice troll ... ;)

My -rmap VM is a patch against Marcelo's standard 2.4 kernel, because that is the thing people have. It just doesn't make sense to release patches against kernels nobody has.

Also note that -rmap replaces pretty much all parts from the -aa VM I don't agree with, while at the same time integrating some parts from the -aa VM that I do like.

Re:Riels rmap is nice...... by Rik+van+Riel · 2002-02-04 10:09 · Score: 5, Insightful · on Byte Benchmarks Various Linux Trees

I am a little leery about using rmap in production as of yet; it seems to be killing things each night (no shit) that don't drop with 2.4.17 or 2.4.9.

Interesting. I've not managed to run into bugs like that on my computers here, so you must be running a very different workload to trigger such a bug.

Would you have the time to help me debug this problem, and is it still happening with the latest rmap VM?

Re:Interesting conclusion... by Rik+van+Riel · 2002-02-04 10:00 · Score: 5, Interesting · on Byte Benchmarks Various Linux Trees

What a nice article, and it seems to come at a time when everyone is talking about "Fork this and Fork that"; in fact this is exactly what is needed in this healthy debate.

Indeed, forks are (IMHO) the best way of doing development. Doing your development in the main kernel will just lead to contradictory code being integrated and the code never working quite right because it's missing fixes (guess why RH's 2.4.9 runs faster ... it does have the fixes).

One minor nitpick though ... I never released an -rmap VM against 2.4.18-pre3, the latest is still against 2.4.17. I suspect that the crashes Moshe saw are due to some change in 2.4.18-pre3 conflicting with the -rmap VM patch, especially since rmap-11c has survived the kernel torture lab at RH. ;)

Nice release by daserver · 2002-01-30 12:49 · Score: 3, Interesting · on Kernel 2.5.3 Released

This is a very nice release. As you can see from the changelog, the new IDE drivers are finally in 2.5.x. Let's hope this will give Marcelo one more reason to include them in 2.4.x.
The O(1) scheduler from Ingo is also in here (version J9) at the moment.

All of these patches are also available for 2.4.x! I'm running the aa VM, the O(1) scheduler and the new IDE patches right now, and have been for more than a week without any problems whatsoever. Also, for those of you who want to try Rik's VM, there's a patch for that too.
Anyway, those patches are only for the adventurous among you, like me :-). But it has been said that Rik's VM brings the VM back to the -ac13 state.

The real problem is... by rsd · 2002-01-17 00:11 · Score: 4, Insightful · on 2.4, The Kernel of Pain

IMHO, the real problem is the stock kernel.

It is common to see that the stock kernel is missing lots of patches that increase stability and, as pointed out by Rik van Riel in a post here on Slashdot, Linus rejects random patches, which causes some areas of the kernel to not be "as good as it should".

The VM is one part where Linus just took random patches from Rik and rejected some of them at random, which made the VM suck so hard in earlier stock 2.4 kernels.

OTOH, kernels shipped by distributions include (or at least should include) the missing parts and should be better than the stock kernel from kernel.org.

I don't use Mandrake, so I can't tell how good their kernel is or is not. But I use Conectiva Linux and I know how good their kernel package is. Their kernel includes fixes that don't make it into the stock kernel. Best of all, their kernel maintainer is Marcelo Tosatti, who maintains the stable kernel tree now.

I think that we will see improvements in new 2.4 releases.

The latest 2.4.17 kernels from Conectiva can be found here.

Re:Minor nit... by Rik+van+Riel · 2002-01-15 03:41 · Score: 5, Informative · on Rik van Riel on Kernels, VMs, and Linux

Both Alan's and Michael's kernels are including my -rmap VM now.

This is quite interesting since I haven't begun tuning -rmap for speed yet ;)

Answers to the above by Animats · 2001-10-30 20:40 · Score: 3, Interesting · on Debate on Linux Virtual Memory Handling

False. Any decent VM does demand paging. Only the pages that are needed are loaded from the executable.

If you implement a VM that way, launching a program takes a very long time. You could, in theory, start out with nothing in memory and page-fault the program in. This requires one disk access per active memory page until enough is loaded for the program to run. The very first virtual memory system, for the Burroughs 5500, worked that way. It worked OK for batch programs, in an era when batch programs ran for minutes or hours, but was terrible for interactive work.

Most operating systems today load most or all of a program at startup, let the app run for a while, then release the unreferenced pages. Deciding how much to load at startup is an interesting question. The BSD UNIX guess was the first N bytes of the executable, where N is a system tuning parameter. (What, exactly, does Linux do about this?) This is a mediocre guess, but an easy one to make. It's OK for long-running programs, but terrible for short-lived ones. Short-lived programs don't run long enough for the least-recently-used page info to become useful. If paging occurs in this situation, the pages removed are ill-chosen, since the LRU info isn't useful until the program has run for a while.

Many of the memory-demanding things servers do look like short-lived programs. CGI programs and Java servlets are short-lived programs, so they're a bad case for a VM environment. If memory gets tight enough that short-lived programs get paged out, thrashing is almost inevitable.

You don't want to page out at all on a server, except (maybe) under transient overload. As soon as paging activity starts, it's time to throttle back the amount of server concurrency until paging stops. This requires coordination between OS and application of a kind not usually seen in the UNIX world, though mainframe transaction systems have had it for decades, all the way back to CICS.
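The comment doesn't give an algorithm. One hypothetical way to express "throttle back until paging stops" is a feedback loop. It samples the number of pages swapped in each interval, halves the worker limit while paging is seen, and grows it back by one slot per quiet interval. The struct, thresholds and names are invented for illustration. On current Linux kernels the page-in count could come from the `pswpin` counter in /proc/vmstat, but reading it is left out here.

```c
/* Hypothetical admission limit for a server's worker pool. */
struct throttle {
    unsigned limit;      /* current max concurrent requests */
    unsigned min_limit;  /* never throttle below this */
    unsigned max_limit;  /* configured ceiling */
};

/* Call once per sampling interval with the pages swapped in during
 * that interval. While paging is observed the limit is halved
 * (clamped at min_limit). Once paging stops, it grows back one slot
 * per interval up to max_limit. Returns the new limit. */
unsigned throttle_update(struct throttle *t, unsigned long pageins)
{
    if (pageins > 0) {
        unsigned next = t->limit / 2;
        t->limit = next < t->min_limit ? t->min_limit : next;
    } else if (t->limit < t->max_limit) {
        t->limit++;
    }
    return t->limit;
}
```

The multiplicative cut reacts quickly once thrashing begins, while the slow additive recovery avoids jumping straight back into it.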

Desktop systems have a different set of issues, but they don't look like classic time-sharing systems either. My main point here is that in the last decade, the memory usage behavior for most programs has changed considerably, but we're still using virtual memory concepts that were developed in the 1960s and mature by 1980.

And remember, even when everything works right, you get the effect of at best 2X the memory.

Here's a basic tutorial on VM, with emphasis on Linux.

Below is the article copied from Byte... by Eusebo · 2001-10-30 03:35 · Score: -1, Flamebait · on Debate on Linux Virtual Memory Handling

Byte.com is pretty non-responsive from my part of the world. It's a good read if you have time...

Linux Kernel Pillow Talk

By Moshe Bar

October 29, 2001

And you thought the netherworlds of dry kernel engineering were free of politics, egos, and prima donnas? Guess again. The events of the last four to six weeks and the e-mails flying to and from the Linux kernel mailing list show how Byzantine and complex the dynamics of decision making, feature design, and implementation can be. Go to http://www.tux.org/lkml/ to subscribe to the kernel mailing list, but be careful: This is a very high-traffic list. Subscribe only if you really want to follow every single detail of the Linux kernel, or instead read the weekly digest at Linux Kernel Cousin at http://kt.zork.net/kernel-traffic.

Sure, the lively debates have always existed. In the past there have been disputes about the Linux firewalling code, networking code, scheduler, installer, driver model, and many more. One recurrent theme has always been the Virtual Memory (VM) manager. Nothing determines the peculiar behavior, the feel -- even the ultimate success or failure of an operating system -- like its virtual memory design. Sometime during the development cycle leading up to the Linux 2.4.0 kernel, in other words in 2.3.xx times, Rik van Riel (http://www.surriel.com), a Dutch kernel hacker working for Brazil-based Conectiva (one of the smaller Linux distributions), introduced radically new VM code. It was based on what seemed to be new and advanced algorithms for efficient finding, allocation, and disposal of virtual memory pages requested by programs. Rik later introduced an interesting new kernel feature called the "OOM killer." OOM stands for Out Of Memory. The OOM killer attempts to locate a killable process when memory runs out in the system. Without such a feature the whole machine can go nuts or enter a vicious cycle of swapping out a few pages, realizing immediately after that those pages are needed, and searching again for swappable page candidates, keeping the kernel busy doing only this instead of letting user processes run.
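Here is a deliberately simplified sketch of what "locate a killable process" involves. It picks the process with the largest resident set, but never init (pid 1) and never a kernel thread. The struct, field names and the single-criterion score are hypothetical. The kernel's real heuristic weighs more factors than this.

```c
#include <stddef.h>

/* Hypothetical per-process summary for this sketch. */
struct proc {
    int pid;
    unsigned long rss_pages;  /* resident memory, in pages */
    int is_kernel_thread;
};

/* Return the index of the process to kill, or -1 if none is eligible.
 * Simplified rule: largest resident set wins; init and kernel
 * threads are never candidates. */
long pick_oom_victim(const struct proc *procs, size_t n)
{
    long victim = -1;
    unsigned long best = 0;
    for (size_t i = 0; i < n; i++) {
        if (procs[i].pid == 1 || procs[i].is_kernel_thread)
            continue;
        if (victim < 0 || procs[i].rss_pages > best) {
            victim = (long)i;
            best = procs[i].rss_pages;
        }
    }
    return victim;
}
```

Excluding pid 1 explicitly guards against the failure the column describes later, where init itself was chosen.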

Rik is a gifted hacker, and among other things he has been trying to improve the efficiency and speed of maintenance of those lists in the kernel responsible for managing all the virtual memory pages in the system. One of the main questions to address in every operating system VM code is: "How do you choose which page to steal next when there is a RAM shortage?"

In the 2.4.0 release, the Linux kernel scans the process page tables and decides which page to remove. The problem with this approach is that sometimes a lot of process tables have to be scanned to free just one page, or very few pages. Also, this approach does not guarantee that the pages stolen are only those that will not be needed again very soon. Some UNIXes introduced the notion of the working set; that is, the minimum amount of memory required by a process to function efficiently. This solution is, however, limited to per-process pages only and does not consider other kinds of pages, such as filesystem caching. Stealing from these pages might in some cases even prove counter-productive. Very often in VM theory, a solution to one problem can worsen another; that's why kernel programming is difficult.

Rik van Riel and I have variously discussed another approach, called "reverse mapping," which implements a reverse-lookup between the page and process table. Once you have reverse-mapped pages, the VM can simply scan the pages for the ones to be freed. Naturally, some extra fields need to be added to the appropriate control tables to allow this reverse mapping. My own implementation has an overhead of 14 bytes and is therefore certainly a lesser solution than Rik's -- his overhead is just 8 bytes.
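The following toy model illustrates the reverse-mapping idea. Each physical page keeps a list of the page-table entries that map it, so evicting the page means walking that short list instead of scanning every process's page tables. The structures, the fixed `MAX_MAPPINGS` bound and the function names are hypothetical simplifications. The 8-byte and 14-byte overheads quoted above belong to the real implementations and are not modeled here.

```c
#include <stddef.h>

#define MAX_MAPPINGS 4  /* hypothetical cap on sharers per page */

/* Hypothetical page-table entry: present bit plus frame number. */
struct pte {
    int present;
    size_t frame;
};

/* Physical page with a reverse map back to the PTEs that point at it. */
struct page {
    struct pte *mappers[MAX_MAPPINGS];
    size_t nmappers;
};

/* Map frame `frame` through `pte`, recording the reverse link. */
int map_page(struct page *pg, struct pte *pte, size_t frame)
{
    if (pg->nmappers == MAX_MAPPINGS)
        return -1;
    pte->present = 1;
    pte->frame = frame;
    pg->mappers[pg->nmappers++] = pte;
    return 0;
}

/* Unmap the page from every process using it by visiting only the
 * PTEs on its reverse-map list. Returns the number of PTEs cleared. */
size_t unmap_all(struct page *pg)
{
    size_t cleared = 0;
    for (size_t i = 0; i < pg->nmappers; i++) {
        pg->mappers[i]->present = 0;
        cleared++;
    }
    pg->nmappers = 0;
    return cleared;
}
```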

Other extremely talented kernel hackers such as Marcelo Tosatti and Ben LaHaise have made other important contributions to the Linux VM.

However, even though all these intelligent people tried hard to make the Linux VM fast, efficient, and powerful, user reports since the 2.4.0 release indicated poor Linux kernel performance and erratic and unstable behaviors. Up to kernel 2.4.7, for instance, on machines with small memory footprints (less than 40-MB RAM), sudden swap storms could erupt which would virtually freeze the system while it inexplicably started swapping pages in and out like crazy. In some cases, the aforementioned OOM Killer would choose the wrong process to kill; I have seen the all-important init process killed erroneously. Many fringe kernel projects, like my own Mosix project or others such as Win4Lin, suffered because users accused these projects of unstable operations, assuming that a released kernel like 2.4.0 must be free of such nasty bugs. Even though the kernel had gradually evolved from 2.4.0 to 2.4.9, it was evident that the VM design was more of a liability than an advantage.

Linus himself said in a recent mail to the kernel list that he wasn't happy yet with the VM. These problems were enough for many Linux shops to resist the migration to the 2.4 kernels and instead continue using 2.2.19-era kernels. Obviously, compared to 2.4, the 2.2 series has many shortcomings -- like no zero-copy networking, the division of page cache and buffer cache in filesystem operations, big spinlocks (serializations of kernel execution paths for computers with more than one CPU) for many parts of the kernel, and so on.

A simple C program like the one below shows how kernels up to 2.4.9 had problems dealing with stress workloads on the VM system. If, after running this program, you turned the swap partition off with swapoff, your server or workstation would become totally unresponsive for up to 15 minutes.

/* based on code originally proposed by Andrew Tanenbaum, later by Derek Glidden and many others since */
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    /* Allocate 200MB (50,000,000 ints). The kernel does not actually back
       virtual pages with RAM until they are used, so we also have to touch
       them. The amount allocated should reflect the total RAM on your
       computer; this test runs well with machines of, say, 256MB. */
    size_t n = 50000000;
    int *p = calloc(n, sizeof(int));
    if (p == NULL)
        return 1;
    memset(p, 1, n * sizeof(int));  /* write every page so it is really allocated */
    /* let the system calm down a bit after allocating pages */
    sleep(12);
    /* and now release it all again */
    free(p);
    return 0;
}

Back in February 2001, I ran an informal and unscientific benchmark comparing FreeBSD 4.1.1 to kernel 2.4.0 (visit http://www.byte.com/documents/s=558/byt20010130s0010/) on exactly the same hardware and with exactly the same subsystems versions (MySQL, Sendmail, Apache, and others). The results clearly showed that, indeed, there were major problems with the efficiency and speed of the early 2.4 kernels.

A New VM

Then, on September 24, with the kernel standing at version 2.4.9, everything suddenly changed. Andrea Arcangeli, an Italian kernel hacker (read my interview with him two years ago at http://www.byte.com/documents/s=287/byt20000229s0008/) and a very prolific contributor, decided that enough was enough. He sat down and in one of those marathon hacking bouts completely rebuilt the VM from scratch. In short succession he sent to Linus Torvalds over 150 patches to the 2.4.9 kernel, to implement a new VM engine. This is an extremely remarkable feat. A VM is a major piece of software and by nature very complex. One needs to satisfy many opposed objectives: Simultaneously efficient handling for server-type loads and interactive-type loads; ease of implementation and at the same time, optimized use of every last and small feature of the CPU. The VM must also be able to run well on Intel CPUs spanning 4 or 5 generations, as well as on AMD chips, Alphas, MIPSes, Sparcs, ARMs, and what have you. Andrea, by the way, does all his development on a Compaq AlphaServer with two 500-MHz CPUs and 3-GB RAM.

Out of the blue, Linus accepted the new VM and incorporated it into the official Linux kernel tree.

Recently, I spent two days with Andrea giving speeches. During the two days, over many bottles of beer, we had plenty of time to discuss his new VM. I was mainly interested in how the new VM affects Mosix. Because Mosix must migrate virtual memory pages belonging to the program's address spaces between cluster nodes, it is important to correctly understand the VM and interface efficiently to it.

Specifically, Andrea took exception to the following problems in the 2.4 VM:

kswapd looping forever on DMA or NORMAL class-zones.
swap+ram will be almost all available address space (modulo when the swap cache serves to avoid swapin of shared anonymous memory after a fork).
swapout storms.
benchmarks, when run repeatedly, gradually slow down.

The new VM is much simpler and faster. Let me explain how it works.

The old 2.4 VM had a major design problem that manifested itself mainly when freeing physically dirty pages (remember dirty pages are the frames of 4-KB memory in the RAM whose contents have been modified by one of the virtual memory pages residing in it). The last owner of the page (usually the VM, except in swapoff) has to clear the dirty flag before freeing the page. When being swapped off in swapoff it may be a little more complicated -- we may need to grab the pagecache_lock to ensure nobody starts using the page while we clear it.

So, Andrea went and did the following: All physical pages are now divided into active and inactive pages. These two are further divided into dirty and clean for both active and inactive. When the active dirty pages become about 66 percent of the total number of pages, the VM starts to scan them for the oldest ones to be put into inactive dirty and then, later still, from there to the swap when memory becomes tight. This part is very central to the new VM and its simplicity is...well, simply stunning.
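Here is a toy model of the aging step as the column describes it. Pages sit on an active or inactive list and are flagged dirty or clean. Once active dirty pages reach roughly two thirds of all pages, the oldest of them are demoted to inactive dirty, from where they can later go to swap. The fixed array, integer timestamps and exact 2/3 threshold are assumptions made for illustration, since the column only says "about 66 percent".

```c
#include <stddef.h>

enum list_id { ACTIVE, INACTIVE };

/* Hypothetical page descriptor for this sketch. */
struct vpage {
    enum list_id list;
    int dirty;               /* modified since last written out */
    unsigned long last_used; /* smaller value = used longer ago */
};

static size_t count_active_dirty(const struct vpage *p, size_t n)
{
    size_t c = 0;
    for (size_t i = 0; i < n; i++)
        if (p[i].list == ACTIVE && p[i].dirty)
            c++;
    return c;
}

/* While active dirty pages are two thirds or more of all pages, move
 * the least recently used one to the inactive (dirty) list.
 * Returns the number of pages demoted. */
size_t balance_lists(struct vpage *p, size_t n)
{
    size_t ad = count_active_dirty(p, n);
    size_t demoted = 0;

    /* 3 * ad >= 2 * n  is  ad >= 2/3 of n, without floating point */
    while (n > 0 && 3 * ad >= 2 * n) {
        size_t oldest = n; /* n means "none found" */
        for (size_t i = 0; i < n; i++)
            if (p[i].list == ACTIVE && p[i].dirty &&
                (oldest == n || p[i].last_used < p[oldest].last_used))
                oldest = i;
        if (oldest == n)
            break;
        p[oldest].list = INACTIVE; /* stays dirty: a swap-out candidate */
        ad--;
        demoted++;
    }
    return demoted;
}
```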

This elegant mechanism totally changes the behavior of the 2.4.10 kernel under heavy load and also makes for much better predictability of the system. Another very important change is that the swap is now additional to the RAM, just like in 2.2 times. All earlier 2.4 kernels (since 2.3.12) needed at least the same amount of RAM in swap and then more to give you additional virtual memory. This meant that on an 8-GB server, you needed to put aside almost a full 9-GB disk just to be able to swap, similar to some versions of Solaris or other UNIXes.
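The following is a rough arithmetic model of the swap-sizing change. It is a simplification, since the old behavior also depended on what the swap cache was holding. Before 2.4.10, swap effectively had to mirror RAM first, so usable memory was about the larger of RAM and swap. From 2.4.10 on, swap is additive, as in 2.2. The function names are hypothetical.

```c
/* Approximate usable virtual memory in MB under the two models the
 * column describes (a simplification, not kernel code). */
unsigned long vm_total_pre_2_4_10(unsigned long ram_mb, unsigned long swap_mb)
{
    /* swap must first cover RAM; only the excess adds capacity */
    return swap_mb > ram_mb ? swap_mb : ram_mb;
}

unsigned long vm_total_2_4_10(unsigned long ram_mb, unsigned long swap_mb)
{
    /* swap is additional to RAM, as in 2.2 */
    return ram_mb + swap_mb;
}
```

For the 8-GB server with a 9-GB swap disk, this gives about 9 GB (9216 MB) under the old model versus about 17 GB (17408 MB) under the new one.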

Finally, the page scanner doesn't page scan if there are theoretically no freeable pages, whereas before it did. Oh, and the OOM killer never really worked, so Andrea disabled it, as I did for all my kernels. In 2.4.12 it is enabled again; this time, however, it works much better. Try it with the above program to see it in action.

Arcangeli's VM is stable, acts predictably -- something that the old VM never really achieved -- and it makes the swap space look like it did in 2.2 days. Additionally, the design is much simpler and easier to understand. People will catch up fast with it.

However, many kernel hackers disagree. Upon the release of kernel 2.4.10, a virulent and sometimes aggressive debate flamed up, with many people trying to show why one of the two was a good VM and the other not. Some comments got a bit out of control, and only in the last two weeks or so has some calm been restored.

However, one nasty side effect stays. Alan Cox, the number two man after Linus Torvalds, does not yet like the new VM and in his own kernel tree (called the "ac tree") he still continues to use and patch the old VM. As a consequence, users and system administrators now find themselves facing two very different kernel trees to choose from: the official Linux tree and the Alan Cox tree. Quite often, latest patches to drivers and new features are only in Alan Cox's tree. Those who want to go with the official Linux source code may find themselves unable to apply the patches due to the different VM code all over. It is acceptable for the two trees to be different for a few days on such important subsystems like VM, but it is not acceptable to have them different for months and across many kernel versions.

Nobody has yet dared to speak of a Linux source fork, but this is dangerously close to one.

It became obvious that the VM up to 2.4.10 was a design liability. You can try to fix something that was designed badly, but it will never become a beauty. I think Linus' decision to scrap the old VM and go with the Arcangeli VM was courageous and right. Having a functioning and stable Linux box should not be deferred to 2.5 when we can do it already with 2.4.

Kernel Preemption

But apart from the VM issues, there are other lively debates in the kernel community. There was an interesting interview at http://kerneltrap.com/article.php?sid=328&mode=thread&order=0 with Robert Love, who is leading one of two projects trying to make the Linux kernel fully preemptible. Making the kernel preemptible means making it possible to interrupt whatever the kernel is doing (say, executing a system call) to process some other outstanding task and then return to its original task. Linux, as a multiprocessing OS, obviously always did that for user-land processes. However, many, just like Robert Love, feel that the fact that Linux up to now would not let itself be interrupted contributed to poor latency. Latency describes how quickly you can expect a response from your kernel when you actually need something from it. Note that Linux is not designed as a real-time OS (though there is at least one Linux real-time implementation somewhere), and therefore does not explicitly guarantee latency. User-land programs must be aware of this as, especially with kernel preemption, latencies can be very unpredictable.

Theoretically, an OS will answer faster if it can be interrupted. What does suffer from kernel preemption is the global throughput. If you have a task that needs n seconds within the kernel to complete (let's say executing a given system call takes 0.005 seconds), then all the interruptions add some overhead to switch from one kernel task to another. So, finishing the execution of that system call (in our example) will finally require n + op seconds, where p is the frequency of switching and o the static overhead for one switching operation. Notice that kernel context switching does not invalidate the CPU cache, and is therefore not as expensive as process switching. However, kernel preemption will surely lead to a higher rate of switching from kernel space to user space, because upon preemption the scheduler might decide to give higher priority to a user process.
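The cost model can be worked through with illustrative numbers. The overhead o = 10 µs and the three preemptions are assumptions, not measurements. If p is read as the number of preemptions that hit the call, the total is n + op. If p is a rate in switches per second, the count over the call is pn, and the same overhead term becomes onp.

```latex
T = n + o\,p,\qquad n = 0.005\ \mathrm{s},\ o = 10\ \mu\mathrm{s},\ p = 3
\;\Rightarrow\; T = 0.005 + 3 \times 0.00001 = 0.00503\ \mathrm{s}

\text{(rate reading: } T = n + o\,(p\,n) = n\,(1 + o\,p)\text{)}
```

Either way, the added time grows with both the per-switch cost o and how often the call is preempted.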

In other words, kernel preemption does decrease latency but slows down overall throughput. It's the math: nothing can be done about it.

Furthermore, in his interview, Robert Love heavily criticized Linus Torvalds for adopting Andrea Arcangeli's new VM in 2.4.10 and dropping the old van Riel VM.

Well, I did try the patch with kernel 2.4.12 and with pre13. While accurate measurement (which Robert Love provides with the preemption kernel patches) does indeed report an improvement in latency, for the life of me I have not noticed it on an empirical basis.

I really do appreciate Love's work, but I do not fully agree with some of his comments in the interview. First, as Linus himself said, if latency sucks in the kernel then we should check why it sucks, with or without preemptive scheduling. If the latency is bad in the stock kernel, then it should be fixed anyway.

The preemptive kernel 2.4.12 worked fine on my laptops and on my SGI 550 workstation where I do interactive work. The MP3 player very rarely skipped beats when doing heavy background work such as kernel compiling or opening large files in the editor. But for my servers and clusters, the decrease in performance and the unpredictability of latency is a problem. Also, some important patches will not apply to a Love-patched kernel. Mosix, the clustering kernel extension, does not patch correctly, and neither do some versions of the LIDS intrusion detection system.

It is up to each individual user to decide whether or not to use the patch, but it is important to understand the implications of using it.

Linux and FreeBSD Revisited

Upon returning home the other week after meeting with Andrea, I went to my lab and searched for the disk images of the server comparison I ran back in January of this year (of FreeBSD 4.1.1 versus Linux 2.4.0). I took the Compaq ML500 server I have been reviewing (2x 1-GHz CPUs, 2-GB RAM) and upgraded both the FreeBSD disk image to 4.4-Stable and the Linux version to 2.4.12. Then, I changed the memory down to 192-MB RAM so as to stress the VM system more. I also upgraded to the latest stable versions of Sendmail (8.12.1) and MySQL (version 3.23.42). Finally, I compiled everything with the latest version of gcc, 3.0.2, and tuned the two instances to the best of my knowledge (softupdates and increased maxusers for FreeBSD, and untouched default values for Linux).

The results were very interesting indeed. Since this benchmark is too much to be handled in this article, Byte.com will post it here soon for you to read.

The story of this article is that the 2.4 kernel has finally grown up with the 2.4.10 release. Not many users outside the relatively small kernel community realize that. Now you know about it, too. Spread the good news and immediately install 2.4.12 on your busy server. The server will thank you for it.

Moshe Bar is a systems administrator and OS researcher who started learning UNIX on a PDP-11 with AT&T UNIX Release 6, back in 1981. Moshe has an M.Sc. and a Ph.D. in computer science and writes UNIX-related books.

For more of Moshe's columns, visit the Serving With Linux Index Page.

The current state, you should know by acumen · 2000-10-06 05:58 · Score: 5 · on 2.4 Kernel Delayed, Says Linus

I have been following the development of the 2.4 kernel since test5, which is about 3 months ago.

For starters, a bunch of drivers that worked in 2.2.x are broken currently in 2.4. Those need a fix before 2.4 turns final.

Recently there was a lot of work on the VM (virtual memory subsystem). It's a very smooth VM, reminds you of FreeBSD ;). But it's also a bit buggy at the moment, so it must be fixed before 2.4-final.

With more people testing the 2.4 kernel, with more bug reports, it will be a lot better for the developers to fix 2.4 to perfection, so hurry up and try the new kernel. I recommend trying out test8 or test7, or test9 with Rik van Riel's latest VM patch.



Domain: surriel.com

Comments · 22