Torvalds Has Harsh Words For FreeBSD Devs
An anonymous reader writes "In a relatively technical discussion about the merits of Copy On Write (COW) versus a very new Linux kernel system call named vmsplice(), Linux creator Linus Torvalds had some harsh words for Mach and FreeBSD developers that utilize COW: 'I claim that Mach people (and apparently FreeBSD) are incompetent idiots. Playing games with VM is bad. memory copies are _also_ bad, but quite frankly, memory copies often have _less_ downside than VM games, and bigger caches will only continue to drive that point home.' The discussion goes on to explain how the new vmsplice() avoids this extra overhead."
Do I have that right?
If so, I'm not really seeing his issue, or at least not as hard-line as he sees it. Memory copy performance is a tricky question, especially since CPU cycles are not the be-all and end-all of performance. Does the exception (the page fault) really cost that much more than he believes, or is it often eclipsed by the cost of the extra memory reads/writes and CPU stalls that a copy normally generates? Is it really feasible to expect program developers to do manual memory management in this day and age, when programs easily weigh in at hundreds of megs?
I'm just not sure that Torvalds is really looking at all sides of this. He may be right, but I'd like to hear more discussion between the *BSD guys and Torvalds before we put this matter to rest. And preferably without the insults this time.
Links:
Copy on Write as explained by Wikipedia
FreeBSD page on Zero Copy Patches
Duke Uni Research
Methinks we need to start tagging "tantrum" to this type of thing.
kernels, methinks it's just sour grapes because Linus can't compete in that area.
I think Linus has gotten to the point where he just really enjoys trolling. Like, this was OBVIOUSLY uncalled-for, and he's usually such a laid-back guy. Maybe he's read too much Slashdot. I don't know.
+++ATH0
.. this will help you keep yourself calm.
It's been a while since we had a huge linux vs BSD flame feast.
I'll start.
BSD user: Linux is a confusing mess of programs and is less stable than BSD.
Linux user: You're still here? I thought you were dead by now?
And in other news...
Grass is green;
Oil is overpriced;
Absolute power corrupts absolutely.
No, he is simply getting less tolerant of "sloppy" programming. He is one of the very, very few who believe in doing it the way that gives you the best speed. If something takes 4+ operations when it could be done in 2 with fewer problems, the performance gains add up. Just because your typical machine has four dual-core 8 GHz processors and 22 terabytes of RAM does not mean you can slack off and write the whole thing without paying attention to performance.
The BSD guys have their reasoning, and if you read more about this, it is not a shot in the dark that Linus is taking. He is frustrated that, after many discussions, nobody cares as much as he does about the performance issues.
Go back and read what Linus did back in the early days; he's no different today than he was in 1990. He will call a duck a duck.
``And in what universe is anyone who can intelligently speak about (much less code around) memory and VM management [be called] an "incompetent idiot"?''
The Universe In Which Spock Has A Beard?
No he is simply getting less tolerant of "sloppy" programming.
You'll forgive me for taking that with a grain of salt so long as memory over-commit remains the default mode of operation within Linux.
The complaint is not about general copy-on-write; it's about BSD's ZERO_COPY_SOCKETS feature vs. vmsplice().
Basic explanation: Suppose that a program is doing a lot of output to a file or socket. The program can generate data faster than the kernel can consume it, say. So what should the kernel do with the buffer it receives from the user on each write()?
There are three options.
1) Copy its content immediately elsewhere, so that on return to User Mode, the buffer remains writable and writes are safe.
2) Change the access rights of the page containing the buffer, so that no copy need be made unless User Mode attempts to modify its content before the kernel has completed the write(). If the user attempts to write, it either gets permission to do so (because the kernel is done) or it gets a writable copy.
3) Let User Mode promise to not modify the buffer's content until told that it's safe to do so, leaving it writable in the meantime.
The default behavior is (1); BSD's zero copy socket feature is (2), and the point of Torvalds' complaint; vmsplice() is (3).
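To make the trade-off between the three options concrete, here is a toy user-space model. It is not kernel code: the names (`pending_write`, `submit`, `app_store`, `kernel_done`, `release`) are hypothetical, and a "page fault" is just a counter plus a heap copy. It shows what the kernel ends up sending when the application scribbles on its buffer before the write has completed.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy model of the three options; no real kernel API. */
enum policy { POLICY_COPY, POLICY_COW, POLICY_LOAN };

struct pending_write {
    char *user_buf;         /* what the application writes through       */
    const char *kernel_buf; /* what the "kernel" will put on the wire    */
    char *copy;             /* heap copy made by this model, if any      */
    size_t len;
    bool write_protected;   /* option 2: page read-only while busy       */
    int faults;             /* simulated page faults taken               */
};

static char *dup_bytes(const char *src, size_t len) {
    char *p = malloc(len);
    if (!p) abort();
    memcpy(p, src, len);
    return p;
}

static void submit(struct pending_write *w, enum policy p, char *buf, size_t len) {
    *w = (struct pending_write){ .user_buf = buf, .kernel_buf = buf, .len = len };
    if (p == POLICY_COPY) {          /* 1) copy now; buffer stays writable   */
        w->copy = dup_bytes(buf, len);
        w->kernel_buf = w->copy;
    } else if (p == POLICY_COW) {    /* 2) share the page, write-protect it  */
        w->write_protected = true;
    }                                /* 3) share the page, trust the app     */
}

/* The application stores one byte before the kernel is done. */
static void app_store(struct pending_write *w, size_t off, char c) {
    if (w->write_protected) {        /* simulated fault: app gets own copy   */
        w->faults++;
        w->copy = dup_bytes(w->user_buf, w->len);
        w->user_buf = w->copy;
        w->write_protected = false;
    }
    w->user_buf[off] = c;
}

/* Kernel finished first: a COW page becomes writable again, no copy. */
static void kernel_done(struct pending_write *w) {
    w->write_protected = false;
}

static void release(struct pending_write *w) {
    free(w->copy);
    w->copy = NULL;
}
```

With `"hello"` and an early store of `'J'`, option 1 and option 2 both still send `"hello"` (option 2 at the price of one simulated fault), while option 3 sends `"Jello"`: only the application that broke its promise gets corrupted output.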
Andy went out and said that he thought the Linux approach was wrong, and archaic, and that people should go and wait for GNU.
Linus said that he felt this was wrong, and that being a prof is no excuse for Minix being the mess it was (and Minix was a mess in the late 1980s/early 1990s). He also apologized in case he had come across as too harsh in writing about how people should be able to throw away an old design in favour of a new one anyway, etc.
It was very polite compared to some of the non-Andy/Linus replies.
There seem to be a LOT of misconceptions about the discussion of vmsplice() vs COW vs copy. This has nothing to do with conserving memory and everything to do with high-performance I/O. If your app just needs to send a couple of small files from A to B, you probably don't care about this at all.
A little background is needed on the terminology and mechanisms of I/O for any of this to make sense. As an example, let's say your app is a very busy web server sending out dynamic (but trivial to compute) pages.
The oldest and simplest method is copy. The app calls write(int sock, const void *buffer, size_t length) on a socket. The kernel copies the contents of buffer from userspace memory into a kernel-space buffer and at least queues the data to the TCP stack before returning.
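A minimal sketch of the copy path, assuming any file descriptor the kernel copies from (a socket, file, or pipe). Because write() has already copied the bytes when it returns, the app can refill the same buffer on the very next iteration. `write_all` and `send_records` are illustrative names, not part of any API.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Write all len bytes, retrying on short writes and EINTR. */
static int write_all(int fd, const void *buf, size_t len) {
    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Reuses one buffer for every record: safe with copy semantics,
   since the kernel owns its own copy once write() returns. */
static int send_records(int fd, int count) {
    char buf[32];
    for (int i = 0; i < count; i++) {
        int len = snprintf(buf, sizeof buf, "record %d\n", i);
        if (len < 0 || write_all(fd, buf, (size_t)len) < 0)
            return -1;
        /* buf is overwritten on the next iteration right away */
    }
    return 0;
}
```

The cost is one memcpy per write() inside the kernel; the benefit is that the app never has to think about when the data actually leaves.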
COW is an attempt to avoid the cost of copying the outgoing data. In that case, the kernel bumps up the reference count on the physical pages that make up buffer (since now kernel and application are both interested in them) and marks the pages as COW. That is, the virtual memory addresses are set as read-only and a flag bit is set (more or less). That way the kernel needn't worry about them again unless the app writes to them. By the time the write call returns, the app is able to write to that memory immediately (sorta) without worry.
When that write happens, the app takes a page fault (writing to a read-only page). The kernel sees that the pages are COW, copies the data to a new physical page, and maps the page in read/write. Then it returns from the fault. OTOH, if the kernel finished with the page first (the data goes on to the wire), it re-marks the page(s) so the app can access them without a copy.
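The fault-and-copy step above is the same machinery Linux uses for fork(): parent and child share private pages copy-on-write, and the first write by either side faults and gets a private copy. The sketch below demonstrates that general VM behaviour rather than the BSD socket feature itself; `cow_keeps_parent_copy` is a hypothetical helper name.

```c
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

/* Private writable memory: after fork() both processes share its
   physical page copy-on-write until one of them writes to it. */
static char shared_page[4096] = "original";

/* Returns 1 if the child's write stayed private to the child,
   0 if the parent saw it, -1 on error. */
static int cow_keeps_parent_copy(void) {
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* The write faults on the shared page; the kernel gives the
           child its own copy of the page, then the store completes. */
        strcpy(shared_page, "modified");
        _exit(strcmp(shared_page, "modified") == 0 ? 0 : 1);
    }
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;
    return strcmp(shared_page, "original") == 0;
}
```

The application never sees the fault; it only sees that its write "just worked". The cost is paid in the kernel, which is exactly what the next paragraph is about.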
The hope is that often enough, the app WON'T try to write to the pages while they're busy, and so the cost of that copy is saved. If that hope comes through often enough, it MIGHT be vaguely useful. I say MIGHT since there is a significant cost just for marking the pages (the CPU's TLB must be flushed for the change to take effect). If the faults happen, it's a BIG loss, since handling a fault takes thousands of CPU cycles.
So, for it to have any chance to help, the application programmer must already know enough to TRY to avoid writing to the same buffer again until it gets to the wire. Unfortunately, the programmer can never be sure, so most apps don't bother.
The vmsplice() proposal is fairly simple. In this case, the app explicitly requests special treatment of the write. The pages are NOT marked as read-only at all. Instead, the app is on its honor to leave them alone until the kernel notifies it that they are again available. This saves the copy, the cost of the TLB flush, AND the (potential) cost of page faults. If the app breaks its promise, it is the only one to suffer, as the data it sent is corrupted (no kernel housekeeping is ever stored in such pages, so there are no security implications). Any damage the app might do by sending screwy data could also be done using the old copy method.
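A minimal Linux-only sketch of that path, assuming glibc's `vmsplice()` and `splice()` wrappers (declared in `<fcntl.h>` under `_GNU_SOURCE`). vmsplice() attaches the user pages to a pipe; splice() then moves them on to the destination (a socket in the web-server case). The helper names are hypothetical. The sketch does not show how the app learns that the pages are free again; it simply leaves `buf` untouched until the data has been forwarded.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/uio.h>
#include <unistd.h>

/* Attach buf to the pipe's write end without copying it into the
   kernel. The caller must not modify buf until the data is consumed. */
static ssize_t queue_buffer(int pipe_wr, const char *buf, size_t len) {
    struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };
    ssize_t total = 0;
    while (iov.iov_len > 0) {
        ssize_t n = vmsplice(pipe_wr, &iov, 1, 0);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        iov.iov_base = (char *)iov.iov_base + n;
        iov.iov_len -= (size_t)n;
        total += n;
    }
    return total;
}

/* Move up to len bytes from the pipe's read end to out_fd
   (a socket, a file, or another pipe). */
static ssize_t forward_from_pipe(int pipe_rd, int out_fd, size_t len) {
    ssize_t total = 0;
    while ((size_t)total < len) {
        ssize_t n = splice(pipe_rd, NULL, out_fd, NULL,
                           len - (size_t)total, SPLICE_F_MOVE);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0)
            break;              /* write end closed, nothing more */
        total += n;
    }
    return total;
}
```

A buffer larger than the pipe's capacity will make vmsplice() block until someone drains the pipe, so a real sender would interleave the two calls (or run them in separate threads).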
What it all comes down to is that playing tricks with page mapping LOOKS nice at first glance, since it SEEMS reasonable that not copying bytes around will save CPU cycles and memory bandwidth. The re-mapping (or just permission changes) on pages SEEMS lightweight. Unfortunately, in fact, re-mapping or changing permissions forces cache invalidations, and page faults are just plain expensive. With the direction CPU design is going, these things will likely get more expensive rather than less (as they have for most of the history of microprocessor design).
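For anyone who wants to check the "faults are expensive" claim on their own hardware, here is a rough micro-benchmark sketch. It compares the average cost of a first-touch fault on a fresh anonymous mapping with the cost of memcpy-ing one already-resident page. A first-touch fault is not identical to a COW fault, but both trap into the kernel's page-fault handler. Results vary widely by CPU, kernel, and configuration, so no expected numbers are given; the function names are hypothetical.

```c
#define _GNU_SOURCE
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <time.h>
#include <unistd.h>

static double now_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1e9 + ts.tv_nsec;
}

/* Average cost of the first write to each page of a fresh mapping;
   every such write takes a (minor) page fault. */
static double ns_per_fault(size_t pages) {
    size_t pg = (size_t)sysconf(_SC_PAGESIZE);
    char *p = mmap(NULL, pages * pg, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1.0;
    double t0 = now_ns();
    for (size_t i = 0; i < pages; i++)
        ((volatile char *)p)[i * pg] = 1;   /* first touch -> fault */
    double t1 = now_ns();
    munmap(p, pages * pg);
    return (t1 - t0) / (double)pages;
}

/* Average cost of copying one already-resident page with memcpy. */
static double ns_per_page_copy(size_t pages) {
    size_t pg = (size_t)sysconf(_SC_PAGESIZE), n = pages * pg;
    char *src = malloc(n), *dst = malloc(n);
    if (!src || !dst) {
        free(src);
        free(dst);
        return -1.0;
    }
    memset(src, 1, n);
    for (size_t i = 0; i < n; i += pg)      /* pre-fault dst so only */
        ((volatile char *)dst)[i] = 0;      /* the copy is timed     */
    double t0 = now_ns();
    memcpy(dst, src, n);
    double t1 = now_ns();
    volatile char sink = ((volatile char *)dst)[n - 1]; /* keep copy live */
    (void)sink;
    free(src);
    free(dst);
    return (t1 - t0) / (double)pages;
}
```

Calling both with a few hundred pages and comparing the two averages gives a rough feel for how many page copies one fault is "worth" on a given machine.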
vmsplice() is really not that complex for an application to use, at least in comparison to the complexity and level of knowledge required to write an app that performs well enough to need it in the first place.