Torvalds Has Harsh Words For FreeBSD Devs
An anonymous reader writes "In a relatively technical discussion about the merits of Copy On Write (COW) versus a very new Linux kernel system call named vmsplice(), Linux creator Linus Torvalds had some harsh words for Mach and FreeBSD developers that utilize COW: 'I claim that Mach people (and apparently FreeBSD) are incompetent idiots. Playing games with VM is bad. memory copies are _also_ bad, but quite frankly, memory copies often have _less_ downside than VM games, and bigger caches will only continue to drive that point home.' The discussion goes on to explain how the new vmsplice() avoids this extra overhead."
Do I have that right?
If so, I'm not really seeing his issue. Or at least not as hard-line as he sees it. The issue of memory copy performance is a tricky one, especially since CPU cycles are not the be-all and end-all of performance. Does the exception generated really cost that much more than he believes, or is it often eclipsed by the cost of the extra memory read/writes and CPU waits that are normally generated by a copy? Is it really feasible to expect program developers to do manual memory management in a day and age when programs easily weigh in at hundreds of megs?
I'm just not sure that Torvalds is really looking at all sides of this. He may be right, but I'd like to hear more discussion between the *BSD guys and Torvalds before we put this matter to rest. And preferably without the insults this time.
Links:
Copy on Write as explained by Wikipedia
FreeBSD page on Zero Copy Patches
Duke Uni Research
Playing games with VM is bad.
:(
I know, I hate it when I have to listen to 26 hang up messages in my inbox only to find out someone is playing games with me.
He who knows best knows how little he knows. - Thomas Jefferson
Methinks we need to start tagging "tantrum" to this type of thing.
kernels, me thinks it's just sour grapes because Linus can't compete in that area.
Is that my COW?
it goes "incompetent idiots."
It is a Torvalds.
That is not my COW.
As a Slashdot user there is no way in hell you have 26 messages on your phone machine. Maybe 3 messages, but they're probably your mom calling to ask you when you're coming out of the basement, your friend inviting you to stand in line for tickets to the latest Sci-Fi flick and the Pizza guy confirming your order of 2 large and a 2 liter of Mt. Dew on a Friday night.
It will be interesting to see what weapon the BSD crowd will retaliate with.
I would just prefer that their response is to release a stable system using their method.
Help me take back Slashdot. When did 'News for Nerds' become 'FUD and Conspiracy Theories for Extremist Nutjobs'?
I think Linus has gotten to the point where he just really enjoys trolling. Like, this was OBVIOUSLY uncalled-for, and he's usually such a laid-back guy. Maybe he's read too much Slashdot. I don't know.
+++ATH0
This is so tech I don't even understand what they are talking about yet I am very "Intellectually Curious".
I like-a do-the cha-cha.
With a comment like that, I can only imagine the kind of temper tantrum that Theo de Raadt will throw. I mean honestly, whatever happened to common courtesy? There's no need for comments like "incompetent idiots". I see so many people push for the advancement of OSS, only to find that it was in vain thanks to school-yard hissy fits like this.
The basic idea is to fake some memory to memory copying operations by using the virtual memory hardware. More specifically, the idea is that when you do a big "write", the space just written becomes read-only to the writing process, rather than being actually copied. When the write is complete, read-only mode is turned off. This eliminates one copy.
The trouble with this is that when you manipulate the page table to do that, you have to do some cache invalidation. That usually results in cache misses, which outweigh the cost of the copy. So this usually is a loss. Linus points out that it looks good on benchmarks, because benchmarks typically aren't using data for anything and thus don't experience the cache misses.
Actually, copying is a relatively cheap operation in modern CPUs unless the copy is huge, since most of the work is done in the caches. The mania for "zero copy" complicates systems considerably, makes them less reliable, and, in the end, usually doesn't speed up real work by much.
Some of this mania comes from Microsoft FUD. At one time, Microsoft was claiming that an "enterprise OS" must be able to serve web pages from inside the kernel. This led to more Linux interest in "zero copy" approaches to be "competitive".
.. this will help you keep yourself calm.
In the spirit of open source community development, he can't make statements like this and expect to be a role model for the open source community.
RMS, ever heard of him?
For the perfect anti-Unix, write an OS that thinks it knows what you're doing better than you do and let it be wrong.
It's been a while since we had a huge linux vs BSD flame feast.
I'll start.
BSD user: Linux is a confusing mess of programs and is less stable than BSD.
Linux user: You're still here? I thought you were dead by now?
If you wanna get rich, you know that payback is a bitch
Are the *BSD people nicer? Or at least more tactful?
No. That's why there is more than one BSD. Issues come up, and booom crash goes the fork. Pity.
It will be interesting to see what weapon the BSD crowd will retaliate with.
My guesses are they will respond something like this:
FreeBSD: FreeBSD users will continue their campaign of random acts of elitist snobbery against Linux users.
OpenBSD: Theo will threaten to stop work on OpenSSH unless Linus gives him $10,000 for every nasty email he sends a *BSD developer.
NetBSD: Will stop developing the NetBSD port for Linus' microwave.
0 1 - just my two bits
And in other news...
Grass is green;
Oil is overpriced;
Absolute power corrupts absolutely.
Here we go again, imposing "role model" status. Linus is just a guy. Sometimes he gets his buttons pushed, sometimes he's doing the pushing. BFD. Maybe you'd be a little pissy too if Slashdot posted a story every time you did or said something. Linus Prefers Gas-X, Says Bean-o Is For Douchebags. Who cares? (BTW, Linus didn't really say that, I made it up. Don't wanna get the Bean-o people on his case too.)
As far as this whole VM thing goes, time and testing will tell the true story. Meanwhile, maybe we could try NOT deifying Linus (any more)?
I saw it on Slashdot, it must be true!
Our top story tonight, uber geek Linus Torvalds unleashed a scathing indictment of some other geeks, claiming they are skating on thin ice by using Virtual Memory calls to improve performance. The words sparked outrage in the dark rooms of college geek programmers from Berkeley to Berlin. The angry geek mobs said they're going to launch a flame war from their computers "to teach Linus a lesson."
In the words of George Takei "Hoooooooly geeeez!" This is news??
Dictator? Are the FreeBSD developers somehow unable to keep their implementation now that Linus deems it stupid?
You might feel he's being a bit of an arsehole, but that doesn't mean he's a dictator. He's not stopping anybody from doing anything, he's merely sharing his opinion of a development technique on a mailing list dedicated to discussing the development of his kernel.
Bogtha Bogtha Bogtha
"I claim that Mach people (and apparently FreeBSD) are incompetent idiots."
Linus, who's becoming more outspoken as he ages, needs to find that line between anonymous forum geek and software spokesperson...and then not cross it. Calling anyone an incompetent idiot is both non-constructive if you're hoping to improve a situation, and just plain unfriendly in an area where cooperation amongst developers is so crucial (open source).
No he is simply getting less tolerant of "sloppy" programming. He is one of the very few who believe in doing it the way that gives you the best speed. Doing something in 2 operations instead of 4+, with fewer problems along the way, means performance gains that add up. Just because your typical machine has 4 dual core 8GHz processors and 22 terabytes of RAM does not mean you can slack off and write the whole thing without paying attention to performance.
The BSD guys have their reasoning, and if you read more about this, Linus is not taking a shot in the dark; he is frustrated that after many discussions nobody cares as much as he does about the performance issues.
Go back and read what Linus did back in the early days, it's no different today than what it was in 1990, he will call a duck a duck.
Do not look at laser with remaining good eye.
Linus Torvalds: "Don't have a COW, man!"
No, in Longhorn, it's called COW-tipping(tm).
``And in what universe is anyone who can intelligently speak about (much less code around) memory and VM management [be called] an "incompetent idiot"?''
The Universe In Which Spock Has A Beard?
-- Terry
Linus is a gifted engineer, let him be rude. Aside from Linus being rude, there is no actual story here.
I used to own a restaurant and also an office supplies shop. It was quite interesting and made me some money, but I hated the fact that the most important factor in my life was pleasing (customers) or fighting (suppliers) other people. I had to constantly think what to say and how to behave.
I am no longer a business owner, and now I work with a rather gifted bunch of engineers, and frankly it gives me great pleasure to know that neither I nor the people I work with really care about being polite, clean shaven, well spoken or good looking. I can be rude if I want to, they can be rude if they want to, and we all get along very well.
Off the top of my head:
Linux to BSD: "Don't have a COW, Man!"
Linus UnMOOved by COW.
Penguin: Demon COW Dog.
Linus, you may be right and you may be very smart, but you should try a little tact. Here's a good definition for it that I learned from a drill sergeant: "Tact is the ability to tell someone to go to hell and look forward to the trip."
Being nice and respectful doesn't mean you can't tell it like it is.
Do what is right and let the consequence follow
In practice I think the FreeBSD approach probably does have speed advantages in most cases, and the fact that it's transparent to the userspace developer would seemingly be a big advantage.
No, it has a speed advantage over read()/write() provided you are aware of exactly how it works. The fact that it's transparent to the userspace is a bad thing because it means you have code written a certain way- that nobody will ever understand why.
Reusing the pages causes the speed benefit to go away- and in fact it'll be slower than read()/write().
This sort of thing matters almost exclusively to people doing really deep performance tuning, and for them it's better to present a simple API with large rewards for tuning, instead of transparently doing something weird to an existing API that will break in the field without you noticing and requires really weird usage to get the best performance.
I agree completely. Unfortunately, the FreeBSD API is inadequate. It's not faster in practice unless you do something really really weird (waste memory). The big difference is the Linux implementation gives explicit notification and the FreeBSD API doesn't.
FreeBSD doesn't provide an API to ask if the pages are still in use. That'd probably make their approach usable, but at that point, why bother updating the page tables at all?
Once you're there, why bother statpage() to check to see if the page is in use? Why not have the kernel send the pages that are available via a file descriptor so you can poll() or select() on it?
At this point, you're at the Linux implementation.
That's it. That's why it's better.
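To make the "notify me through a file descriptor" idea above concrete, here is a minimal sketch. It is not how vmsplice() reports anything; the pipe, the thread playing the kernel, and the name send_with_notification are all assumptions made up for illustration. It only shows the shape of the scheme: the app hands over buffers, waits with select() on a descriptor, and reuses a buffer only after being told it is free.

```python
import os
import select
import threading


def send_with_notification(buffers):
    """Hand buffers to a pretend kernel and reuse each one only after notification.

    `buffers` is a list of bytearrays (at most 256, since each notification
    is a single byte holding the buffer index). Everything here is a toy:
    the "kernel" is just a thread and the notification channel is a pipe.
    """
    notify_r, notify_w = os.pipe()
    sent = []

    def toy_kernel():
        for index, buf in enumerate(buffers):
            sent.append(bytes(buf))             # "transmit" straight from the app's buffer
            os.write(notify_w, bytes([index]))  # announce that this buffer is free again

    worker = threading.Thread(target=toy_kernel)
    worker.start()

    released = []
    try:
        while len(released) < len(buffers):
            # Block until the notification descriptor is readable.
            readable, _, _ = select.select([notify_r], [], [])
            if readable:
                for index in os.read(notify_r, 64):
                    released.append(index)
                    # Only now is it safe to scribble on the buffer.
                    buffers[index][:] = b"\0" * len(buffers[index])
        worker.join()
    finally:
        os.close(notify_r)
        os.close(notify_w)
    return sent, released
```

Calling it with two small bytearrays returns the data exactly as it was handed over, plus the order in which the buffers were released; no page table was touched along the way.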
No he is simply getting less tolerant of "sloppy" programming.
You'll forgive me for taking that with a grain of salt so long as memory over-commit remains the default mode of operation within Linux.
Rod Taylor
Linus has frequently called people idiots, and ignored patches, and done stuff his own way for a very long time now. He's quite successful at it. Perhaps what most people need to realize is that he is that good, that he can. The average read-Slashdot-during-work-while-coding Slashdotter is not in his league, so decrying his ad hominem attacks, or "I would do X instead" arguments just don't hold much water.
I want to delete my account but Slashdot doesn't allow it.
You'll scream! I'll vmsplice ya, it's gonna hurt.
For example, aio_write() writes to the file descriptor, allows you to poll for success a la select, and tells you not to modify the buffer before it's done (but doesn't try to stop you with copy-on-write).
This sounds exactly like what Linus wants.
Linus has long been called a "Benevolent Dictator for Life". I guess this supports the idea that, with all dictatorships, you get more than what you bargained for.
No data, no cry
The complaint is not about general copy-on-write, it's about BSD's ZERO_COPY_SOCKET feature vs. vmsplice().
Basic explanation: Suppose that a program is doing a lot of output to a file or socket. The program can generate data faster than the kernel can consume it, say. So what should the kernel do with the buffer it receives from the user on each write()?
There are three options.
1) Copy its content immediately elsewhere, so that on return to User Mode, the buffer remains writable and writes are safe.
2) Change the access rights of the page containing the buffer, so that no copy need be made unless User Mode attempts to modify its content before the kernel has completed the write(). If the user attempts to write, it either gets permission to do so (because the kernel is done) or it gets a writable copy.
3) Let User Mode promise to not modify the buffer's content until told that it's safe to do so, leaving it writable in the meantime.
The default behavior is (1); BSD's zero copy socket feature is (2), and the point of Torvalds' complaint; vmsplice() is (3).
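The three options can be compared with a toy model. This is only a sketch: there is no real kernel or page table here, and the function name simulate and its strategy labels are made up for illustration. It tracks what ends up "on the wire" and how many copies and faults each option costs when the app overwrites its buffer before the kernel is done.

```python
def simulate(strategy, original, overwrite=None):
    """Toy model of the three options; returns (bytes sent, copies made, faults taken).

    strategy  -- "copy" (option 1), "cow" (option 2) or "promise" (option 3)
    original  -- bytes the app passes to write()
    overwrite -- bytes the app stores into its buffer before the kernel
                 has finished, or None if it leaves the buffer alone
    """
    user_buffer = bytearray(original)
    copies = 0
    faults = 0

    # The app calls write().
    if strategy == "copy":
        kernel_view = bytes(user_buffer)  # copied immediately, buffer stays writable
        copies += 1
        write_protected = False
    elif strategy == "cow":
        kernel_view = user_buffer         # shared, and marked read-only
        write_protected = True
    elif strategy == "promise":
        kernel_view = user_buffer         # shared, and left writable
        write_protected = False
    else:
        raise ValueError("unknown strategy: %r" % (strategy,))

    # The app touches the buffer while the write is still in flight.
    if overwrite is not None:
        if write_protected:
            faults += 1                   # write to a read-only page traps
            copies += 1                   # the app gets a private writable copy
            user_buffer = bytearray(user_buffer)
        user_buffer[:] = overwrite

    # The kernel finally transmits whatever it is looking at.
    return bytes(kernel_view), copies, faults
```

With simulate("promise", b"AAAA", b"BBBB") the model sends b"BBBB": the app broke its promise and corrupted only its own data. With "cow" the same overwrite still sends b"AAAA", but at the price of one fault and one copy.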
"Skill shows through where genius wears thin." -Wittgenstein || Religion: uniting aviation and architecture.
I can certainly see the value in explicit notification of page usage, but I have to wonder if this isn't attacking the problem at the wrong level. It seems that these problems are caused by the semantics of read() and write() calls, requiring data to be read/written to arbitrarily aligned userspace buffers.
Zero copy can definitely make things complex, and in the current implementations, the value is arguable. (and being argued...) Still, memory copies have an associated cost. While they may be better than COW with explicit notification, it is still a performance hack, and represents a non-optimal way of dealing with data transfers. (It could be the easiest and best hack to be made, I can't say. In any case, Linus is acting like a git with his name calling here.)
Perhaps more consideration should be given to the API instead. Using zero copy is obviously a good goal, and it is primarily hindered by the ancient API and protocols. Something where the buffer management is explicit, and the devices themselves actually own them. (After all, they are the only entities which know what the buffer requirements are.) Arranging it so that the user applications have access to the actual network buffers would be far preferable to playing any of these "games".
Unfortunately, Ethernet and the IP protocols are not particularly conducive to such an optimal implementation. With enough intelligence in the network adapters though, many of the issues should be manageable, and allow for a good zero copy implementation with a suitable API. It may be more trouble for the application, but if you need the performance, it is a small price to pay.
The dispute is not about fork(). It is about techniques to avoid copying the contents of I/O buffers from user space to kernel space - aka "zero copy" writes.
Linus (minus the ad hominem characterizations) is arguing that the FreeBSD method of VM based copy on write is a poor performer under real world loads, due to the cost of handling the page faults.
He says that an effective zero copy I/O system requires more explicit coordination between the application and the kernel.
...what COW means.
Maybe people like myself should just stay away from this thread...
I do not believe in karma. "Funny"=-6. Do good and forbid evil. Yours, Oft-Offtopic Flamebaiting Troll.
Andy went out and said that he thought the Linux approach was wrong, and archaic, and that people should go and wait for GNU.
Linus said that he felt this was wrong, and that being a prof is no excuse for Minix being the mess it was (and Minix was a mess in the late 1980s/early 1990s). He also apologized if he came off as too harsh for his writing about how people should be able to throw away an old design in favour of a new one anyway, etc.
It was very polite compared to some of the non-Andy/Linux replies.
--
Internet Explorer (n): Another bug -- that is, a feature that can't be turned off -- in Windows.
There seem to be a LOT of misconceptions about the discussion of vmsplice() vs COW vs copy. This has nothing to do with conserving memory and everything to do with high performance I/O. If your app just needs to send a couple small files from A to B, you probably don't care about this at all.
A little background is needed on the terminology and mechanisms of I/O for any of this to make sense. For an example, let's say your app is a very busy web server sending dynamic (but trivial to compute) pages out.
The oldest and simplest method is copy. The app calls write(int sock, char *buffer, int length) on a socket. The kernel copies the contents of buffer from userspace memory into a kernel space buffer and at least queues the data to the TCP stack before returning.
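The copy behaviour is easy to observe from userspace. The sketch below uses a pipe as a stand-in for the socket (an assumption made to keep it self-contained) and a helper name, write_then_scribble, invented for the example. The app overwrites its buffer the moment write() returns, and the data sitting in the kernel is unaffected.

```python
import os


def write_then_scribble(payload):
    """Write payload to a pipe, trash the user buffer, then read the pipe back.

    Meant for small payloads that fit in the pipe without blocking.
    """
    read_fd, write_fd = os.pipe()
    try:
        user_buffer = bytearray(payload)
        written = os.write(write_fd, user_buffer)  # kernel copies the bytes before returning
        # Reuse the buffer right away; with copy semantics this is always safe.
        user_buffer[:] = b"X" * len(user_buffer)
        return os.read(read_fd, written)           # still the original bytes
    finally:
        os.close(read_fd)
        os.close(write_fd)
```

write_then_scribble(b"GET /index.html") hands back the request untouched, which is exactly the guarantee the copy is paying for.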
COW is an attempt to avoid the cost of copying the outgoing data. In that case, the reference count on the physical pages that make up buffer is bumped up (since now kernel and application are both interested in them), and the kernel marks the pages as COW. That is, the virtual memory addresses are set as read only and a flag bit is set (more or less). The latter is done so the kernel needn't worry about them again. By the time the write call returns, the app is able to immediately write to that memory (sorta) without worry.
When that write happens, the app takes a page fault (writing to a read-only page). The kernel sees that the pages are COW, copies the data to a new physical page, and maps the page in read/write. Then it returns from the fault. OTOH, if the kernel finished with the page first (the data goes on to the wire), it re-marks the page(s) so the app can access them without a copy.
The hope is that often enough, the app WON'T try to write to the pages while they're busy and so the cost of that copy is saved. If that hope comes through often enough it MIGHT be vaguely useful. I say MIGHT since there is a significant cost just for marking the pages (the CPU's TLB must be flushed for the change to take effect). If the faults happen, it's a BIG loss since handling a fault takes thousands of CPU cycles.
So, for it to have any chance to help, the application programmer must already know enough to TRY to avoid writing to the same buffer again until it gets to the wire. Unfortunately, it can never be sure so most apps don't bother.
The vmsplice() proposal is fairly simple. In this case, the app explicitly requests special treatment of the write. The pages are NOT marked as read only at all. Instead, the app is on its honor to leave them alone until the kernel notifies it that they are again available. This saves the copy and the costs of TLB flush AND the (potential) cost of page faults. If the app breaks its promise, it is the only one to suffer as the data it sent is corrupted (no kernel housekeeping is ever stored in such pages so there are no security implications). Any damage the app might do by sending screwy data could also be done using the old copy method.
What it all comes down to is that playing tricks with page mapping LOOKS nice at first glance since it SEEMS reasonable that not copying bytes around will save CPU cycles and memory bandwidth. The re-mapping (or just permission changes) on pages SEEMS lightweight. Unfortunately, in fact, re-mapping or changing permission forces cache invalidations and page faults are just plain expensive. With the direction CPU design is going, these things will likely get more expensive rather than less (as they have for most of the history of microprocessor design).
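The trade-off can be put into back-of-the-envelope arithmetic. The sketch below is a toy cost model, not a measurement: the function names are invented, and every cycle count fed into it is a hypothetical input. It assumes COW always pays the remap/TLB cost and, when the app faults, pays the fault plus the copy it was trying to avoid.

```python
def expected_cow_cost(copy_cycles, remap_cycles, fault_cycles, fault_probability):
    """Expected cycles per write() under the toy COW model.

    The remap (TLB flush) cost is always paid; with the given probability
    the app writes too early and also pays for a fault plus the copy.
    """
    return remap_cycles + fault_probability * (fault_cycles + copy_cycles)


def break_even_fault_probability(copy_cycles, remap_cycles, fault_cycles):
    """Fault probability at which COW costs the same as a plain copy.

    Returns 0.0 when remapping alone already costs as much as copying,
    i.e. COW cannot win in this model no matter how well the app behaves.
    """
    if remap_cycles >= copy_cycles:
        return 0.0
    return (copy_cycles - remap_cycles) / (fault_cycles + copy_cycles)
```

With made-up numbers of 1000 cycles for the copy, 500 for the remap and 3000 for a fault, the break-even point is a fault on 12.5% of writes; above that, plain copying is cheaper in this model.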
It's really not that complex for an application to use. At least in comparison to the complexities and level of knowledge required to write an app that performs well enough to need this in the first place.
-=- ThE DaRK MaN oF tHe ObScURiTY -=-
Thank you. I've read more than 30 high-modded posts in this article, and yours is the best explanation of the issue by far.
So the big question is, what happens if user mode breaks the promise, either intentionally or through lousy programming? If the program fucks up, well, then, I'd rather have FreeBSD's model (actually, I'd rather have someone come up with a thread-safe wrapper function, and keep I/O the way it's supposed to be, i.e., atomic).
I'm sure someone said the same thing about the total size of segmented ICMP packets.