Slashdot Mirror


Torvalds Has Harsh Words For FreeBSD Devs

An anonymous reader writes "In a relatively technical discussion about the merits of Copy On Write (COW) versus a very new Linux kernel system call named vmsplice(), Linux creator Linus Torvalds had some harsh words for Mach and FreeBSD developers that utilize COW: 'I claim that Mach people (and apparently FreeBSD) are incompetent idiots. Playing games with VM is bad. memory copies are _also_ bad, but quite frankly, memory copies often have _less_ downside than VM games, and bigger caches will only continue to drive that point home.' The discussion goes on to explain how the new vmsplice() avoids this extra overhead."

17 of 571 comments

  1. Wrong Side of Bed? by AKAImBatman · · Score: 5, Insightful
    Ok, let me see if I've got this straight:

    • Copy on Write saves you real memory, cache memory, and CPU time by pretending that each forked process has a true copy of a memory segment when it in fact is looking at the original. That is, right up until a fork tries to write to that memory location, in which case an exception is handled by making an actual copy to a new location and allowing the write.
    • Linus believes that the exception will occur enough in real world usage that it will be slower than just doing the copy in the first place.
    • Linus wants to push the manual use of zero-copy memory sharing through the vmsplice() routine. He believes that the programmer will always know better than the system when to share memory.
    • Linus doesn't like "VM Games" despite the fact that Virtual Memory, Memory Mapped Files, Disk I/O, Write Caching, etc, etc, etc, are all already "Memory Games" and "VM Games"
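The fork() behaviour described in the first bullet can be sketched in C. This is a minimal illustration, not kernel code: the function name and the 4096-byte buffer are assumptions made for the example. Note that a program can only observe the semantics (each process appears to own a private copy); whether the kernel copied the page eagerly or lazily on the first write is invisible from here.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if the parent's buffer still holds `before` after a forked
 * child overwrote its own view with `after`, 0 if it changed, -1 on error. */
int parent_unchanged_after_child_write(const char *before, const char *after)
{
    static char buffer[4096];   /* hypothetical size: one typical page */

    assert(strlen(before) < sizeof buffer && strlen(after) < sizeof buffer);
    strcpy(buffer, before);

    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: this store is the "write" in copy-on-write. On a CoW
         * kernel the fault handler gives the child a private page here. */
        strcpy(buffer, after);
        _exit(strcmp(buffer, after) == 0 ? 0 : 1);
    }

    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;

    /* Parent: its view of the page must be untouched by the child. */
    return strcmp(buffer, before) == 0 ? 1 : 0;
}
```

Calling `parent_unchanged_after_child_write("original", "modified")` should report that the parent still sees "original".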


    Do I have that right?

    If so, I'm not really seeing his issue. Or at least not as hard-line as he sees it. The issue of memory copy performance is a tricky one, especially since CPU cycles are not the be-all and end-all of performance. Does the exception generated really cost that much more than he believes, or is it often eclipsed by the cost of the extra memory read/writes and CPU waits that are normally generated by a copy? Is it really feasible to expect program developers to do manual memory management in a day and age when programs easily weigh in at hundreds of megs?

    I'm just not sure that Torvalds is really looking at all sides of this. He may be right, but I'd like to hear more discussion between the *BSD guys and Torvalds before we put this matter to rest. And preferably without the insults this time. :-/

    Links:

    Copy on Write as explained by Wikipedia
    FreeBSD page on Zero Copy Patches
    Duke Uni Research
    1. Re:Wrong Side of Bed? by qwijibo · · Score: 5, Insightful

      I don't consider myself an expert in kernel programming, but I definitely think someone is off base if they're expecting programmers as a whole to do the right thing. Many programs seem to work by coincidence rather than design. People didn't do their memory management right in the days when it was necessary. Now that a lot of people are moving towards languages that handle the memory management for them, I expect even fewer to worry about it. That does mean that the programmers of the programming languages are the ones who are responsible, but I'd personally rather have the kernel take a more active role in memory management.

    2. Re:Wrong Side of Bed? by mrsbrisby · · Score: 5, Informative

      Copy on Write saves you real memory, cache memory, and CPU time by pretending that each forked process has a true copy of a memory segment when it in fact is looking at the original. That is, right up until a fork tries to write to that memory location, in which case an exception is handled by making an actual copy to a new location and allowing the write.

      No. Updating the page tables twice and having a fault in there is very expensive.

      Linus believes that the exception will occur enough in real world usage that it will be slower than just doing the copy in the first place.

      And he's right too. But he's not recommending the copy "in the first place" - he's recommending explicit notification that the pages aren't used anymore instead of an implicit notification by-way of a page fault.

      Linus wants to push the manual use of zero-copy memory sharing through the vmsplice() routine. He believes that the programmer will always know better than the system when to share memory.

      That's correct.

      Does the exception generated really cost that much more

      Yes. There isn't a grey area on it either- it's basic math: cost of page copy + exception + 2 * (page table update) is greater than cost of page copy + page table update.
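Written out, the comparison above is a one-line subtraction. The symbols are labels introduced here for the poster's three terms, not measured values, and the inequality only covers the case being described, where the buffer really is written to again:

```latex
\begin{align*}
C_{\text{CoW}}      &= C_{\text{copy}} + C_{\text{fault}} + 2\,C_{\text{pt}} \\
C_{\text{explicit}} &= C_{\text{copy}} + C_{\text{pt}} \\
C_{\text{CoW}} - C_{\text{explicit}} &= C_{\text{fault}} + C_{\text{pt}} > 0
\end{align*}
```

Since a fault and a page table update both cost something, the difference is positive whatever the actual numbers are.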

      The real issue is that the userland knows what it's doing. Eventually it'll want to reuse a buffer. Now does the userland start reusing pages when malloc() fails, thus incurring the exceptions when memory is tight? Or does it reuse them when the kernel says they're reusable?

      The latter makes more sense if you're actually concerned about performance. The former may be easier to code, but I doubt many people will actually do that because it's hard to test.

      In practice what people do is use a static buffer- that's even EASIER to code, but it means page faults happen ALL the time.
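The static-buffer pattern mentioned above looks roughly like the sketch below. The function names are hypothetical, and the code uses plain read()/write() on ordinary descriptors; nothing here enables any zero-copy feature. It only shows why the pattern is a bad fit for a CoW-on-write() scheme: every pass refills the same pages that the previous write() handed off.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Copies everything from in_fd to out_fd through ONE reused buffer.
 * Returns the total number of bytes written, or -1 on error. */
long copy_with_static_buffer(int in_fd, int out_fd)
{
    static char buffer[1024];   /* the same pages on every pass */
    long total = 0;

    assert(in_fd >= 0 && out_fd >= 0);

    for (;;) {
        /* Refilling the buffer overwrites what the previous write() was
         * given. If those pages had been marked CoW, this is where the
         * fault would be taken, once per pass. */
        ssize_t n = read(in_fd, buffer, sizeof buffer);
        if (n < 0)
            return -1;
        if (n == 0)
            break;              /* end of input */

        ssize_t off = 0;
        while (off < n) {       /* handle short writes */
            ssize_t w = write(out_fd, buffer + off, (size_t)(n - off));
            if (w < 0)
                return -1;
            off += w;
        }
        total += n;
    }
    return total;
}

/* Small driver: pushes `len` zero bytes through a pair of pipes and
 * reports how many bytes the copy loop moved. */
long demo_copy(size_t len)
{
    char payload[4096] = {0};
    int src[2], dst[2];
    long moved = -1;

    assert(len <= sizeof payload);      /* stays well under pipe capacity */
    if (pipe(src) != 0)
        return -1;
    if (pipe(dst) != 0) {
        close(src[0]);
        close(src[1]);
        return -1;
    }

    ssize_t queued = write(src[1], payload, len);
    close(src[1]);                      /* reader sees EOF after the payload */
    if (queued == (ssize_t)len)
        moved = copy_with_static_buffer(src[0], dst[1]);

    close(src[0]);
    close(dst[0]);
    close(dst[1]);
    return moved;
}
```

With a 1024-byte buffer, `demo_copy(3000)` goes around the loop three times, so the buffer is overwritten twice after its first use.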

      Is it really feasible to expect program developers to do manual memory management in a day and age when programs easily weigh in at hundreds of megs?

      They already have to do it. Whether it's the BSD implementation or the new Linux implementation they already have to do it if they want reasonable performance in the real world.

      To really take advantage of the BSD implementation, your program needs to monitor malloc() usage, and start attempting to reuse pages when it fails- oldest to newest. This is complicated and hard to test.

      To really take advantage of the Linux implementation, your program waits until it gets notification (via select() or poll()) on the vmsplice() recvmsg() operation. Once that occurs, the notification says exactly which pages can be used.

      The result? Userland on Linux is easier to write, and easier to test. It'll also be faster.
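The userland bookkeeping for the explicit-notification approach can be sketched as a small pool of buffers that are reused only after they have been reported free. Everything here is an assumption made for illustration: `struct buffer_pool`, `pool_acquire()` and `pool_release()` are hypothetical names, and the code does not call any kernel interface. How the release notification actually arrives is whatever the kernel provides; the sketch only models what the program does with it.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_SLOTS 4      /* hypothetical pool size */
#define SLOT_SIZE  1024   /* hypothetical buffer size */

struct buffer_pool {
    char data[POOL_SLOTS][SLOT_SIZE];
    int  busy[POOL_SLOTS];   /* 1 from acquire until the slot is released */
};

/* Hands out the lowest-numbered free slot, or -1 if every slot is still
 * busy, in which case the caller waits for a release notification. */
int pool_acquire(struct buffer_pool *p)
{
    assert(p != NULL);
    for (int i = 0; i < POOL_SLOTS; i++) {
        if (!p->busy[i]) {
            p->busy[i] = 1;
            return i;
        }
    }
    return -1;
}

/* Called when the (hypothetical) notification reports slot i reusable. */
void pool_release(struct buffer_pool *p, int i)
{
    assert(p != NULL && i >= 0 && i < POOL_SLOTS);
    p->busy[i] = 0;
}

/* Driver: exhaust the pool, release slot 2, and acquire again.
 * Returns the slot handed out last, or a negative value on a logic error. */
int demo_pool(void)
{
    struct buffer_pool pool = {0};

    for (int i = 0; i < POOL_SLOTS; i++)
        if (pool_acquire(&pool) != i)
            return -2;
    if (pool_acquire(&pool) != -1)   /* nothing free: must wait */
        return -3;

    pool_release(&pool, 2);          /* "slot 2 is done" */
    return pool_acquire(&pool);      /* slot 2 is handed out again */
}
```

The point of the design is that reuse is driven by an explicit event that can be tested directly, rather than by a malloc() failure or a page fault.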

    3. Re:Wrong Side of Bed? by visgoth · · Score: 5, Funny

      Because everyone enjoys a good old fashioned jihad once in a while?

      --
      My patience is infinite, my time is not.
    4. Re:Wrong Side of Bed? by Nato_Uno · · Score: 5, Insightful

      He doesn't care what the FreeBSD developers are doing... until someone advocates copying their ideas into the Linux kernel. Then he cares very much.

      He's not saying "The FreeBSD people should rewrite that part of their OS," he's saying "don't put that crap into the Linux kernel."

      --

      Have fun,

      Nathan 'Nato' Uno
      http://web.unos.net/
    5. Re:Wrong Side of Bed? by LordNimon · · Score: 5, Informative
      I don't consider myself an expert in kernel programming, but I definitely think someone is off base if they're expecting programmers as a whole to do the right thing.

      Well, I am an expert in kernel programming, and I can tell you that Linus has little tolerance for anyone who doesn't program the way he does. That's one reason, for example, that he doesn't support debuggers. Every other OS has a kernel debugger built-in (and therefore, generally stable and full-featured), but not Linux. Even the OS/2 kernel debugger that was created 10 years ago is better than anything Linux has.

      --
      And the men who hold high places must be the ones who start
      To mold a new reality... closer to the heart
    6. Re:Wrong Side of Bed? by mrsbrisby · · Score: 5, Informative

      What you're saying is that every time through the loop, there's going to be a page fault as the CoW pages are wiped away by the new copy into the same logical buffer. CoW is dependent on allocating new pages every time so that you don't ever write to the old CoW pages. Correct?

      Exactly correct. Those frequent CoW operations are slow- the page faults are expensive. If you had instead written:

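              /* totalSize and file (a FILE *) are assumed to be set up elsewhere */
              char *buffer;
              size_t bytesRead = 0;   /* renamed from "read" so it does not shadow read(2) */
              size_t length;

              while (bytesRead < totalSize)
              {
                      buffer = malloc(1024);   /* fresh pages on every pass */
                      if (buffer == NULL)
                              break;
                      length = fread(buffer, 1, 1024, file);
                      if (length == 0)
                              break;   /* EOF or error: stop instead of spinning */
                      bytesRead += length; //Do some stuff, but don't free the buffer!
              }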


      Then it would operate quickly on FreeBSD. The problem then becomes exactly when do you free all those malloc()s?

      On Linux, you can get a signal from the kernel- via a recvmsg() call that will tell you exactly which pages are now available to be freed- or better still, reused.

      It'll be easy to check and test correctness AND the programmer has to be aware it's going on in order to use it at all.

      Under FreeBSD the programmer can use the syscall, but never get the performance unless they know exactly what's going on.

      Of course, this is where I'd really like to hear from the *BSD developers. Surely they must be aware of this issue?

      I don't know. The article wasn't about that- I doubt Linus pays attention to what the BSD people know- in fact, I don't even think he knows for certain if FreeBSD even works this way. :)

      The point is that using CoW is stupid for this. It makes things complicated in the hard case, and in the easy case, it makes things slower.

    7. Re:Wrong Side of Bed? by Jherek+Carnelian · · Score: 5, Informative

      When I need to fork(), I do not have the time to think of all the memory management involved with fork().

      This has NOTHING to do with fork(). You are used to CoW (copy-on-write for anyone else reading along) only applying to fork(), but that is not the issue under discussion at all. You, and probably 95% of the responders here, need to go RTFA.

      The issue is implementing zero-copy IO. FreeBSD's way of doing it is a setsockopt() that causes any write() on that socket to mark the buffer CoW so that it can use it exclusively for handing down to the device driver. The "magic" is that if the programmer tries to use that buffer while the device driver owns it he will get a copy. BUT, the programmer has no way of knowing when that buffer is available again.

      Linus's point is that marking a page CoW is very expensive - especially in an SMP environment, almost as expensive as just copying that page to begin with would be. He also argues that taking a page-fault to invoke the CoW to a new page, or simply to turn off the CoW attribute, is orders of magnitude more expensive than just copying it in the first place.

      So that means the CoW for sockets is only really useful if you rarely or never reuse your buffers again. And the only place that happens is in synthetic benchmarks.

      If Linus had said "Microsoft is a bunch of idiots for implementing a feature that only looks good on benchmarks" everybody would be nodding their heads in agreement. I think the reason people are not doing the same here is because they just don't understand the details.

    8. Re:Wrong Side of Bed? by outZider · · Score: 5, Insightful

      Here's my -1, Troll.

      Funny that we just had an article about how many Linux users and enthusiasts exclude other people by being complete dicks, and here you are, acting like a dick. Of course, I don't know you from Joe Blow, so maybe I just misunderstood your obviously angsty response.

      "That's obvious."
      "Correction: when _you_ start using up a lot of memory Linux totally sucks. When I start using up a lot of memory, Linux acts exactly as I expect, and better than FreeBSD."

      Linux acts exactly how I'd expect, too. It completely sucks when it comes to memory and process management. Linux may have a better threading kernel, but that's the only thing that seems to save it in the real world. After only six years of administering servers professionally and for my own use, it has come down to Linux on the desktop, and FreeBSD for Real Work(tm). Many large companies that depend on their data agree with me, and those who use Linux or Windows just throw more machines at the problem.

      At least Linux is free compared to Windows, right?

      --
      - oZ
      // i am here.
  2. Given the respective quality of the Linux and *BSD by Anonymous Coward · · Score: 5, Funny

    kernels, me thinks it's just sour grapes because Linus can't compete in that area.

  3. Wrong side of compiler by StarKruzr · · Score: 5, Funny

    I think Linus has gotten to the point where he just really enjoys trolling. Like, this was OBVIOUSLY uncalled-for, and he's usually such a laid-back guy. Maybe he's read too much Slashdot. I don't know.

    --

    +++ATH0
    1. Re:Wrong side of compiler by nuzak · · Score: 5, Interesting

      Actually he's been into boorish behavior from day 1 when it comes to microkernels. Namecalling between him and Tanenbaum (admittedly Tanenbaum is a bit haughty and provoking), and his slanderous accusations against microkernel researchers in general (a quote I can't find at the moment, but he basically accuses them all, as one big class, of academic fraud to procure grant money).

      The only microkernel Linus knows jack about is Mach, an ancient piece of crap, which indeed is what Linus calls it. It's unfortunate real-world systems were saddled with it, and it's got real performance issues, but Linus carries on about it like Mach ran over his dog or something.

      He conveniently ignores or chooses to remain ignorant of the fact that L4Linux is typically faster than Linux itself. To say nothing of the real-world success of QNX. And even L4Linux is pretty old by today's standards.

      This is all pretty typical behavior of Linus: bluster now, bone up and learn, and implement it later. He did so with SMP (saying famously that the way to do it was one Big F**ing Lock, then learning that no, this wasn't such a great idea after all). Then he went on a tirade about Sun's /dev/poll before learning that yes, they actually didn't cheat and they did it smarter, and Linux followed.

      Ultimately, Linus and Linux come around. Sometimes he just has to vent.

      --
      Done with slashdot, done with nerds, getting a life.
    2. Re:Wrong side of compiler by arivanov · · Score: 5, Insightful

      More likely he had some really bad acid the previous night.

      After all he did more than 6 revisions of the Linux VM using CopyOnWrite before this latest fad.

      Possibly more.

      Off the top of my head that is at least 1 in the 1.2 tree, 1 in the 2.0 tree, 1 in the 2.2 tree, 2 in the 2.4 tree and more than 2 in the 2.6 tree, all of them CopyOnWrite and at least some of them hailed as the next best thing after hot bread.

      As far as the technical point goes, he is possibly correct for x86, where COW goes through the fault mechanism and causes some TLB and cache abuse, which is really bad on modern CPUs. I am not sure as far as other architectures are concerned, because IIRC (I may be wrong) the memory mapper hardware on the old Sparc was designed for COW in the first place.

      Anyway, before calling somebody else an idiot for something you have happily done for 10+ years till yesterday it may be nice if you look at yourself in the mirror. Because I never remember any branch of FreeBSD reaching the point where you can do a find /usr -exec cat {} > /dev/null \; to hang the system. That is 2.6.16 at your service (from rc4 onward) on at least two x86 subarchitectures where I had the time to test it. That is besides the unkillable processes in [S] state on an nfs flock in 2.6.14 (yep, that is a gem which no other unix has managed so far), besides the OOM idiocies in 2.6.10, besides deliberately making it absolutely impossible to backtrack any more interesting patch to a previous kernel without employing a team of kernel developers because the VM and locking is not compatible across any kernel version since 2.6.9 and even when it is something else is changed like the tty layer, besides.... Aarghh.....

      --
      Baker's Law: Misery no longer loves company. Nowadays it insists on it
      http://www.sigsegv.cx/
    3. Re:Wrong side of compiler by bfields · · Score: 5, Funny
      I think Linus has gotten to the point where he just really enjoys trolling.
      Could be:
      I got slashdotted! Yay!

      On Thu, 20 Apr 2006, Linus Torvalds wrote:
      >
      > I claim that Mach people (and apparently FreeBSD) are incompetent idiots.

      I also claim that Slashdot people usually are smelly and eat their
      boogers, and have an IQ slightly lower than my daughters pet hamster
      (that's "hamster" without a "p", btw, for any slashdot posters out
      there. Try to follow me, ok?).

      Furthermore, I claim that anybody that hasn't noticed by now that I'm an
      opinionated bastard, and that "impolite" is my middle name, is lacking a
      few clues.

      Finally, it's clear that I'm not only the smartest person around, I'm also
      incredibly good-looking, and that my infallible charm is also second only
      to my becoming modesty.

      So there. Just to clarify.

      Linus "bow down before me, you scum" Torvalds
  4. Re:Linus is turning into a dictator by Lumpy · · Score: 5, Insightful

    No, he is simply getting less tolerant of "sloppy" programming. He is one of the very very few that believes in doing it the way that gives you the best speed. Something that takes 4+ operations compared to a way of doing it with only 2 operations, and you get fewer problems = performance gains that add up. Just because your typical machine has 4 dual core 8 GHz processors and 22 terabytes of RAM does not mean you can slack off and write the whole thing without paying attention to performance.

    The BSD guys have their reasoning, and if you read more about this, Linus is not taking a shot in the dark; he is frustrated that after many discussions nobody cares as much as he does about the performance issues.

    Go back and read what Linus did back in the early days, it's no different today than what it was in 1990, he will call a duck a duck.

    --
    Do not look at laser with remaining good eye.
  5. RTFA, please. Or at least my summary here. by ColonelPanic · · Score: 5, Informative

    The complaint is not about general copy-on-write, it's about BSD's ZERO_COPY_SOCKET feature vs. vmsplice().

    Basic explanation: Suppose that a program is doing a lot of output to a file or socket. The program can generate data faster
    than the kernel can consume it, say. So what should the kernel do with the buffer it receives from the user on each write()?
    There are three options.

    1) Copy its content immediately elsewhere, so that on return to User Mode, the buffer remains writable and writes are safe.

    2) Change the access rights of the page containing the buffer, so that no copy need be made unless User Mode attempts
          to modify its content before the kernel has completed the write(). If the user attempts to write, it either gets
          permission to do so (because the kernel is done) or it gets a writable copy.

    3) Let User Mode promise to not modify the buffer's content until told that it's safe to do so, leaving it writable in
          the meantime.

    The default behavior is (1); BSD's zero copy socket feature is (2), and the point of Torvalds' complaint; vmsplice() is (3).
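The difference between options (1) and (3) can be modelled entirely in userland. The sketch below is a toy, not kernel code and not the vmsplice() API: `struct pending_write` and all function names are hypothetical, and option (2) is not modelled because it needs page protection. It shows what the "promise" in option (3) buys and what breaking it costs.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* A queued write as a toy "kernel" might track it. */
struct pending_write {
    const char *data;    /* bytes that will eventually be sent */
    char *owned_copy;    /* non-NULL only for option (1) */
    size_t len;
};

/* Option (1): copy immediately, so the caller may reuse buf right away. */
struct pending_write submit_copy(const char *buf, size_t len)
{
    struct pending_write w = { NULL, NULL, len };

    assert(buf != NULL);
    w.owned_copy = malloc(len ? len : 1);
    if (w.owned_copy != NULL) {
        memcpy(w.owned_copy, buf, len);
        w.data = w.owned_copy;
    }
    return w;
}

/* Option (3): keep only a reference; the caller promises to leave buf
 * alone until the write has completed. No copy is made. */
struct pending_write submit_by_reference(const char *buf, size_t len)
{
    struct pending_write w = { buf, NULL, len };

    assert(buf != NULL);
    return w;
}

/* "Completes" the write into out and frees any private copy. */
void complete_write(struct pending_write *w, char *out)
{
    if (w->data != NULL)
        memcpy(out, w->data, w->len);
    free(w->owned_copy);
    w->owned_copy = NULL;
}

/* Returns 1 if the bytes sent still match what was submitted even though
 * the caller overwrote its buffer before the write completed, else 0. */
int survives_early_reuse(int by_reference)
{
    char buf[8] = "payload";
    char out[8] = {0};
    struct pending_write w = by_reference
        ? submit_by_reference(buf, sizeof buf)
        : submit_copy(buf, sizeof buf);

    memcpy(buf, "garbage", sizeof buf);   /* breaks the option (3) promise */
    complete_write(&w, out);
    return memcmp(out, "payload", sizeof out) == 0;
}
```

With the copying path the early overwrite is harmless; with the by-reference path the overwritten bytes are what gets sent, which is why the caller has to be told when the buffer is safe to touch again.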

    --
    "Skill shows through where genius wears thin." -Wittgenstein || Religion: uniting aviation and architecture.
  6. What harsh words? by Inoshiro · · Score: 5, Insightful

    Andy went out and said that he thought the Linux approach was wrong, and archaic, and that people should go and wait for GNU.

    Linus said that he felt this was wrong, and that being a prof is no excuse for Minix being the mess it was (and Minix was a mess in the late 1980s/early 1990s). He also apologized if he came off as too harsh for his writing about how people should be able to throw away an old design in favour of a new one anyway, etc.

    It was very polite compared to some of the non-Andy/Linus replies.

    --
    Internet Explorer (n): Another bug -- that is, a feature that can't be turned off -- in Windows.