Slashdot Mirror


Torvalds Has Harsh Words For FreeBSD Devs

An anonymous reader writes "In a relatively technical discussion about the merits of Copy On Write (COW) versus a very new Linux kernel system call named vmsplice(), Linux creator Linus Torvalds had some harsh words for Mach and FreeBSD developers that utilize COW: 'I claim that Mach people (and apparently FreeBSD) are incompetent idiots. Playing games with VM is bad. memory copies are _also_ bad, but quite frankly, memory copies often have _less_ downside than VM games, and bigger caches will only continue to drive that point home.' The discussion goes on to explain how the new vmsplice() avoids this extra overhead."
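For readers who have not met the call: vmsplice() is Linux-specific and, with glibc, is declared in <fcntl.h> when _GNU_SOURCE is defined. The minimal sketch below hands a user buffer to a pipe with vmsplice() and then drains the pipe with read(). It shows only the basic call. The completion-notification scheme debated in the comments is not part of it, and the helper name vmsplice_roundtrip is made up for illustration. Because the kernel may reference the user pages rather than copy them, the sketch reads everything back out of the pipe before the caller touches the buffer again.

```c
#define _GNU_SOURCE
#include <fcntl.h>     /* vmsplice() (glibc, with _GNU_SOURCE) */
#include <sys/uio.h>   /* struct iovec */
#include <unistd.h>    /* pipe(), read(), close(), ssize_t */

/* Hands `len` bytes of `src` to a pipe with vmsplice() instead of write(),
 * then reads them back into `dst`. Returns the number of bytes read back,
 * or -1 on error. Keep len small (well under the pipe's capacity), because
 * nothing drains the pipe until vmsplice() has returned. */
ssize_t vmsplice_roundtrip(const char *src, size_t len, char *dst)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    struct iovec iov = { .iov_base = (void *)src, .iov_len = len };
    /* flags = 0: no SPLICE_F_GIFT, so the caller keeps ownership of src. */
    ssize_t n = vmsplice(fds[1], &iov, 1, 0);
    close(fds[1]);
    if (n < 0) {
        close(fds[0]);
        return -1;
    }

    /* Drain the pipe completely before src may be modified again. */
    ssize_t got = 0;
    while (got < n) {
        ssize_t r = read(fds[0], dst + got, (size_t)(n - got));
        if (r <= 0)
            break;
        got += r;
    }
    close(fds[0]);
    return got;
}
```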

94 of 571 comments

  1. Wrong Side of Bed? by AKAImBatman · · Score: 5, Insightful
    Ok, let me see if I've got this straight:

    • Copy on Write saves you real memory, cache memory, and CPU time by pretending that each forked process has a true copy of a memory segment when it in fact is looking at the original. That is, right up until a fork tries to write to that memory location, in which case an exception is handled by making an actual copy to a new location and allowing the write.
    • Linus believes that the exception will occur enough in real world usage that it will be slower than just doing the copy in the first place.
    • Linus wants to push the manual use of zero-copy memory sharing through the vmsplice() routine. He believes that the programmer will always know better than the system when to share memory.
    • Linus doesn't like "VM Games" despite the fact that Virtual Memory, Memory Mapped Files, Disk I/O, Write Caching, etc, etc, etc, are all already "Memory Games" and "VM Games"
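    A process cannot observe copy-on-write directly; what it can observe is the behavior COW has to preserve. In this minimal POSIX sketch (the names fork_and_write and shared_value are illustrative), the child's write lands in its own copy of the page while the parent still sees the original value. Whether that copy was made eagerly at fork() or lazily on the first write is up to the kernel.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* File-scope variable: parent and child see it at the same virtual address. */
static int shared_value = 1;

/* Forks; the child overwrites shared_value and exits with the value it saw
 * after writing. Stores the child's exit status in *child_seen and returns
 * the parent's value after the child has exited, or -1 on error. */
int fork_and_write(int *child_seen)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* On a COW kernel this write faults, and the child gets a private
         * copy of the page before the store goes through. */
        shared_value = 42;
        _exit(shared_value);
    }
    int status;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    *child_seen = WEXITSTATUS(status);
    return shared_value;  /* still 1: the parent's page was never modified */
}
```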


    Do I have that right?

    If so, I'm not really seeing his issue. Or at least not as hard-line as he sees it. The issue of memory copy performance is a tricky one, especially since CPU cycles are not the be-all and end-all of performance. Does the exception generated really cost that much more than he believes, or is it often eclipsed by the cost of the extra memory read/writes and CPU waits that are normally generated by a copy? Is it really feasible to expect program developers to do manual memory management in a day and age when programs easily weigh in at hundreds of megs?

    I'm just not sure that Torvalds is really looking at all sides of this. He may be right, but I'd like to hear more discussion between the *BSD guys and Torvalds before we put this matter to rest. And preferably without the insults this time. :-/

    Links:

    Copy on Write as explained by Wikipedia
    FreeBSD page on Zero Copy Patches
    Duke Uni Research
    1. Re:Wrong Side of Bed? by qwijibo · · Score: 5, Insightful

      I don't consider myself an expert in kernel programming, but I definitely think someone is off base if they're expecting programmers as a whole to do the right thing. Many programs seem to work by coincidence rather than design. People didn't do their memory management right in the days when it was necessary. Now that a lot of people are moving towards languages that handle the memory management for them, I expect even fewer to worry about it. That does mean that the programmers of the programming languages are the ones who are responsible, but I'd personally rather have the kernel take a more active role in memory management.

    2. Re:Wrong Side of Bed? by jandrese · · Score: 3, Funny

      One thing that concerns me about making all of these copies is that it seems like a quick and easy way to blow out your L2 cache. That could in the long run have a worse performance penalty than having to play the VM tricks with CoW.

      --

      I read the internet for the articles.
    3. Re:Wrong Side of Bed? by mrsbrisby · · Score: 5, Informative

      Copy on Write saves you real memory, cache memory, and CPU time by pretending that each forked process has a true copy of a memory segment when it in fact is looking at the original. That is, right up until a fork tries to write to that memory location, in which case an exception is handled by making an actual copy to a new location and allowing the write.

      No. Updating the page tables twice and having a fault in there is very expensive.

      Linus believes that the exception will occur enough in real world usage that it will be slower than just doing the copy in the first place.

      And he's right too. But he's not recommending the copy "in the first place" - he's recommending explicit notification that the pages aren't used anymore instead of an implicit notification by-way of a page fault.

      Linus wants to push the manual use of zero-copy memory sharing through the vmsplice() routine. He believes that the programmer will always know better than the system when to share memory.

      That's correct.

      Does the exception generated really cost that much more

      Yes. There isn't a grey area on it either- it's basic math: cost of page copy + exception + 2 * (page table update) is greater than cost of page copy + page table update.
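      Written with symbols (C_copy for the page copy, C_fault for the exception, C_pte for one page-table update; the symbol names are chosen here for illustration), the comparison reduces to an inequality that holds whenever both costs are positive:

$$
C_{\text{copy}} + C_{\text{fault}} + 2\,C_{\text{pte}} \;>\; C_{\text{copy}} + C_{\text{pte}}
\quad\Longleftrightarrow\quad
C_{\text{fault}} + C_{\text{pte}} > 0
$$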

      The real issue is that the userland knows what it's doing. Eventually it'll want to reuse a buffer. Now does the userland start reusing pages when malloc() fails, thus incurring the exceptions when memory is tight? Or does it reuse them when the kernel says they're reusable?

      The latter makes more sense if you're actually concerned about performance. The former may be easier to code, but I doubt many people will actually do that because it's hard to test.

      In practice what people do is use a static buffer- that's even EASIER to code, but it means page faults happen ALL the time.

      Is it really feasible to expect program developers to do manual memory management in a day in age when programs easily weigh in at hundreds of megs?

      They already have to do it. Whether it's the BSD implementation or the new Linux implementation they already have to do it if they want reasonable performance in the real world.

      To really take advantage of the BSD implementation, your program needs to monitor malloc() usage, and start attempting to reuse pages when it fails- oldest to newest. This is complicated and hard to test.

      To really take advantage of the Linux implementation, your program waits until it gets notification (via select() or poll()) on the vmsplice() file descriptor, then collects it with recvmsg(). Once that occurs, the notification says exactly which pages can be reused.

      The result? Userland on Linux is easier to write, and easier to test. It'll also be faster.
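      A toy model of the buffer bookkeeping this implies on the userland side is sketched below. It is not a real API: the kernel notification is replaced by a plain array of buffer indices that the caller passes in (in the design described here it would arrive via select()/poll() and recvmsg()), and every name (struct buf_pool, pool_acquire, and so on) is hypothetical. The point is that the program reuses a buffer only after being told it is free, so it never writes to a page the kernel is still sending from.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of explicit completion notification; no kernel
 * interface is used, the "notification" is an array of indices. */
#define POOL_SIZE 4
#define BUF_SIZE 4096

struct buf_pool {
    char bufs[POOL_SIZE][BUF_SIZE];
    bool in_flight[POOL_SIZE];  /* handed to a send, not yet released */
};

/* Returns the index of a buffer that is safe to write, or -1 if every
 * buffer is still in flight (the caller should wait for a notification). */
int pool_acquire(struct buf_pool *p)
{
    for (int i = 0; i < POOL_SIZE; i++)
        if (!p->in_flight[i])
            return i;
    return -1;
}

/* Accessor for the storage behind an index returned by pool_acquire(). */
char *pool_buf(struct buf_pool *p, int idx)
{
    return p->bufs[idx];
}

/* Marks a buffer as handed to a zero-copy send; from now on the program
 * must not write to it. */
void pool_mark_sent(struct buf_pool *p, int idx)
{
    p->in_flight[idx] = true;
}

/* Processes one notification listing the buffers the kernel is done with;
 * a single notification may release several buffers at once. */
void pool_on_completion(struct buf_pool *p, const int *done, size_t n)
{
    for (size_t i = 0; i < n; i++)
        p->in_flight[done[i]] = false;
}
```

      In a real program, a -1 from pool_acquire() would be the cue to block in poll() and pass whatever the notification reports to pool_on_completion().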

    4. Re:Wrong Side of Bed? by QuietLagoon · · Score: 3, Funny
      Why does Torvalds care? I mean, why does he care what FreeBSD is doing, or how they are doing it?

      If there is something that FreeBSD does that he likes, he is welcome to the code. If there is something that FreeBSD does that he does not like, he can just let it go.

      Why does he feel the need to start a war within the OpenSource community?

    5. Re:Wrong Side of Bed? by RailGunner · · Score: 4, Interesting
      Yes, you've got it right on what Linus is saying.

      The issue of memory copy performance is a tricky one, especially since CPU cycles are not the be-all and end-all of performance. Does the exception generated really cost that much more than he believes, or is it often eclipsed by the cost of the extra memory read/writes and CPU waits that are normally generated by a copy? Is it really feasible to expect program developers to do manual memory management in a day and age when programs easily weigh in at hundreds of megs?

      What programs weigh in at hundreds of megs? Don't count data files or map files for games. The entire bin directory of a PostgreSQL install is only 20 megs, and that's a lot of stuff there.

      And as far as doing memory management... YES. I have yet to see a compiler do a better job at managing memory than what I can do when I write my code - and the reason is quite simple: I'm the domain expert, not the compiler. Compilers generally do a good job, but it's those specific cases that bite you over and over again.

      Linus is also right about child threads writing to memory. If that never happened, we wouldn't have a concept of a lock or a semaphore. The bottom line is that it happens a lot.

      He may be right, but I'd like to hear more discussion between the *BSD guys and Torvalds before we put this matter to rest. And preferably without the insults this time. :-/

      I agree, the ad hominem was completely unnecessary.

    6. Re:Wrong Side of Bed? by mrsbrisby · · Score: 4, Insightful

      One thing that concerns me about making all of these copies is that it seems like a quick and easy way to blow out your L2 cache. That could in the long run have a worse performance penalty than having to play the VM tricks with CoW.

      No it won't. The only way to avoid the copies is to avoid the pagefaults. Since userland doesn't get explicit notification in FreeBSD of when the pages are safe to use, the process should wait as long as possible (e.g. until malloc() starts failing)

      The idea Linus pushes here is explicit notification- via select() or poll() and returnable via recvmsg(). That way the userland knows exactly which pages can be reused.

      The result is that it's faster and easier to develop userland programs to take advantage of it. It's also easier to degrade gracefully into read()/write() until the FreeBSD people see the light and add support for this too.

      It's really a clever idea.

    7. Re:Wrong Side of Bed? by Peaker · · Score: 2, Interesting

      Ok, let me see if I've got this straight:

      Copy on Write saves you real memory, cache memory, and CPU time by pretending that each forked process has a true copy of a memory segment when it in fact is looking at the original. That is, right up until a fork tries to write to that memory location, in which case an exception is handled by making an actual copy to a new location and allowing the write.
      Linus believes that the exception will occur enough in real world usage that it will be slower than just doing the copy in the first place.
      Linus wants to push the manual use of zero-copy memory sharing through the vmsplice() routine. He believes that the programmer will always know better than the system when to share memory.
      Linus doesn't like "VM Games" despite the fact that Virtual Memory, Memory Mapped Files, Disk I/O, Write Caching, etc, etc, etc, are all already "Memory Games" and "VM Games"

      Do I have that right?


      I do not know the context of the current debate, but after reading some of it, it seems it doesn't have anything to do with fork at all. I believe everyone agrees COW for fork() is good.

      The disagreement is about a specific optimized implementation of data transfer. Linus says that a simple, non-optimized, portable interface already exists. The debate is about the optimized, less portable, high-performance implementation. Linus says it is pointless to use COW in the high-performance implementation, and that makes sense. For this specific issue, it is faster to just explicitly disallow the user from modifying his buffer after "sending" it. If the user wants a friendlier interface and is willing to give up some performance (as COW would), he can just use the friendly low-performance interface.

      If so, I'm not really seeing his issue. Or at least not as hard-line as he sees it. The issue of memory copy performance is a tricky one, especially since CPU cycles are not the be-all and end-all of performance. Does the exception generated really cost that much more than he believes, or is it often eclipsed by the cost of the extra memory read/writes and CPU waits that are normally generated by a copy? Is it really feasible to expect program developers to do manual memory management in a day and age when programs easily weigh in at hundreds of megs?

      Explicitly disallowing "touching" of the buffer you "sent" until you have some ACK that means it completed sending, has little to do with the size of the program (given that it is sanely modular) and is the only way to extract the best performance of the machine. Again, you can always revert to using the simple low-performance send calls that allow you to touch the buffer after sending.

      I'm just not sure that Torvalds is really looking at all sides of this. He may be right, but I'd like to hear more discussion between the *BSD guys and Torvalds before we put this matter to rest. And preferably without the insults this time. :-/

      Slashdot obviously brings the short words without any context.

      Linus is not saying COW is bad, he says COW for this specific purpose in this specific context is bad. I don't know the context, and I only read the article itself and put only a little thought into it, but so far it makes sense.

    8. Re:Wrong Side of Bed? by Anonymovs+Coward · · Score: 3, Interesting

      I'm not an expert on any of this, but what I do know is that when you start using up a lot of memory Linux totally sucks. On a 256 MB RAM machine, with about twice that amount of swap, if I run over 50% memory usage the system becomes unusable for long periods of time. Even at much greater loads, FreeBSD just feels slightly sluggish at worst. This has been true for years. It was the main reason many people I know refused to use linux (they went for either commercial Unix or the BSDs). It's still true with 2.6.15 -- I'm experiencing it on my work machine as I type this.

    9. Re:Wrong Side of Bed? by mrsbrisby · · Score: 2, Interesting

      I think the problem with this approach is that COW will only give you a copy of the particular piece of the memory that you accessed. That means that the system has to keep huge tables of what is shared and what is not and every time you make a call to request ANY memory it's going to need to check the table. This action is going to result in an overall performance degradation since the application has to check the table for every write over the long-haul, rather than just duplicate the memory and go.

      It does all those things anyway. The problem is that faults are expensive, and yes, because they happen in real life, copying the memory IS faster in real life.

      It is possible to exploit the mechanism FreeBSD uses to gain performance- Simply never touch a page after it's been sent out. Or rather, wait as long as possible- say until malloc() fails.

      This would work, but it'd be hard to test and hard to get right.

      What Linus suggests is explicit notification- say a select() or poll() operation that says "these pages are now free". This works out well, and is indeed faster because there aren't any copies or page faults. It's also easier to develop.

      Of course, using COW for TCP buffers is stupid. That's why people don't use them on FreeBSD (at least, not once they've seen the profiler results)- it's never faster. They always use a static buffer and ALWAYS get the page fault when the system is under any load.

    10. Re:Wrong Side of Bed? by visgoth · · Score: 5, Funny

      Because everyone enjoys a good old fashioned jihad once in a while?

      --
      My patience is infinite, my time is not.
    11. Re:Wrong Side of Bed? by AKAImBatman · · Score: 3, Interesting

      I *think* I understand what you're saying. Basically, the problem is caused by the fact that usermode code never (or rarely, depending on your platform) releases any of the memory it has allocated. Instead, it keeps reusing the same memory pools over and over again. This becomes a problem with CoW because the kernel doesn't learn about the deallocation of memory until the usermode reallocates it for another purpose. When that reallocation happens, the read-only exception is going to be triggered. Thus there's going to be a 100% occurrence of exceptions on CoW pages.

      However, given that the "free()" routine is part of the OS in FreeBSD, wouldn't it make sense to create a smarter "free()" routine that would attempt to recognize and explicitly deallocate CoW pages?

    12. Re:Wrong Side of Bed? by mrsbrisby · · Score: 2, Interesting

      I'm not an expert on any of this,

      That's obvious.

      but what I do know is that when you start using up a lot of memory Linux totally sucks.

      Correction: when _you_ start using up a lot of memory Linux totally sucks. When I start using up a lot of memory, Linux acts exactly as I expect, and better than FreeBSD.

      http://bulk.fefe.de/scalable-networking.pdf

      Hrm. Looks like FreeBSD panics under load in its default configuration. So sad.

      Meanwhile, I have some systems that constantly run with a run-queue length above 100.0 and are still (albeit somewhat) responsive.

    13. Re:Wrong Side of Bed? by mrsbrisby · · Score: 2, Informative

      There you're assuming that the page copy will be necessary. In cases where the W in COW does NOT occur, isn't COW much better?

      Well, no, it's about the same actually.

      The problem is that in the naive implementation, the page copy is always necessary. A userspace implementation that actually takes advantage of COW is more complicated than one built on explicit notification.

    14. Re:Wrong Side of Bed? by Nato_Uno · · Score: 5, Insightful

      He doesn't care what the FreeBSD developers are doing... ... until someone advocates copying their ideas into the Linux kernel. Then he cares very much.

      He's not saying "The FreeBSD people should rewrite that part of their OS," he's saying "don't put that crap into the Linux kernel."

      --

      Have fun,

      Nathan 'Nato' Uno
      http://web.unos.net/
    15. Re:Wrong Side of Bed? by hackstraw · · Score: 3, Insightful

      One thing that concerns me about making all of these copies is that it seems like a quick and easy way to blow out your L2 cache. That could in the long run have a worse performance penalty than having to play the VM tricks with CoW.

      Right. Especially with multithreaded apps as Linus pointed out. Also the TLB misses could get expensive as well, and again the TLB misses will be more of an overhead with multithreaded apps.

      I don't believe that COW is completely evil. It exists, obviously for a reason, but I would agree with Linus on a much less harsh tone (depending on mood).

      Oh, and isn't "VM" a trick to begin with?

    16. Re:Wrong Side of Bed? by dgatwood · · Score: 2, Informative
      You pretty much have it right. I've generally disagreed with Linus about architectural issues. That's why I don't run Linux much these days....

      The biggest advantage of COW is really obvious: faster fork() performance. The fork() call requires either COW or a copy. When Linus is forced to sit there and watch for three minutes while Photoshop forks to run some simple helper (while it panic swaps to duplicate a 1.5GB address space in a machine with 2GB of RAM), we'll see if he still thinks VM tricks are bad. :-D

      COW is a trade-off between an initial performance hit and lots of smaller ones. As long as the single, big performance hit on the front isn't too large, then Linus is right. COW produces a non-zero performance hit compared to doing it all up front. On the other hand, the performance hit of not doing COW can be huge in many cases. The fork() call is just one of many. (And don't tell me that everyone should call vfork(). That isn't always an option, and even if it were, you'd just be preaching to the choir.)

      Making fork() be COW generally results in HUGE savings, since most programmers don't use vfork() for portability reasons and since 99% of fork() calls are followed immediately by an exec(). Making vfork() behave just like fork() (with COW) results in more consistent stability than just supporting the bare minimum allowed by the vfork() specification. It's a total win-win. You can't (realistically) force programmers to always conform to your ideals, but you can take advantage of COW to make performance better in the average case, albeit at a small cost in edge cases where the programmer actually did the right thing.
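      The fork()-then-exec() pattern this argument relies on looks like the POSIX sketch below (the helper name run_and_wait is illustrative). With COW, fork() duplicates the page tables and marks the pages read-only instead of copying them; exec() then discards those shared mappings, so at most the few pages the child writes before exec() (mostly stack) ever get copied.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs argv[0] (searched in PATH) in a child created with fork() and
 * replaced via execvp(), then waits for it. Returns the child's exit
 * status, or -1 on error. */
int run_and_wait(char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127);  /* only reached if exec failed */
    }
    int status;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```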

      Bad coding aside, though, even in cases where fork() is used without exec(), if the performance hit caused by those COW exceptions in the child process is enough to actually be a significant decrease in performance, the amount of time wasted on the initial hit copying the pages will also be sufficiently significant to be seen as a freeze. Given a choice, in modern, user-oriented computing, we generally prefer not to have the computer sitting there looking at us funny for several seconds. Thus, amortizing those seconds plus some small penalty over the life of the program is the only sensible choice except in specialized data processing environments where interactivity is no object.

      So yeah, if Linus wants Linux to forever remain a server OS, he can just keep trying to hold true to those theoretical performance ideals. For the rest of us living in the real world, the desktop is king, and amortizing performance hits across the lifetime of an application is the only mechanism that makes sense. For example, Java hanging while it does garbage collection is one of the big flaws that initially kept Java from being used much for significant apps. Slow launch times of large applications were one of the biggest complaints about Mac OS X before weak linking and lazy binding became commonplace. And so on.

      You just can't have a huge, sudden stall when you're dealing with non-uber-geek users. They assume the app has hung and kill it. On the desktop, interactivity is the name of the game, and if you don't play that game, you won't get very far.

      --

      Check out my sci-fi/humor trilogy at PatriotsBooks.

    17. Re:Wrong Side of Bed? by mrsbrisby · · Score: 2, Interesting

      Basically, the problem is caused by the fact that usermode code never releases any of the memory it has allocated.

      Oh no. That's the solution actually. :)

      The problem is in using a static buffer instead of allocating a buffer for each send operation. If you use a static buffer, you ALWAYS cause a fault. If you malloc() each time, you won't fault- at least until you reuse the pages later (when malloc() fails).
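      The static-buffer versus fresh-buffer difference can be made concrete with a small counting model. It is not kernel code: each page carries a flag meaning "still referenced by a zero-copy send", a write to a flagged page counts as one COW fault, and the model assumes the worst case described here, where every send is still in flight when the next write happens. All names (struct cow_model, count_faults, and so on) are hypothetical.

```c
#include <stdbool.h>

/* Toy model of COW-based zero-copy send; counts faults, does no real I/O. */
#define MAX_PAGES 64

struct cow_model {
    bool shared[MAX_PAGES];  /* page still referenced by an in-flight send */
    int faults;              /* COW faults taken so far */
};

/* Zero-copy "send" of page pg: the kernel now holds it read-only. */
static void model_send(struct cow_model *m, int pg)
{
    m->shared[pg] = true;
}

/* User write to page pg: if the kernel still shares it, the write faults
 * and the process gets a private copy of the page. */
static void model_write(struct cow_model *m, int pg)
{
    if (m->shared[pg]) {
        m->faults++;
        m->shared[pg] = false;
    }
}

/* Fills and sends `iterations` buffers. With reuse_buffer, every pass
 * writes page 0 (the static-buffer pattern); otherwise each pass uses a
 * fresh page (the malloc()-each-time pattern). iterations must not exceed
 * MAX_PAGES. Returns the number of faults taken. */
int count_faults(int iterations, bool reuse_buffer)
{
    struct cow_model m = {0};
    for (int i = 0; i < iterations; i++) {
        int pg = reuse_buffer ? 0 : i;
        model_write(&m, pg);  /* e.g. fread() into the buffer */
        model_send(&m, pg);   /* zero-copy write() of the buffer */
    }
    return m.faults;
}
```

      With 10 iterations, the static-buffer pattern takes 9 faults (every write after the first hits a page the kernel still holds), while the fresh-buffer pattern takes none.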

      However, given that the "free()" routine is part of the OS in FreeBSD

      No, it's not, unfortunately. It's a library call that mucks with [s]brk() or munmap().

      Free _could_ be smart enough to avoid actually freeing the pages until notification occurred, but userspace would still need explicit notification (or just to wait for a while).

      The real issue is explicit notification versus page fault. The page fault is undesirable because it wastes time, memory, and cache. The page fault can be avoided by never reusing memory like I proposed above.

      OR the userspace can simply wait for notification that the pages are done. A signal could be used, but vmsplice() actually causes a fd to wake up that can receive the notification via the recvmsg() system call.
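      A "smarter free()" of the kind discussed here could park buffers instead of releasing them. The sketch assumes some notification tells the program which buffer a send has finished with; deferred_free, on_send_complete and pending_count are hypothetical names, and the fixed-size pending list is a simplification.

```c
#include <stdlib.h>

/* Hypothetical deferred free(): buffers that may still be referenced by a
 * zero-copy send wait on a pending list until a notification names them. */
#define MAX_PENDING 128

static void *pending[MAX_PENDING];
static int n_pending;

/* Called instead of free() for a buffer that may still be in flight.
 * Returns 0 on success, -1 if the pending list is full. */
int deferred_free(void *buf)
{
    if (n_pending == MAX_PENDING)
        return -1;
    pending[n_pending++] = buf;
    return 0;
}

/* Called when a notification reports that buf is no longer used by the
 * kernel; frees it for real. Returns 1 if it was pending, else 0. */
int on_send_complete(void *buf)
{
    for (int i = 0; i < n_pending; i++) {
        if (pending[i] == buf) {
            free(buf);
            pending[i] = pending[--n_pending];  /* swap-remove */
            return 1;
        }
    }
    return 0;
}

/* Number of buffers still waiting for a completion notification. */
int pending_count(void)
{
    return n_pending;
}
```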

    18. Re:Wrong Side of Bed? by tearmeapart · · Score: 4, Interesting

      >> Linus wants to push the manual use of zero-copy memory sharing
      >> through the vmsplice() routine. He believes that the programmer
      >> will always know better than the system when to share memory.
      >
      > That's correct.

      No, that is not always correct.

      I am a C developer for a large multinational corporation that likes to make money. When I need to fork(), I do not have the time to think of all the memory management involved with fork(). I just want it to be done reliably, and I want it to be done fast.

      If it turns out that my code runs 10% faster on FreeBSD than on Linux, then the code is probably going to go on a FreeBSD system. And if FreeBSD is not an option, then I am not going to do the optimization (because CPUs cost less than my wages).
      Also: optimization never happens anyways (or at least, not properly).

      So from my perspective:
      I want the kernel to run my code as fast as possible by default.

    19. Re:Wrong Side of Bed? by AKAImBatman · · Score: 4, Informative
      If you use a static buffer, you ALWAYS cause a fault.

      * Lightbulb goes on

      Oohhhh, I see! So something like this is the problem:
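      #include <stdio.h>

      void read_all(FILE *file, size_t totalSize)
      {
          char buffer[1024];          /* one fixed buffer, reused every pass */
          size_t total_read = 0;
          size_t length;

          while (total_read < totalSize)
          {
              length = fread(buffer, 1, sizeof buffer, file);
              if (length == 0)
                  break;              /* EOF or read error */
              total_read += length;

              //Do some stuff, but don't free the buffer!
          }
      }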
      What you're saying is that every time through the loop, there's going to be a page fault as the CoW pages are wiped away by the new copy into the same logical buffer. CoW is dependent on allocating new pages every time so that you don't ever write to the old CoW pages. Correct?

      Of course, this is where I'd really like to hear from the *BSD developers. Surely they must be aware of this issue? Do they expect programmers to throw away their buffers, or do they have a plan?
    20. Re:Wrong Side of Bed? by mrsbrisby · · Score: 2, Informative

      I'm sorry to interrupt here, Your Holiness, but instead of being snarky and flaming the BSD kid, you could've been somewhat helpful and provided an idea as to *why* that might be the case (e.g. swappiness, etc).

      Not happening. He didn't ask how to make Linux operate as he expects, better or worse; he said Linux has had a persistent problem that FreeBSD doesn't.

      I said no way, posted someone else's report on the subject, and pointed out something in its conclusion section.

      If he wants help making reliable servers, he can ask, and I'll probably help. But that's about the end of it.

    21. Re:Wrong Side of Bed? by Olivier+Galibert · · Score: 2, Informative

      Except the discussion wasn't about COW on fork, but COW in a zero-copy high performance userspace-kernel-device communication system. A faster write(), essentially (and write is already quite fast, TYVM).

          OG.

    22. Re:Wrong Side of Bed? by dgatwood · · Score: 2, Interesting

      What Linus suggests is explicit notification- say a select() or poll() operation that says "these pages are now free". This works out well, and is indeed faster because there aren't any copies or page faults. It's also easier to develop.

      Problem is that unless you're talking about declaring the pages "free" by storing more data in the heap info structure, declaring the pages free would require trapping into the kernel, and that is every bit as slow as the exception on most architectures, only now you're doing it more often, since you're doing it every time a page changes from free to not free.

      Even if you do this by just adding info in the heap structure, it isn't clear that the performance hit of doing so will be worth it in the average case, since most fork() calls are followed by exec() and thus zero copies actually occur, so you're optimizing for the 1% case and causing a performance hit throughout the entire execution of the 99% case.

      Even if that performance hit is nearly zero, and even if all of the programs that use fork() never call exec(), though, Linus is -still- wrong. The four possible ways this could work are:

      • COW for active pages, create new blank page in physical RAM for unused---the time (and possible paging) spent creating the new blank pages can result in a massive stall on fork, and you're still doing COW.
      • Live copy for active pages, create new blank page in physical RAM for unused---the time spent copying the live pages can result in a massive stall on fork, and your RSS just bloated dramatically for all the unused pages.
      • Live copy for active pages, map unused pages with virtual pages---you now have a trap when you access the unused pages to actually allocate physical RAM to hold them, thus are no better off than if those unused pages had been COW.
      • COW for active pages, map unused pages with virtual pages---you still have a trap when you access the unused pages to actually allocate physical RAM to hold them, so this comes out exactly the same as using COW for everything.

      I fail to see the logic in this unless you don't care about interactivity. If we are talking about relatively small process footprints, Linus is right. For large process footprints (including stack and heap), the huge lag to copy even the used pages would be unacceptably large, however.

      --

      Check out my sci-fi/humor trilogy at PatriotsBooks.

    23. Re:Wrong Side of Bed? by mrsbrisby · · Score: 2, Informative

      When I need to fork(), I do not have the time to think of all the memory management invovled with fork(). I just want it to be done reliably, and I want it to be done fast.

      So what? Who's talking about fork()?

      This is about copy-on-write of zero-copy fifos and TCP. If you don't know what the rest of us are talking about, please just say so, and we'll be happy to tell you exactly what's going on.

      Maybe you'll have something to contribute at that point, or maybe you'll just learn something.

      And if FreeBSD is not an option, than I am not going to do the optimization

      I want the kernel to run my code as fast as possible by default.

      Sounds good. Use read() and write() because those operate predictably and faster than the zero-copy method on FreeBSD.

      If scalability is important to you, investigate zero-copy methods. They aren't free- on FreeBSD you either need to wait for a competent API or use a very complicated allocator. On Linux, you already have a competent API.
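      For comparison, the portable read()/write() path is just a write() loop. The helper below (its name is illustrative) retries short writes and EINTR. Once it returns, the buffer can be reused immediately, because write() has already taken its own copy of the data, so there is nothing to fault on and nothing to wait for.

```c
#include <errno.h>
#include <unistd.h>

/* Writes all len bytes of buf to fd with plain write(), retrying on short
 * writes and EINTR. Returns 0 on success, -1 on error. */
int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;  /* interrupted by a signal: just retry */
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}
```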

    24. Re:Wrong Side of Bed? by LordNimon · · Score: 5, Informative
      I don't consider myself an expert in kernel programming, but I definitely think someone is off base if they're expecting programmers as a whole to do the right thing.

      Well, I am an expert in kernel programming, and I can tell you that Linus has little tolerance for anyone who doesn't program the way he does. That's one reason, for example, that he doesn't support debuggers. Every other OS has a kernel debugger built-in (and therefore, generally stable and full-featured), but not Linux. Even the OS/2 kernel debugger that was created 10 years ago is better than anything Linux has.

      --
      And the men who hold high places must be the ones who start
      To mold a new reality... closer to the heart
    25. Re:Wrong Side of Bed? by mrsbrisby · · Score: 5, Informative

      What you're saying is that every time through the loop, there's going to be a page fault as the CoW pages are wiped away by the new copy into the same logical buffer. CoW is dependent on allocating new pages every time so that you don't ever write to the old CoW pages. Correct?

      Exactly correct. Those frequent CoW operations are slow- the page faults are expensive. If you had instead written:

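              #include <stdio.h>
              #include <stdlib.h>

              void read_all(FILE *file, size_t totalSize)
              {
                      char *buffer;
                      size_t total_read = 0;
                      size_t length;

                      while (total_read < totalSize)
                      {
                              buffer = malloc(1024);  /* new buffer every pass, never reused */
                              if (buffer == NULL)
                                      break;
                              length = fread(buffer, 1, 1024, file);
                              if (length == 0)
                                      break;          /* EOF or read error */
                              total_read += length;
                              //Do some stuff, but don't free the buffer!
                      }
              }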


      Then it would operate quickly on FreeBSD. The problem then becomes exactly when do you free all those malloc()s?

      On Linux, you can get a signal from the kernel- via a recvmsg() call that will tell you exactly which pages are now available to be freed- or better still, reused.

      It'll be easy to check and test correctness AND the programmer has to be aware it's going on in order to use it at all.

      Under FreeBSD the programmer can use the syscall, but never get the performance unless they know exactly what's going on.

      Of course, this is where I'd really like to hear from the *BSD developers. Surely they must be aware of this issue?

      I don't know. The article wasn't about that. I doubt Linus pays attention to what the BSD people know; in fact, I don't even think he knows for certain whether FreeBSD even works this way. :)

      The point is that using CoW is stupid for this. It makes things complicated in the hard case, and in the easy case, it makes things slower.

    26. Re:Wrong Side of Bed? by statusbar · · Score: 2, Informative

      Well, now you see Linus's point. If the buffer is being sent asynchronously after the write() call, and the user program writes to the buffer before the ethernet chip picks up the buffer via dma, then the buffer must be COW so that the ethernet chip can send the appropriate data.

      The real problem is that in a zero-copy world, write() returns before the data is sent, and in FreeBSD there is no way for the kernel to signal the user program that the write() is complete and it is safe to reuse the buffer.

      --jeffk++

      --
      ipv6 is my vpn
    27. Re:Wrong Side of Bed? by mrsbrisby · · Score: 2, Informative

      Problem is that unless you're talking about declaring the pages "free" by storing more data in the heap info structure, declaring the pages free would require trapping into the kernel, and that is every bit as slow as the exception on most architectures, only now you're doing it more often, since you're doing it every time a page changes from free to not free.

      No. System calls are not as slow as exceptions.

      If they are on your architecture, you're not supporting a million clients at a time on that architecture. It's unreasonable.

      And besides, the kernel can coalesce multiple free-returns, thus reducing the number of messages. After all, the pages are free once an interrupt occurs, and anything ready to go out can probably go at that point.
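A minimal sketch of the coalescing idea, assuming the kernel tags each zero-copy send with an increasing sequence number and reports the finished ones back to the application. The function `count_notifications` and the numbering scheme are hypothetical illustrations, not an existing kernel interface: each contiguous run of completed sends collapses into one message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: `done` holds the sequence numbers of zero-copy sends the
 * kernel has finished with, sorted ascending with no duplicates. Each
 * contiguous run [lo, hi] can be reported in one message, so the number
 * of messages is the number of runs, not the number of sends. */
size_t count_notifications(const unsigned long *done, size_t n)
{
    assert(done != NULL || n == 0);
    size_t runs = 0;
    for (size_t i = 0; i < n; i++) {
        /* A new run starts wherever the sequence is not contiguous. */
        if (i == 0 || done[i] != done[i - 1] + 1)
            runs++;
    }
    return runs;
}
```

With completions {1, 2, 3, 5, 6, 9}, three messages ([1, 3], [5, 6] and [9, 9]) would replace six.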

      Even if you do this by just adding info in the heap structure, it isn't clear that the performance hit of doing so will be worth it in the average case, since most fork() calls are followed by exec() and thus zero copies actually occur, so you're optimizing for the 1% case and causing a performance hit throughout the entire execution of the 99% case.

      You're confused. We're not talking about fork() and exec() but about COW on buffers.

      CoW on fork() and exec() is smart. An exception between the two is rare and limited to one page. When using vfork() the exception NEVER occurs.
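A minimal POSIX sketch of the fork() case described as smart above: after fork() the pages are shared copy-on-write, so the child's write lands in its own private copy and the parent's value is untouched. The helper name `child_write_is_private` is illustrative.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int value = 1;

/* Returns 1 if a write made by a forked child stays invisible to the
 * parent, which is what copy-on-write sharing of fork()ed pages preserves. */
int child_write_is_private(void)
{
    value = 1;
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        value = 2;                  /* first write to the shared page triggers the copy */
        _exit(value == 2 ? 0 : 1);  /* the child sees its own write */
    }
    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    assert(WIFEXITED(status));
    /* The parent still sees 1 even though the child wrote 2. */
    return WEXITSTATUS(status) == 0 && value == 1;
}
```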

      CoW on kernel FIFO buffers or TCP socket buffers is stupid. An exception occurs at the top of each loop; it would've been faster to copy the single page each run instead of generating a page fault on the same page over and over again.

    28. Re:Wrong Side of Bed? by mrsbrisby · · Score: 4, Informative
      But that's not true in general. 99% of all fork() calls are followed by exec() and the entire space gets dumped. That's why COW is a huge win in the average case. The case of an application using fork() followed by actually doing something useful is exceptionally rare outside of the server space. In fact, Apache is about the only program I can think of that ever does this.

      This isn't about fork(); it's about zero-copy buffers, not code and data pages in general.

      Consider a block like this:
      char buffer[4096];
      for(i = 0; i < len;) { r = read(fd, buffer, 4096); zero_write(fd2, buffer, r); i += r; }
      Now, on the whole, if zero_write() works like write() then an awful lot of copying is going on. But if zero_write() uses the buffer for kernel space as well, it's much faster (1 less copy).

      Now the trick is returning to userspace before the buffer is completely used. In FreeBSD a page fault would occur immediately during read().

      Both FreeBSD and Linux agree that you shouldn't do this. Instead, do something like this:
      char *buffer;
      for(i = 0; i < len;) { buffer = malloc(4096); r = read(fd, buffer, 4096); zero_write(fd2, buffer, r); i += r; }
      The trick at this point, is that elsewhere in your code, Linux can tell you when those malloc() buffers can be reused, whereas FreeBSD doesn't. It relies on the fact that you'll either make a blocking call on fd2 before you free buffer _OR_ you'll accept a page fault.

      But if you can be told when it will occur, you don't need to do either of these things, and as a result, you NEVER have to wait. This means your program will be simpler and go faster.
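The pattern can be sketched as a small buffer pool, assuming some kernel notification (hypothetical here; the post only says Linux "can tell you") names the buffer that is free again. Buffers handed to the zero-copy write are marked in flight, and the notification handler returns them to the pool, so the loop never blocks on fd2 and never touches a page the kernel still owns. `pool_get` and `pool_on_complete` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define POOL_SIZE 4
#define BUF_SIZE 4096

/* A zero-initialised pool (e.g. a static one) starts with every buffer free. */
struct pool {
    char bufs[POOL_SIZE][BUF_SIZE];
    bool in_flight[POOL_SIZE];   /* true while the kernel may still read the buffer */
};

/* Return the index of a buffer the kernel is not using, or -1 if all are in flight. */
int pool_get(struct pool *p)
{
    for (int i = 0; i < POOL_SIZE; i++) {
        if (!p->in_flight[i]) {
            p->in_flight[i] = true;   /* about to be handed to the zero-copy write */
            return i;
        }
    }
    return -1;
}

/* Called when the (hypothetical) completion notification names buffer `i`. */
void pool_on_complete(struct pool *p, int i)
{
    assert(i >= 0 && i < POOL_SIZE && p->in_flight[i]);
    p->in_flight[i] = false;          /* safe to overwrite: no fault, no copy */
}
```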
    29. Re:Wrong Side of Bed? by Sangui5 · · Score: 3, Informative

      Then it would operate quickly on FreeBSD. The problem then becomes exactly when do you free all those malloc()s?

      No, it'd be slower than just copying on FreeBSD too.

              while(read < totalSize){
                      buffer = malloc(1024); //1024 is < pagesize!
                      length = fread(buffer, 1, 1024, file);
                      read += length; //Do some stuff, but don't free the buffer!
              }

      This is where VM games really bite you in the ass, because you get false sharing. Even if you never reuse the buffer, this can cause 3 copies per page: each group of 4 (3.99ish) buffers will be on the same page, and therefore each call will cause a fault from the previous one.
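To make the arithmetic concrete, here is a small simulation, assuming 4096-byte pages and 1024-byte buffers packed back to back from a page-aligned start (a real malloc adds headers, so actual placement differs). A fill counts as a fault whenever it lands on a page that already holds a buffer handed to the kernel. `count_cow_faults` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

/* Simulate `nbufs` buffers of `bufsize` bytes laid out back to back from
 * offset 0. Each buffer is filled and then handed to the kernel, which
 * write-protects its page. Filling a later buffer on an already-protected
 * page counts as one copy-on-write fault. Assumes bufsize divides page. */
size_t count_cow_faults(size_t nbufs, size_t bufsize, size_t page)
{
    assert(bufsize > 0 && page % bufsize == 0);
    size_t faults = 0;
    size_t protected_page = (size_t)-1;   /* last page handed to the kernel */
    for (size_t i = 0; i < nbufs; i++) {
        size_t pg = (i * bufsize) / page;
        if (pg == protected_page)
            faults++;                     /* filling a COW-protected page */
        protected_page = pg;              /* write(b) protects this page */
    }
    return faults;
}
```

Eight 1024-byte buffers span two pages and take six faults (three per page); page-sized buffers take none.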

      In theory the OS could allow itself to write and check for overlapping calls (and avoid the COW fault), but note that the read() example really isn't interesting for zero-copy unless you're using hardware TCP offloading. Zero copy is more interesting for write(). The usual case is then:

              while(){
                      b = malloc();
                      fill_in_buffer(b);
                      write(b);
              }

      and that fill_in_buffer step *must* cause a fault if sets of buffers are on the same page. To avoid COW faults you have to be really careful that you don't accidentally write to the same page as the buffer, even indirectly by malloc updating its inline data structures. That's pretty nasty to do; the easiest way is to allocate 8K at a time and use a page-aligned chunk from the middle of it. Talk about a waste of memory.
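As an alternative to over-allocating 8K and carving out the middle, POSIX `posix_memalign()` can return a page-aligned, page-sized buffer directly, so no other allocation or allocator header shares its page (some padding may still be spent). A sketch assuming the page size from `sysconf(_SC_PAGESIZE)`; `alloc_page_buffer` and `same_page` are illustrative helpers.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/* Allocate one buffer that exactly covers one page, so nothing else the
 * program writes can live on the same page. Free it with free(). */
void *alloc_page_buffer(size_t *len_out)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);
    void *p = NULL;
    if (posix_memalign(&p, (size_t)page, (size_t)page) != 0)
        return NULL;
    *len_out = (size_t)page;
    return p;
}

/* Nonzero if the two addresses fall on the same page. */
int same_page(const void *a, const void *b)
{
    uintptr_t page = (uintptr_t)sysconf(_SC_PAGESIZE);
    return ((uintptr_t)a / page) == ((uintptr_t)b / page);
}
```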

    30. Re:Wrong Side of Bed? by Jherek+Carnelian · · Score: 5, Informative

      When I need to fork(), I do not have the time to think of all the memory management involved with fork().

      This has NOTHING to do with fork(). You are used to CoW (copy-on-write for anyone else reading along) only applying to fork(), but that is not the issue under discussion at all. You, and probably 95% of the responders here, need to go RTFA.

      The issue is implementing zero-copy IO. FreeBSD's way of doing it is a setsockopt() that causes any write() on that socket to mark the buffer CoW so that the kernel can use it exclusively for handing down to the device driver. The "magic" is that if the programmer tries to use that buffer while the device driver owns it, he will get a copy. BUT, the programmer has no way of knowing when that buffer is available again.

      Linus's point is that marking a page CoW is very expensive - especially in an SMP environment, almost as expensive as just copying that page to begin with would be. He also argues that taking a page-fault to invoke the CoW to a new page, or simply to turn off the CoW attribute, is orders of magnitude more expensive than just copying it in the first place.
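The trap-on-write mechanism can be emulated in user space, as a rough Linux sketch: write-protect a page with `mprotect()` (standing in for the kernel taking the buffer for I/O), then catch the resulting fault with a `SIGSEGV` handler that makes the page writable again. The real kernel path would also copy the page and flush TLBs on other CPUs, which this does not model; it only shows that the first write after protection costs a trap and later writes do not. Calling `mprotect()` from a signal handler is a common Linux idiom but is not on POSIX's async-signal-safe list. `protected_write_demo` is an illustrative name.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static volatile sig_atomic_t fault_count;
static long page_size;

/* SIGSEGV handler: make the faulting page writable again and count the trap.
 * The real kernel would also copy the page here; this emulation does not. */
static void on_write_fault(int sig, siginfo_t *info, void *ctx)
{
    (void)sig;
    (void)ctx;
    uintptr_t addr = (uintptr_t)info->si_addr & ~((uintptr_t)page_size - 1);
    mprotect((void *)addr, (size_t)page_size, PROT_READ | PROT_WRITE);
    fault_count++;
}

/* Returns the number of traps taken by two writes to a page that was
 * write-protected after being "handed off", or -1 on error. */
int protected_write_demo(void)
{
    page_size = sysconf(_SC_PAGESIZE);
    assert(page_size > 0);
    fault_count = 0;

    struct sigaction sa, old;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_write_fault;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, &old) == -1)
        return -1;

    char *page = mmap(NULL, (size_t)page_size, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED) {
        sigaction(SIGSEGV, &old, NULL);
        return -1;
    }
    page[0] = 'a';
    mprotect(page, (size_t)page_size, PROT_READ);  /* stand-in for "kernel owns it now" */
    ((volatile char *)page)[0] = 'b';  /* traps once; handler unprotects; write is retried */
    ((volatile char *)page)[1] = 'c';  /* no trap: the page is writable again */

    int ok = page[0] == 'b' && page[1] == 'c';
    munmap(page, (size_t)page_size);
    sigaction(SIGSEGV, &old, NULL);
    return ok ? (int)fault_count : -1;
}
```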

      So that means the CoW for sockets is only really useful if you rarely or never reuse your buffers again. And the only place that happens is in synthetic benchmarks.

      If Linus had said "Microsoft is a bunch of idiots for implementing a feature that only looks good on benchmarks" everybody would be nodding their heads in agreement. I think the reason people are not doing the same here is because they just don't understand the details.

    31. Re:Wrong Side of Bed? by outZider · · Score: 5, Insightful

      Here's my -1, Troll.

      Funny that we just had an article about how many Linux users and enthusiasts exclude other people by being complete dicks, and here you are, acting like a dick. Of course, I don't know you from Joe Blow, so maybe I just misunderstood your obviously angsty response.

      "That's obvious."
      "Correction: when _you_ start using up a lot of memory Linux totally sucks. When I start using up a lot of memory, Linux acts exactly as I expect, and better than FreeBSD."

      Linux acts exactly how I'd expect, too. It completely sucks when it comes to memory and process management. Linux may have a better threading kernel, but that's the only thing that seems to save it in the real world. After only six years of administering servers professionally and for my own use, it has come down to Linux on the desktop, and FreeBSD for Real Work(tm). Many large companies that depend on their data agree with me, and those who use Linux or Windows just throw more machines at the problem.

      At least Linux is free compared to Windows, right?

      --
      - oZ
      // i am here.
    32. Re:Wrong Side of Bed? by thallgren · · Score: 2, Insightful

      > it has proved to be the right decision in the long-term.

      How can you say something like this? If Linux had had a debugger from the start, it could be ripped out right now if there were some gain in doing that. By not having one, you only caused developers lots of pain during the last 10-15 years, on those occasions where a debugger really is the right tool for the job.

      And yeah, I know some of Linus' theories about how to program, how he thinks asserts and invariants are bad things; I just don't agree with him.

    33. Re:Wrong Side of Bed? by Rich0 · · Score: 2, Funny

      Only if you assume you're using linux to power a web

    34. Re:Wrong Side of Bed? by Just+Some+Guy · · Score: 4, Insightful
      No BSDer has been willing to reproduce the tests, as it will only confirm what the marketplace has already decided ... Linux is the superior OS.

      ...but still inferior to Windows, right? I mean, we're only looking at number of installations, after all. Furthermore, McDonald's clearly has the best hamburger and Velveeta beats Hoffman's Super Sharp.

      I like Linux. I'm typing this on a Gentoo box. However, I'd never pretend that it's better in every single aspect than any other OS in existence. The BSD guys have a few tricks up their sleeve, and even Redmond manages to get things right on rare occasion.

      --
      Dewey, what part of this looks like authorities should be involved?
    35. Re:Wrong Side of Bed? by larry+bagina · · Score: 2, Insightful
      I think the point is whether the programmer should be forced to always have COW enabled or be able to choose.

      They already can choose. Kernel threads (via clone(2)) allow you to specify what (memory, files, signal handlers, etc) is cloned.
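A Linux-specific sketch of that choice with `clone(2)`: with `CLONE_VM` the child runs in the parent's address space, so its write is visible to the parent; without it, the child gets a copy-on-write copy, as with fork(). The helper `run_child` and the 64 KiB stack size are arbitrary choices for illustration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/wait.h>

static int shared_value;

static int child_fn(void *arg)
{
    (void)arg;
    shared_value = 42;   /* lands in the parent's memory only with CLONE_VM */
    return 0;
}

/* Run child_fn via clone() with the given flags and return what the parent sees. */
int run_child(int flags)
{
    const size_t stack_size = 64 * 1024;
    char *stack = malloc(stack_size);
    if (stack == NULL)
        return -1;
    shared_value = 0;
    /* The stack grows down on common architectures, so pass its top. */
    pid_t pid = clone(child_fn, stack + stack_size, flags | SIGCHLD, NULL);
    if (pid == -1) {
        free(stack);
        return -1;
    }
    int status;
    waitpid(pid, &status, 0);
    free(stack);
    assert(WIFEXITED(status));
    return shared_value;
}
```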

      Why fork? Because you're going to exec*(2) another program. Otherwise, you'd usually be better off using a thread.

      --
      Do you even lift?

      These aren't the 'roids you're looking for.

    36. Re:Wrong Side of Bed? by pthisis · · Score: 2, Interesting
      The vmsplice() approach that Linus is talking about is exactly that -- a call that will block until the kernel is done with the previous buffer.

      That's certainly not the impression I get from Dave Miller's commentary about splice/tee to sockets, which discusses using poll/select/more advanced methods to see when the splice has finished and comments:


      We really can't block on this, but I guess we could consider allowing
      that for really dumb applications.

      It does indeed require some smarts in the application to field the
      events, but by definition of using this splice stuff there is explicit
      knowledge in the application of what's going on.

      This is why I'm very hesitant to say "yeah, blocking on the socket is
      OK", because to be honest it's not. As long as the socket buffer
      limits haven't been reached, we really shouldn't block so the user can
      go and do more work and create more transmit data in time to keep the
      network pipe full.


      Or Linus commenting:

      Some users may even be able to take _advantage_ of the fact that the
      buffer is "in flight" _and_ mapped into user space after it has been
      submitted. You could imagine code that actually goes on modifying the
      buffer even while it's being queued for sending. Under some strange
      circumstances that may actually be useful
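For reference, a minimal Linux-only sketch of the `vmsplice()` call under discussion: it maps a user buffer into a pipe rather than copying it in, and the data is then read (or spliced onward) from the other end. On the kernels discussed here the pipe keeps referencing the user pages after the call returns, so the application should not modify the buffer until the data has been consumed; that is the "in flight" property Linus mentions. `vmsplice_roundtrip` is an illustrative helper, and the buffer is kept well under the default 64 KiB pipe capacity so the call does not block.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hand `len` bytes of `buf` to a pipe with vmsplice(), then read them back
 * into `out`. Returns the number of bytes read, or -1 on error. */
ssize_t vmsplice_roundtrip(char *buf, size_t len, char *out)
{
    assert(len <= 4096);                    /* well under the default pipe capacity */
    int fds[2];
    if (pipe(fds) == -1)
        return -1;
    struct iovec iov = { .iov_base = buf, .iov_len = len };
    ssize_t queued = vmsplice(fds[1], &iov, 1, 0);  /* pipe now refers to buf's pages */
    ssize_t got = -1;
    if (queued > 0)
        got = read(fds[0], out, (size_t)queued);    /* leave buf untouched until here */
    close(fds[0]);
    close(fds[1]);
    return got;
}
```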
      --
      rage, rage against the dying of the light
  2. Playing games with VM is bad. by digitaldc · · Score: 2, Funny

    Playing games with VM is bad.

    I know, I hate it when I have to listen to 26 hang up messages in my inbox only to find out someone is playing games with me. :(

    --
    He who knows best knows how little he knows. - Thomas Jefferson
  3. Tantrums by ClamIAm · · Score: 4, Funny

    Methinks we need to start tagging "tantrum" to this type of thing.

    1. Re:Tantrums by OctoberSky · · Score: 2, Informative

      How about Dvorakesque?

  4. Given the respective quality of the Linux and *BSD by Anonymous Coward · · Score: 5, Funny

    kernels, me thinks it's just sour grapes because Linus can't compete in that area.

  5. Just finished reading "Thud!", sorry... by Anonymous Coward · · Score: 2, Funny

    Is that my COW?
    it goes "incompetent idiots."
    It is a Torvalds.
    That is not my COW.

  6. I call bullshit! by OctoberSky · · Score: 3, Funny

    As a Slashdot user there is no way in hell you have 26 messages on your answering machine. Maybe 3 messages, but they're probably your mom calling to ask you when you're coming out of the basement, your friend inviting you to stand in line for tickets to the latest Sci-Fi flick, and the pizza guy confirming your order of 2 large and a 2 liter of Mt. Dew on a Friday night.

  7. Re:Did you hear that? by TrappedByMyself · · Score: 2, Insightful

    It will be interesting to see what weapon the BSD crowd will retaliate with.

    I would just prefer that their response is to release a stable system using their method.

    --

    Help me take back Slashdot. When did 'News for Nerds' become 'FUD and Conspiracy Theories for Extremist Nutjobs'?
  8. Wrong side of compiler by StarKruzr · · Score: 5, Funny

    I think Linus has gotten to the point where he just really enjoys trolling. Like, this was OBVIOUSLY uncalled-for, and he's usually such a laid-back guy. Maybe he's read too much Slashdot. I don't know.

    --

    +++ATH0
    1. Re:Wrong side of compiler by nuzak · · Score: 5, Interesting

      Actually he's been into boorish behavior from day 1 when it comes to microkernels. Name-calling between him and Tanenbaum (admittedly Tanenbaum is a bit haughty and provoking), and his slanderous accusations against microkernel researchers in general (a quote I can't find at the moment, but he basically accuses them all, as one big class, of academic fraud to procure grant money).

      The only microkernel Linus knows jack about is Mach, an ancient piece of crap, which is indeed what Linus calls it. It's unfortunate real-world systems were saddled with it, and it's got real performance issues, but Linus carries on about it like Mach ran over his dog or something.

      He conveniently ignores or chooses to remain ignorant of the fact that L4Linux is typically faster than Linux itself. To say nothing of the real-world success of QNX. And even L4Linux is pretty old by today's standards.

      This is all pretty typical behavior of Linus: bluster now, bone up and learn, and implement it later. He did so with SMP (saying famously that the way to do it was one Big F**ing Lock, then learning that no, this wasn't such a great idea after all). Then he went on a tirade about Sun's /dev/poll before learning that yes, they actually didn't cheat and they did it smarter, and Linux followed.

      Ultimately, Linus and Linux come around. Sometimes he just has to vent.

      --
      Done with slashdot, done with nerds, getting a life.
    2. Re:Wrong side of compiler by nuzak · · Score: 3, Informative

      Linus was slagging off Mach long before OSX was around. OSF/1 was based on Mach. The sun doesn't really revolve around Apple.

      --
      Done with slashdot, done with nerds, getting a life.
    3. Re:Wrong side of compiler by arivanov · · Score: 5, Insightful

      More likely he had some really bad acid the previous night.

      After all he did more than 6 revisions of the Linux VM using CopyOnWrite before this latest fad.

      Possibly more.

      Off the top of my head that is at least 1 in the 1.2 tree, 1 in the 2.0 tree, 1 in the 2.2 tree, 2 in the 2.4 tree and more than 2 in the 2.6 tree, all of them CopyOnWrite and at least some of them hailed as the best thing since sliced bread.

      As far as the technical point goes, he is possibly correct for x86, where COW goes through the fault mechanism and causes some TLB and cache abuse, which is really bad on modern CPUs. I am not sure as far as other architectures are concerned, because IIRC (I may be wrong) the memory mapper hardware on the old Sparc was designed for COW in the first place.

      Anyway, before calling somebody else an idiot for something you have happily done for 10+ years until yesterday, it may be nice to look at yourself in the mirror. I don't remember any branch of FreeBSD ever reaching the point where you can do a find /usr -exec cat {} > /dev/null \; to hang the system. That is 2.6.16 at your service (from rc4 onward) on at least two x86 subarchitectures where I had the time to test it. That is besides the unkillable processes in [S] state on an nfs flock in 2.6.14 (yep, that is a gem which no other unix has managed so far), besides the OOM idiocies in 2.6.10, besides deliberately making it absolutely impossible to backport any more interesting patch to a previous kernel without employing a team of kernel developers, because the VM and locking are not compatible across any kernel version since 2.6.9, and even when they are, something else like the tty layer has changed, besides.... Aarghh.....

      --
      Baker's Law: Misery no longer loves company. Nowadays it insists on it
      http://www.sigsegv.cx/
    4. Re:Wrong side of compiler by rubycodez · · Score: 2, Informative

      what the? L4Linux has to run on top of another REAL kernel, usually Linux. QNX is a realtime operating system, not a general purpose desktop or server one. And where's the spicy name-calling? I see the minix way being called "brain-damaged", heheh, but maybe I missed a good personal zinger? Mach probably bugs Linus because it's still being used, even for relatively (in comparison to Linux) new projects & newer commercially sold OSes.

    5. Re:Wrong side of compiler by bfields · · Score: 5, Funny
      I think Linus has gotten to the point where he just really enjoys trolling.
      Could be:
      I got slashdotted! Yay!

      On Thu, 20 Apr 2006, Linus Torvalds wrote:
      >
      > I claim that Mach people (and apparently FreeBSD) are incompetent idiots.

      I also claim that Slashdot people usually are smelly and eat their
      boogers, and have an IQ slightly lower than my daughters pet hamster
      (that's "hamster" without a "p", btw, for any slashdot posters out
      there. Try to follow me, ok?).

      Furthermore, I claim that anybody that hasn't noticed by now that I'm an
      opinionated bastard, and that "impolite" is my middle name, is lacking a
      few clues.

      Finally, it's clear that I'm not only the smartest person around, I'm also
      incredibly good-looking, and that my infallible charm is also second only
      to my becoming modesty.

      So there. Just to clarify.

      Linus "bow down before me, you scum" Torvalds
    6. Re:Wrong side of compiler by Anonymous Coward · · Score: 2, Insightful

      I just don't see what the big deal is with Linus's comment. Of course, I'm an OpenBSD person, and can hardly consider a mere 'incompetent idiots' to be a serious disparagement.

    7. Re:Wrong side of compiler by WgT2 · · Score: 2, Insightful

      I'm not too surprised, as he seems to be somewhat of a visionary: he sees things as he thinks they should be... and explodes when they aren't. ;)

      What I don't get is why he chose to use "incompetent" to describe a group of people who are not implementing something he is just now implementing himself.

    8. Re:Wrong side of compiler by kv9 · · Score: 4, Funny
      Finally, it's clear that I'm not only the smartest person around, I'm also incredibly good-looking, and that my infallible charm is also second only to my becoming modesty.

      I know what the last book Linus read was. Do you?

    9. Re:Wrong side of compiler by nuzak · · Score: 4, Informative

      > what the? L4Linux has to run on top another REAL kernel, usually Linux.

      You're quite mistaken. L4Linux runs Linux in usermode on top of the L4 kernel.

      http://os.inf.tu-dresden.de/L4/LinuxOnL4/

      --
      Done with slashdot, done with nerds, getting a life.
  9. If I only had a brain by swordfish666 · · Score: 2, Funny

    This is so tech I don't even understand what they are talking about, yet I am very "Intellectually Curious".

    --
    I like-a do-the cha-cha.
  10. Re:Did you hear that? by arosas · · Score: 2, Insightful

    With a comment like that, I can only imagine the kind of temper tantrum that Theo de Raadt will throw. I mean honestly, whatever happened to common courtesy? There's no need for comments like "incompetent idiots". I see so many people push for the advancement of OSS, only to find that it was in vain thanks to school-yard hissy fits like this.

  11. Yeah, that's a bad idea. It's been tried. by Animats · · Score: 2, Informative
    This is an old idea, and it's been tried before. I think it was first tried by Jerry Popek at UCLA in the 1980s, and it was tried in Mach.

    The basic idea is to fake some memory to memory copying operations by using the virtual memory hardware. More specifically, the idea is that when you do a big "write", the space just written becomes read-only to the writing process, rather than being actually copied. When the write is complete, read-only mode is turned off. This eliminates one copy.

    The trouble with this is that when you manipulate the page table to do that, you have to do some cache invalidation. That usually results in cache misses, which outweigh the cost of the copy. So this usually is a lose. Linus points out that it looks good on benchmarks, because benchmarks typically aren't using data for anything and thus don't experience the cache misses.

    Actually, copying is a relatively cheap operation in modern CPUs unless the copy is huge, since most of the work is done in the caches. The mania for "zero copy" complicates systems considerably, makes them less reliable, and, in the end, usually doesn't speed up real work by much.

    Some of this mania comes from Microsoft FUD. At one time, Microsoft was claiming that an "enterprise OS" must be able to serve web pages from inside the kernel. This led to more Linux interest in "zero copy" approaches to be "competitive".

    1. Re:Yeah, that's a bad idea. It's been tried. by n8_f · · Score: 2, Insightful
      I don't know about the rest of your post, but your explanation of CoW is confusing and inaccurate:

      The basic idea is to fake some memory to memory copying operations by using the virtual memory hardware. More specifically, the idea is that when you do a big "write", the space just written becomes read-only to the writing process, rather than being actually copied. When the write is complete, read-only mode is turned off. This eliminates one copy.

      The way CoW works is that when a process copies something already in memory, the kernel has the MMU map those same memory pages to a new location in the process' address space and mark them as read-only, after which the kernel returns the address of the "copied" memory to the process. When any of the processes using that memory try to write to it, the MMU generates an exception (because the pages are marked read-only). The kernel intercepts the exception, allocates additional memory and copies the pages being written to into it, has the MMU remap that process' address space to point to those pages, and then proceeds with the write.
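The same copy-on-write machinery is visible from user space through a private file mapping: `mmap()` with `MAP_PRIVATE` initially shares the file's pages, and the first write gives the process its own copy, so the file itself never changes. A POSIX sketch; `private_write_leaves_file_intact` is an illustrative name and `tmpfile()` supplies a scratch file.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/* Returns 1 if writing through a MAP_PRIVATE mapping changes only the
 * process's view, leaving the underlying file untouched. */
int private_write_leaves_file_intact(void)
{
    FILE *f = tmpfile();
    if (f == NULL)
        return -1;
    int fd = fileno(f);
    assert(fd >= 0);
    if (write(fd, "abcd", 4) != 4) {
        fclose(f);
        return -1;
    }

    char *map = mmap(NULL, 4, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    if (map == MAP_FAILED) {
        fclose(f);
        return -1;
    }
    map[0] = 'X';   /* write fault: the kernel gives this process a private copy */

    char on_disk[4];
    ssize_t n = pread(fd, on_disk, 4, 0);
    int ok = n == 4 && map[0] == 'X' && on_disk[0] == 'a';
    munmap(map, 4);
    fclose(f);
    return ok;
}
```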

    2. Re:Yeah, that's a bad idea. It's been tried. by mcgroarty · · Score: 2, Informative

      n8_f is correct that the parent post is full of bunk. In addition to what n8_f said, the cost isn't cache coherency, it's the additional copy you end up doing -anyway- if the data does get modified (which is likely when CoW is used for an I/O buffer) on top of messing with the page tables. Messing with the page tables is especially expensive when you need to synchronize this across multiple processors. Given that most new systems have multiple cores, CoW loses even more benefit in any case where the CoW event is likely.

    3. Re:Yeah, that's a bad idea. It's been tried. by 0xABADC0DA · · Score: 2, Interesting
      Torvalds:
      I claim that Mach people (and apparently FreeBSD) are incompetent idiots. Playing games with VM is bad. memory copies are _also_ bad, but quite frankly, memory copies often have _less_ downside than VM games,

      I totally agree with that; I just go one step further and say that Torvalds is also a total idiot: VM games are bad, so use copying instead because that's less bad. But copying is also bad, so why do it at all? Neither is a good solution.

      The problem is that linux and bsd are using "virtual memory" to protect processes from each other, but it is designed to run programs that use more memory than is available. Does it sound right that to protect one process from another you are going to use hundreds of thousands of descriptors for each 4k that all say the same thing? It's pretty stupid actually. 4k is too fine-grained for virtual memory these days as disks have grown. It's both too small and too large for process separation.
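For scale, the descriptor count can be worked out directly. A sketch assuming x86-64-style paging (4 KiB pages, 8-byte leaf entries) and counting only the leaf level: mapping 4 GiB takes 1,048,576 entries, about 8 MiB of page tables for each address space that maps it. `leaf_table_bytes` is an illustrative helper, not a kernel function.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes of leaf page-table entries needed to map `mapped` bytes, assuming
 * 4 KiB pages and 8-byte entries as on x86-64. Upper-level tables are
 * ignored; each adds roughly 1/512 of the level below it. */
uint64_t leaf_table_bytes(uint64_t mapped)
{
    const uint64_t page = 4096, pte_size = 8;
    assert(mapped <= UINT64_MAX - (page - 1));       /* keep the round-up from overflowing */
    uint64_t entries = (mapped + page - 1) / page;   /* one entry per page, rounded up */
    return entries * pte_size;
}
```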

      The better solution is to use vm for virtual memory and run all code in the same memory space, but only run code that cannot access memory illegally (ie no pointer arithmetic, only references). This code could be written in Java, or libmo, or D, or maybe other 'safe' languages and run at much faster speeds than they do now as traditional linux processes. The code could be straight C that is JIT recompiled/checked to prevent illegal accesses. That's right, I claim that an average Java program would run faster in such a system than a C program does under a linux/bsd-like system.

      Linus is right: there is massive overhead from doing vm games -- like what is done in linux for instance to separate processes. Did you ever wonder why you can't use more than about 80% of the physical memory simultaneously (ie walk an array of 80% physical mem size and see what happens)? That's right, the kernel is using that much as overhead and about 7% of that is page tables for *physical memory*. It takes ~1200 cycles just to enter a system call because of using vm for process separation vs maybe 5 using a single memory space. Unix kernels do not give fine-grained access to anything because it's simply not possible with process separation based on vm to do so, not in practice.
  12. Linus, my advice to you is simple... by Anonymous Coward · · Score: 4, Funny

    .. this will help you keep yourself calm.

  13. Re:Linus is turning into a dictator by kryten_nl · · Score: 2, Informative

    In the spirit of open source community development, he can't make statements like this and expect to be a role model for the open source community.

    RMS, ever heard of him?

    --
    For the perfect anti-Unix, write an OS that thinks it knows what you're doing better than you do and let it be wrong.
  14. Sweet by bogie · · Score: 4, Funny

    It's been a while since we had a huge linux vs BSD flame feast.

    I'll start.

    BSD user: Linux is a confusing mess of programs and is less stable than BSD.

    Linux user: You're still here? I thought you were dead by now?

    --
    If you wanna get rich, you know that payback is a bitch
    1. Re:Sweet by TCM · · Score: 2, Interesting

      Windows: Where do you want to go today?
      Linux: Where do you want to be tomorrow?
      BSD: Are you guys coming or what?!

      --
      Of course it runs NetBSD. BTC: 1NT7QvbetmANwaMzhpVL6
  15. Re:Hmm, where have I heard this recently? by dow · · Score: 3, Insightful

    Are the *BSD people nicer? Or at least more tactful?

    No. That's why there is more than one BSD. Issues come up, and booom crash goes the fork. Pity.

  16. Re:Did you hear that? by Cid+Highwind · · Score: 2, Funny

    It will be interesting to see what weapon the BSD crowd will retaliate with.

    My guesses are they will respond something like this:

    FreeBSD: FreeBSD users will continue their campaign of random acts of elitist snobbery against Linux users.
    OpenBSD: Theo will threaten to stop work on OpenSSH unless Linus gives him $10,000 for every nasty email he sends a *BSD developer.
    NetBSD: Will stop developing the NetBSD port for Linus' microwave.

    --
    0 1 - just my two bits
  17. Linus sometimes calls people idiots by cpu_fusion · · Score: 4, Insightful

    And in other news...
    Grass is green;
    Oil is overpriced;
    Absolute power corrupts absolutely.

  18. He's just a developer by krewemaynard · · Score: 3, Insightful

    Here we go again, imposing "role model" status. Linus is just a guy. Sometimes he gets his buttons pushed, sometimes he's doing the pushing. BFD. Maybe you'd be a little pissy too if Slashdot posted a story every time you did or said something. Linus Prefers Gas-X, Says Bean-o Is For Douchebags. Who cares? (BTW, Linus didn't really say that, I made it up. Don't wanna get the Bean-o people on his case too.)

    As far as this whole VM thing goes, time and testing will tell the true story. Meanwhile, maybe we could try NOT deifying Linus (any more)?

    --
    I saw it on Slashdot, it must be true!
  19. GEEK WARS BEGIN!!! NEWS AT 11!! by Electric+Eye · · Score: 3, Funny

    Our top story tonight: uber geek Linus Torvalds unleashed a scathing indictment of some other geeks, claiming they are skating on thin ice by using Virtual Memory calls to improve performance. The words sparked outrage in the dark rooms of college geek programmers from Berkeley to Berlin. The angry geek mobs said they're going to launch a flame war from their computers "to teach Linus a lesson."

    In the words of George Takei "Hoooooooly geeeez!" This is news??

  20. Re:Linus is turning into a dictator by Bogtha · · Score: 3, Insightful

    Dictator? Are the FreeBSD developers somehow unable to keep their implementation now that Linus deems it stupid?

    You might feel he's being a bit of an arsehole, but that doesn't mean he's a dictator. He's not stopping anybody from doing anything, he's merely sharing his opinion of a development technique on a mailing list dedicated to discussing the development of his kernel.

    --
    Bogtha Bogtha Bogtha
  21. Just say what you mean by lymond01 · · Score: 3, Insightful

    "I claim that Mach people (and apparently FreeBSD) are incompetent idiots."

    Linus, who's becoming more outspoken as he ages, needs to find that line between anonymous forum geek and software spokesperson...and then not cross it. Calling anyone an incompetent idiot is both non-constructive if you're hoping to improve a situation, and just plain unfriendly in an area where cooperation amongst developers is so crucial (open source).

  22. Re:Linus is turning into a dictator by Lumpy · · Score: 5, Insightful

    No, he is simply getting less tolerant of "sloppy" programming. He is one of the very, very few who believe in doing it the way that gives you the best speed. If something takes 4+ operations and there is a way of doing it with only 2 operations and fewer problems, the performance gains add up. Just because your typical machine has 4 dual core 8Ghz processors and 22 terabytes of ram does not mean you can slack off and write the whole thing without paying attention to performance.

    the BSD guys have their reasoning and if you read more info about this it is not a shot in the dark that Linus is taking but he is frustrated that after many discussions nobody cares as much as he does on the performance issues.

    Go back and read what Linus did back in the early days, it's no different today than what it was in 1990, he will call a duck a duck.

    --
    Do not look at laser with remaining good eye.
  23. Obligatory Simpsons reference by Admiral Burrito · · Score: 2, Funny

    Linus Torvalds: "Don't have a COW, man!"

  24. Re:Microsofts answer to that by keithmo · · Score: 2, Funny

    No, in Longhorn, it's called COW-tipping(tm).

  25. The Universe In Which Spock Has A Beard? by tlambert · · Score: 4, Funny

    ``And in what universe is anyone who can intelligently speak about (much less code around) memory and VM management [be called] an "incompetent idiot"?''

    The Universe In Which Spock Has A Beard?

    -- Terry

  26. The best thing about being a professional. by Shohat · · Score: 3, Interesting

    Linus is a gifted engineer, let him be rude. Aside from Linus being rude, there is no actual story here.
    I used to own a restaurant and also an office supplies shop. It was quite interesting and made me some money, but I hated the fact that the most important factor in my life was pleasing (customers) or fighting (suppliers) other people. I had to constantly think about what to say and how to behave.
    I am no longer a business owner, and now I work with a rather gifted bunch of engineers, and frankly it gives me great pleasure to know that neither I nor the people I work with really care about being polite, clean-shaven, well-spoken or good-looking. I can be rude if I want to, they can be rude if they want to, and we all get along very well.

  27. I'm more concerned about the headline... by enitime · · Score: 2
    I mean, what has the world come to when the submitter fails to make fun of an acronym like COW?

    Off the top of my head:

    Linux to BSD: "Don't have a COW, Man!"
    Linus UnMOOved by COW.
    Penguin: Demon COW Dog.

  28. Lesson on Tact.. by corellon13 · · Score: 3, Insightful

    Linus, you may be right and you may be very smart, but you should try a little tact. Here's a good definition for it that I learned from a drill sergeant: "Tact is the ability to tell someone to go to hell and look forward to the trip."

    Being nice and respectful doesn't mean you can't tell it like it is.

    --
    Do what is right and let the consequence follow
  29. Re:good approach by mrsbrisby · · Score: 3, Informative

    In practice I think the FreeBSD approach probably does have speed advantages in most cases, and the fact that it's transparent to the userspace developer would seemingly be a big advantage.

    No, it has a speed advantage over read()/write() provided you are aware of exactly how it works. The fact that it's transparent to userspace is a bad thing, because it means you end up with code written a certain way, and nobody will ever understand why.

    Reusing the pages causes the speed benefit to go away, and in fact it'll be slower than read()/write().

    This sort of thing matters almost exclusively to people doing really deep performance tuning, and for them it's better to present a simple API with large rewards for tuning, instead of transparently doing something weird to an existing API that will break in the field without you noticing and requires really weird usage to get the best performance.

    I agree completely. Unfortunately, the FreeBSD API is inadequate. It's not faster in practice unless you do something really really weird (waste memory). The big difference is the Linux implementation gives explicit notification and the FreeBSD API doesn't.

    FreeBSD doesn't provide an API to ask if the pages are still in use. That'd probably make their approach usable, but at that point, why bother updating the page tables?

    Once you're there, why bother calling statpage() to check whether the page is in use? Why not have the kernel send the pages that are available via a file descriptor so you can poll() or select() on it?

    At this point, you're at the Linux implementation.

    That's it. That's why it's better.

  30. Re:Linus is turning into a dictator by rtaylor · · Score: 4, Insightful

    No he is simply getting less tolerant of "sloppy" programming.
    You'll forgive me for taking that with a grain of salt so long as memory over-commit remains the default mode of operation within Linux.

    --
    Rod Taylor
  31. He has the stones to back it up by Gothmolly · · Score: 3, Insightful

    Linus has frequently called people idiots, ignored patches, and done stuff his own way for a very long time now. He's quite successful at it. Perhaps what most people need to realize is that he is good enough that he can. The average read-Slashdot-during-work-while-coding Slashdotter is not in his league, so decrying his ad hominem attacks, or making "I would do X instead" arguments, just doesn't hold much water.

    --
    I want to delete my account but Slashdot doesn't allow it.
  32. You stupid COW! by codehead78 · · Score: 2, Funny

    You'll scream! I'll vmsplice ya, it's gonna hurt.

  33. How is this different than AIO? by pammon · · Score: 2, Informative
    Can someone please explain to me how this new proposal is different from the aio_* functions (asynchronous I/O) that appeared in FreeBSD?

    For example, aio_write() writes to the file descriptor, allows you to poll for success a la select, and tells you not to modify the buffer before it's done (but doesn't try to stop you with copy-on-write).

    This sounds exactly like what Linus wants.

  34. Re:Linus is turning into a dictator by Laser Lou · · Score: 2, Insightful

    Linus has long been called a "Benevolent Dictator for Life". I guess this supports the idea that, as with all dictatorships, you get more than what you bargained for.

    --
    No data, no cry
  35. RTFA, please. Or at least my summary here. by ColonelPanic · · Score: 5, Informative

    The complaint is not about general copy-on-write, it's about BSD's ZERO_COPY_SOCKET feature vs. vmsplice().

    Basic explanation: Suppose that a program is doing a lot of output to a file or socket. The program can generate data faster than the kernel can consume it, say. So what should the kernel do with the buffer it receives from the user on each write()?
    There are three options.

    1) Copy its content immediately elsewhere, so that on return to User Mode, the buffer remains writable and writes are safe.

    2) Change the access rights of the page containing the buffer, so that no copy need be made unless User Mode attempts
          to modify its content before the kernel has completed the write(). If the user attempts to write, it either gets
          permission to do so (because the kernel is done) or it gets a writable copy.

    3) Let User Mode promise to not modify the buffer's content until told that it's safe to do so, leaving it writable in
          the meantime.

    The default behavior is (1); BSD's zero copy socket feature is (2), and the point of Torvalds' complaint; vmsplice() is (3).

    --
    "Skill shows through where genius wears thin." -Wittgenstein || Religion: uniting aviation and architecture.
  36. A problem of read()/write() semantics, not VM? by KonoWatakushi · · Score: 2, Insightful

    I can certainly see the value in explicit notification of page usage, but I have to wonder if this isn't attacking the problem at the wrong level. It seems that these problems are caused by the semantics of read() and write() calls, requiring data to be read/written to arbitrarily aligned userspace buffers.

    Zero copy can definitely make things complex, and in the current implementations, the value is arguable. (and being argued...) Still, memory copies have an associated cost. While they may be better than COW with explicit notification, it is still a performance hack, and represents a non-optimal way of dealing with data transfers. (It could be the easiest and best hack to be made, I can't say. In any case, Linus is acting like a git with his name calling here.)

    Perhaps more consideration should be given to the API instead. Using zero copy is obviously a good goal, and it is primarily hindered by the ancient API and protocols. What's needed is something where the buffer management is explicit and the devices themselves actually own the buffers. (After all, they are the only entities which know what the buffer requirements are.) Arranging it so that user applications have access to the actual network buffers would be far preferable to playing any of these "games".

    Unfortunately, Ethernet and the IP protocols are not particularly conducive to such an optimal implementation. With enough intelligence in the network adapters though, many of the issues should be manageable, and allow for a good zero copy implementation with a suitable API. It may be more trouble for the application, but if you need the performance, it is a small price to pay.

  37. Not about fork by butlerm · · Score: 2, Informative

    The dispute is not about fork(). It is about techniques to avoid copying the contents of I/O buffers from user space to kernel space - aka "zero copy" writes.

    Linus (minus the ad hominem characterizations) is arguing that the FreeBSD method of VM based copy on write is a poor performer under real world loads, due to the cost of handling the page faults.

    He says that an effective zero copy I/O system requires more explicit coordination between the application and the kernel.

  38. for people like myself who have no idea by mapkinase · · Score: 2, Informative

    ...what COW means.

    Maybe people like myself should just stay away from this thread...

    --
    I do not believe in karma. "Funny"=-6. Do good and forbid evil. Yours, Oft-Offtopic Flamebaiting Troll.
  39. What harsh words? by Inoshiro · · Score: 5, Insightful

    Andy went out and said that he thought the Linux approach was wrong, and archaic, and that people should go and wait for GNU.

    Linus said that he felt this was wrong, and that being a prof is no excuse for Minix being the mess it was (and Minix was a mess in the late 1980s/early 1990s). He also apologized in case he had come off as too harsh in writing that people should be able to throw away an old design in favour of a new one anyway, etc.

    It was very polite compared to some of the non-Andy/Linus replies.

    --
    --
    Internet Explorer (n): Another bug -- that is, a feature that can't be turned off -- in Windows.
  40. An explanation by sjames · · Score: 4, Informative

    There seem to be a LOT of misconceptions about the discussion of vmsplice() vs COW vs copy. This has nothing to do with conserving memory and everything to do with high-performance I/O. If your app just needs to send a couple of small files from A to B, you probably don't care about this at all.

    A little background is needed on the terminology and mechanisms of I/O for any of this to make sense. For an example, let's say your app is a very busy web server sending dynamic (but trivial to compute) pages out.

    The oldest and simplest method is copy. The app calls write(int sock, char *buffer, int length) on a socket. The kernel copies the contents of buffer from userspace memory into a kernel-space buffer and at least queues the data to the TCP stack before returning.

    COW is an attempt to avoid the cost of copying the outgoing data. In that case, the kernel bumps up the reference count on the physical pages that make up buffer (since now both kernel and application are interested in them) and marks the pages as COW. That is, the virtual memory addresses are set as read-only and a flag bit is set (more or less). The latter is done so the kernel needn't worry about them again. By the time the write call returns, the app is able to immediately write to that memory (sorta) without worry.

    When that write happens, the app takes a page fault (writing to a read-only page). The kernel sees that the pages are COW, copies the data to a new physical page, and maps the page in read/write. Then it returns from the fault. OTOH, if the kernel finished with the page first (the data goes on to the wire), it re-marks the page(s) so the app can access them without a copy.

    The hope is that often enough, the app WON'T try to write to the pages while they're busy, and so the cost of that copy is saved. If that hope comes true often enough, it MIGHT be vaguely useful. I say MIGHT since there is a significant cost just for marking the pages (the CPU's TLB must be flushed for the change to take effect). If the faults happen, it's a BIG loss, since handling a fault takes thousands of CPU cycles.
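    The trade-off in that paragraph can be written as a back-of-the-envelope inequality. The symbols are my own: C_copy is the cost of copying the buffer, C_mark the cost of write-protecting it (including the TLB flush, and any re-marking when the kernel finishes first), C_fault the cost of taking the fault, and p the fraction of writes where the app touches the buffer before the kernel is done. A fault still ends in a copy, so:

```latex
\mathbb{E}[C_{\mathrm{COW}}] = C_{\mathrm{mark}} + p\,(C_{\mathrm{fault}} + C_{\mathrm{copy}})

\mathbb{E}[C_{\mathrm{COW}}] < C_{\mathrm{copy}}
\iff
p < \frac{C_{\mathrm{copy}} - C_{\mathrm{mark}}}{C_{\mathrm{fault}} + C_{\mathrm{copy}}}
```

    If C_mark is already comparable to C_copy, the right-hand side goes to zero and COW cannot win at any p; a large C_fault shrinks the window further.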

    So, for it to have any chance to help, the application programmer must already know enough to TRY to avoid writing to the same buffer again until it gets to the wire. Unfortunately, the programmer can never be sure, so most apps don't bother.

    The vmsplice() proposal is fairly simple. In this case, the app explicitly requests special treatment of the write. The pages are NOT marked as read-only at all. Instead, the app is on its honor to leave them alone until the kernel notifies it that they are again available. This saves the copy, the cost of the TLB flush AND the (potential) cost of page faults. If the app breaks its promise, it is the only one to suffer, as the data it sent is corrupted (no kernel housekeeping is ever stored in such pages, so there are no security implications). Any damage the app might do by sending screwy data could also be done using the old copy method.

    What it all comes down to is that playing tricks with page mapping LOOKS nice at first glance, since it SEEMS reasonable that not copying bytes around will save CPU cycles and memory bandwidth. The re-mapping (or just permission changes) on pages SEEMS lightweight. Unfortunately, in fact, re-mapping or changing permissions forces cache invalidations, and page faults are just plain expensive. With the direction CPU design is going, these things will likely get more expensive rather than less (as they have for most of the history of microprocessor design).

    It's really not that complex for an application to use. At least in comparison to the complexities and level of knowledge required to write an app that performs well enough to need this in the first place.

  41. It's not the only problem of TLB. by Anonymous Coward · · Score: 2, Insightful
    The COW problems:
    1. The #1 problem: many context switches, one for each fault taken when writing to a write-protected page. Solution: fewer context switches, even if that means copying more than strictly needed.
    2. TLB misses.
    3. L1 & L2 Cache misses.
    4. Double TLB misses, one in userspace and other in kernel space.
    5. L1 & L2 Cache misses, one in userspace and other in kernel space.
    6. Multithreading locks (mutexes, semaphores, blocking calls, non-blocking calls, ...) VERSUS NO-locks in monothreading using select/poll, non-blocking calls, ...
    7. Hardware bubbles: pipeline's misses & bubbles, big page-translation bubbles, ...

    -=- ThE DaRK MaN oF tHe ObScURiTY -=-

  42. Re:RTFA, please. Or at least my summary here. by joe_bruin · · Score: 3, Insightful

    Thank you. I've read more than 30 high-modded posts in this article, and yours is the best explanation of the issue by far.

  43. Re:RTFA, please. Or at least my summary here. by menace3society · · Score: 2, Insightful

    So the big question is, what happens if user mode breaks the promise, either intentionally or through lousy programming? If the program fucks up, well, then, I'd rather have FreeBSD's model (actually, I'd rather have someone come up with a thread-safe wrapper function, and keep I/O the way it's supposed to be, i.e., atomic).

  44. Re:RTFA, please. Or at least my summary here. by menace3society · · Score: 2, Insightful

    I'm sure someone said the same thing about the total size of segmented ICMP packets.