Slashdot Mirror


Torvalds Has Harsh Words For FreeBSD Devs

An anonymous reader writes "In a relatively technical discussion about the merits of Copy On Write (COW) versus a very new Linux kernel system call named vmsplice(), Linux creator Linus Torvalds had some harsh words for Mach and FreeBSD developers that utilize COW: 'I claim that Mach people (and apparently FreeBSD) are incompetent idiots. Playing games with VM is bad. memory copies are _also_ bad, but quite frankly, memory copies often have _less_ downside than VM games, and bigger caches will only continue to drive that point home.' The discussion goes on to explain how the new vmsplice() avoids this extra overhead."

571 comments

  1. Wrong Side of Bed? by AKAImBatman · · Score: 5, Insightful
    Ok, let me see if I've got this straight:

    • Copy on Write saves you real memory, cache memory, and CPU time by pretending that each forked process has a true copy of a memory segment when it in fact is looking at the original. That is, right up until a fork tries to write to that memory location, in which case an exception is handled by making an actual copy to a new location and allowing the write.
    • Linus believes that the exception will occur enough in real world usage that it will be slower than just doing the copy in the first place.
    • Linus wants to push the manual use of zero-copy memory sharing through the vmsplice() routine. He believes that the programmer will always know better than the system when to share memory.
    • Linus doesn't like "VM Games" despite the fact that Virtual Memory, Memory Mapped Files, Disk I/O, Write Caching, etc, etc, etc, are all already "Memory Games" and "VM Games"
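
    A minimal sketch of the first bullet, assuming a POSIX system. The helper name is hypothetical. The observable result is the same whether the kernel copies eagerly or lazily; COW only changes when the copy happens, namely at the child's first write rather than at fork() time.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: fork a child that overwrites a buffer, then report
 * whether the parent's view of that buffer changed.
 * Returns 1 if the parent still sees its original data, 0 if not,
 * and -1 on error.
 */
int parent_keeps_original_after_child_write(void)
{
    static char buffer[4096];
    int status;
    pid_t pid;

    strcpy(buffer, "parent data");

    pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: on a COW kernel this write faults, and the kernel gives
         * the child a private copy of the page before the write proceeds. */
        strcpy(buffer, "child data");
        _exit(0);
    }

    /* Parent: wait for the child, then check our own copy of the page. */
    if (waitpid(pid, &status, 0) < 0)
        return -1;

    return strcmp(buffer, "parent data") == 0;
}
```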


    Do I have that right?

    If so, I'm not really seeing his issue. Or at least not as hard-line as he sees it. The issue of memory copy performance is a tricky one, especially since CPU cycles are not the be-all to end-all of performance. Does the exception generated really cost that much more than he believes, or is it often eclipsed by the cost of the extra memory read/writes and CPU waits that are normally generated by a copy? Is it really feasible to expect program developers to do manual memory management in a day in age when programs easily weigh in at hundreds of megs?

    I'm just not sure that Torvalds is really looking at all sides of this. He may be right, but I'd like to hear more discussion between the *BSD guys and Torvalds before we put this matter to rest. And preferably without the insults this time. :-/

    Links:

    Copy on Write as explained by Wikipedia
    FreeBSD page on Zero Copy Patches
    Duke Uni Research
    1. Re:Wrong Side of Bed? by Anonymous Coward · · Score: 0

      day AND age.

    2. Re:Wrong Side of Bed? by qwijibo · · Score: 5, Insightful

      I don't consider myself an expert in kernel programming, but I definitely think someone is off base if they're expecting programmers as a whole to do the right thing. Many programs seem to work by coincidence rather than design. People didn't do their memory management right in the days when it was necessary. Now that a lot of people are moving towards languages that handle the memory management for them, I expect even fewer to worry about it. That does mean that the programmers of the programming languages are the ones who are responsible, but I'd personally rather have the kernel take a more active role in memory management.

    3. Re:Wrong Side of Bed? by silas_moeckel · · Score: 1

      I think the point is whether the programmer should be forced to always have COW enabled or be able to choose. It seems to make sense that programmers could do some application profiling to figure this out and to find which bits it makes sense for. As to what the default should be, that really depends on your workload and is debatable. I would hope it gets added to proc so it can be altered per process, with a global default, at runtime.

      --
      No sir I dont like it.
    4. Re:Wrong Side of Bed? by jandrese · · Score: 3, Funny

      One thing that concerns me about making all of these copies is that it seems like a quick and easy way to blow out your L2 cache. That could in the long run have a worse performance penalty than having to play the VM tricks with CoW.

      --

      I read the internet for the articles.
    5. Re:Wrong Side of Bed? by mrsbrisby · · Score: 5, Informative

      Copy on Write saves you real memory, cache memory, and CPU time by pretending that each forked process has a true copy of a memory segment when it in fact is looking at the original. That is, right up until a fork tries to write to that memory location, in which case an exception is handled by making an actual copy to a new location and allowing the write.

      No. Updating the page tables twice and having a fault in there is very expensive.

      Linus believes that the exception will occur enough in real world usage that it will be slower than just doing the copy in the first place.

      And he's right too. But he's not recommending the copy "in the first place" - he's recommending explicit notification that the pages aren't used anymore instead of an implicit notification by-way of a page fault.

      Linus wants to push the manual use of zero-copy memory sharing through the vmsplice() routine. He believes that the programmer will always know better than the system when to share memory.

      That's correct.

      Does the exception generated really cost that much more

      Yes. There isn't a grey area on it either- it's basic math: cost of page copy + exception + 2 * (page table update) is greater than cost of page copy + page table update.
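
      The inequality above can be written out as a small sketch. The struct, the function names, and any numbers fed into it are hypothetical placeholders, not measurements. It only covers the case where the buffer really is written again.

```c
/* Hypothetical unit costs; callers supply placeholder values. */
struct vm_costs {
    unsigned page_copy;   /* copying one page */
    unsigned fault;       /* taking and handling one page fault */
    unsigned pte_update;  /* one page table update */
};

/* COW path when the buffer is written again: the copy still happens,
 * plus a fault and two page table updates (mark read-only, then undo). */
unsigned cow_path_cost(struct vm_costs c)
{
    return c.page_copy + c.fault + 2 * c.pte_update;
}

/* The comparison path from the comment: one copy, one page table update. */
unsigned plain_path_cost(struct vm_costs c)
{
    return c.page_copy + c.pte_update;
}
```

      Whatever the copy costs, the gap between the two paths is one fault plus one page table update.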

      The real issue is that the userland knows what it's doing. Eventually it'll want to reuse a buffer. Now does the userland start reusing pages when malloc() fails- thus incurring the exceptions when memory is tight? Or does it reuse them when the kernel says they're reusable?

      The latter makes more sense if you're actually concerned about performance. The former may be easier to code, but I doubt many people will actually do that because it's hard to test.

      In practice what people do is use a static buffer- that's even EASIER to code, but it means page faults happen ALL the time.

      Is it really feasible to expect program developers to do manual memory management in a day in age when programs easily weigh in at hundreds of megs?

      They already have to do it. Whether it's the BSD implementation or the new Linux implementation they already have to do it if they want reasonable performance in the real world.

      To really take advantage of the BSD implementation, your program needs to monitor malloc() usage, and start attempting to reuse pages when it fails- oldest to newest. This is complicated and hard to test.

      To really take advantage of the Linux implementation, your program waits until it gets notification (via select() or poll()) on the vmsplice() recvmsg() operation. Once that occurs, the notification says exactly which pages can be used.

      The result? Userland on Linux is easier to write, and easier to test. It'll also be faster.
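
      The explicit-notification idea can be sketched in plain userspace. This is a simulation under stated assumptions, not the vmsplice() interface: the pool structure and every pool_* name are hypothetical, and an ordinary pipe stands in for whatever descriptor would deliver the "these pages are free" message.

```c
#include <poll.h>
#include <unistd.h>

#define NBUF 4

/* Hypothetical buffer pool: in_flight[i] is 1 while buffer i must not be
 * touched by the sender. */
struct pool {
    int notify_rd;        /* the sender polls this end */
    int notify_wr;        /* the consumer writes released indices here */
    int in_flight[NBUF];
};

int pool_init(struct pool *p)
{
    int fds[2];
    int i;

    if (pipe(fds) < 0)
        return -1;
    p->notify_rd = fds[0];
    p->notify_wr = fds[1];
    for (i = 0; i < NBUF; i++)
        p->in_flight[i] = 0;
    return 0;
}

void pool_destroy(struct pool *p)
{
    close(p->notify_rd);
    close(p->notify_wr);
}

/* Hand buffer i over; the sender must not write to it from now on. */
void pool_send(struct pool *p, int i)
{
    p->in_flight[i] = 1;
}

/* Stand-in for the consumer: announce that buffer i is finished with. */
int pool_release(struct pool *p, int i)
{
    unsigned char idx = (unsigned char)i;

    return write(p->notify_wr, &idx, 1) == 1 ? 0 : -1;
}

/* Wait up to timeout_ms for a notification. Returns the index of a buffer
 * that is safe to reuse, or -1 if nothing arrived. */
int pool_wait_reusable(struct pool *p, int timeout_ms)
{
    struct pollfd pfd;
    unsigned char idx;

    pfd.fd = p->notify_rd;
    pfd.events = POLLIN;
    pfd.revents = 0;

    if (poll(&pfd, 1, timeout_ms) <= 0 || !(pfd.revents & POLLIN))
        return -1;
    if (read(p->notify_rd, &idx, 1) != 1 || idx >= NBUF)
        return -1;

    p->in_flight[idx] = 0;
    return idx;
}

/* Usage: send buffer 2, see that nothing is reusable yet, then release it
 * and pick up the notification. Returns the reusable index or -1. */
int notification_roundtrip_demo(void)
{
    struct pool p;
    int idx = -1;

    if (pool_init(&p) < 0)
        return -1;
    pool_send(&p, 2);
    if (pool_wait_reusable(&p, 0) == -1 && pool_release(&p, 2) == 0)
        idx = pool_wait_reusable(&p, 1000);
    pool_destroy(&p);
    return idx;
}
```

      The sender never has to guess: it reuses a buffer only after its index has come back through the descriptor, so no write ever lands on a page that is still in flight.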

    6. Re:Wrong Side of Bed? by QuietLagoon · · Score: 3, Funny
      Why does Torvalds care? I mean, why does he care what FreeBSD is doing, or how they are doing it?

      If there is something that FreeBSD does that he likes, he is welcome to the code. If there is something that FreeBSD does that he does not like, he can just let it go.

      Why does he feel the need to start a war within the OpenSource community?

    7. Re:Wrong Side of Bed? by RailGunner · · Score: 4, Interesting
      Yes, you've got it right on what Linus is saying.

      The issue of memory copy performance is a tricky one, especially since CPU cycles are not the be-all to end-all of performance. Does the exception generated really cost that much more than he believes, or is it often eclipsed by the cost of the extra memory read/writes and CPU waits that are normally generated by a copy? Is it really feasible to expect program developers to do manual memory management in a day in age when programs easily weigh in at hundreds of megs?

      What programs weigh in at hundreds of megs? Don't count data files or map files for games. The entire bin directory of a PostgreSQL install is only 20 megs, and that's a lot of stuff there.

      And as far as doing memory management... YES. I have yet to see a compiler do a better job at managing memory than what I can do when I write my code - and the reason is quite simple: I'm the domain expert, not the compiler. Compilers generally do a good job, but it's those specific cases that bite you over and over again.

      Linus is also right about child threads writing to memory. If that never happened, we wouldn't have a concept of a lock or a semaphore. The bottom line is that it happens a lot.

      He may be right, but I'd like to hear more discussion between the *BSD guys and Torvalds before we put this matter to rest. And preferably without the insults this time. :-/

      I agree, the ad hominem was completely unnecessary.

    8. Re:Wrong Side of Bed? by Compholio · · Score: 1

      Copy on Write saves you real memory, cache memory, and CPU time by pretending that each forked process has a true copy of a memory segment when it in fact is looking at the original. That is, right up until a fork tries to write to that memory location, in which case an exception is handled by making an actual copy to a new location and allowing the write.

      I think the problem with this approach is that COW will only give you a copy of the particular piece of the memory that you accessed. That means that the system has to keep huge tables of what is shared and what is not and every time you make a call to request ANY memory it's going to need to check the table. This action is going to result in an overall performance degradation since the application has to check the table for every write over the long-haul, rather than just duplicate the memory and go.

    9. Re:Wrong Side of Bed? by mrsbrisby · · Score: 4, Insightful

      One thing that concerns me about making all of these copies is that it seems like a quick and easy way to blow out your L2 cache. That could in the long run have a worse performance penalty than having to play the VM tricks with CoW.

      No it won't. The only way to avoid the copies is to avoid the pagefaults. Since userland doesn't get explicit notification in FreeBSD of when the pages are safe to use, the process should wait as long as possible (e.g. until malloc() starts failing)

      The idea Linus pushes here is explicit notification- via select() or poll() and returnable via recvmsg(). That way the userland knows exactly which pages can be reused.

      The result is that it's faster and easier to develop userland programs to take advantage of it. It's also easier to degrade gracefully into read()/write() until the FreeBSD people see the light and add support for this too.

      It's really a clever idea.

    10. Re:Wrong Side of Bed? by Peaker · · Score: 2, Interesting

      Ok, let me see if I've got this straight:

      Copy on Write saves you real memory, cache memory, and CPU time by pretending that each forked process has a true copy of a memory segment when it in fact is looking at the original. That is, right up until a fork tries to write to that memory location, in which case an exception is handled by making an actual copy to a new location and allowing the write.
      Linus believes that the exception will occur enough in real world usage that it will be slower than just doing the copy in the first place.
      Linus wants to push the manual use of zero-copy memory sharing through the vmsplice() routine. He believes that the programmer will always know better than the system when to share memory.
      Linus doesn't like "VM Games" despite the fact that Virtual Memory, Memory Mapped Files, Disk I/O, Write Caching, etc, etc, etc, are all already "Memory Games" and "VM Games"

      Do I have that right?


      I do not know the context of the current debate, but after reading some of it, it seems it doesn't have anything to do with fork at all. I believe everyone agrees COW for fork() is good.

      The disagreement is about a specific optimized implementation of data transfer. Linus says that a simple non-optimized and portable interface already exists. The debate is on the optimized, less portable, high-performance implementation. Linus says it is pointless to use COW in the high-performance implementation, and that makes sense. For this specific issue, it is faster to just explicitly disallow the user from modifying his buffer after "sending" it. If the user wants a friendlier interface and is willing to give up some performance (as COW would), he can just use the friendly low-performance interface.

      If so, I'm not really seeing his issue. Or at least not as hard-line as he sees it. The issue of memory copy performance is a tricky one, especially since CPU cycles are not the be-all to end-all of performance. Does the exception generated really cost that much more than he believes, or is it often eclipsed by the cost of the extra memory read/writes and CPU waits that are normally generated by a copy? Is it really feasible to expect program developers to do manual memory management in a day in age when programs easily weigh in at hundreds of megs?

      Explicitly disallowing "touching" of the buffer you "sent" until you have some ACK that means it completed sending, has little to do with the size of the program (given that it is sanely modular) and is the only way to extract the best performance of the machine. Again, you can always revert to using the simple low-performance send calls that allow you to touch the buffer after sending.

      I'm just not sure that Torvalds is really looking at all sides of this. He may be right, but I'd like to hear more discussion between the *BSD guys and Torvalds before we put this matter to rest. And preferably without the insults this time. :-/

      Slashdot obviously brings the short words without any context.

      Linus is not saying COW is bad, he says COW for this specific purpose in this specific context is bad. I don't know the context, and I only read the article itself and put only a little thought into it, but so far it makes sense.

    11. Re:Wrong Side of Bed? by Anonymovs+Coward · · Score: 3, Interesting

      I'm not an expert on any of this, but what I do know is that when you start using up a lot of memory Linux totally sucks. On a 256 MB RAM machine, with about twice that amount of swap, if I run over 50% memory usage the system becomes unusable for long periods of time. Even at much greater loads, FreeBSD just feels slightly sluggish at worst. This has been true for years. It was the main reason many people I know refused to use linux (they went for either commercial Unix or the BSDs). It's still true with 2.6.15 -- I'm experiencing it on my work machine as I type this.

    12. Re:Wrong Side of Bed? by mrsbrisby · · Score: 2, Interesting

      I think the problem with this approach is that COW will only give you a copy of the particular piece of the memory that you accessed. That means that the system has to keep huge tables of what is shared and what is not and every time you make a call to request ANY memory it's going to need to check the table. This action is going to result in an overall performance degradation since the application has to check the table for every write over the long-haul, rather than just duplicate the memory and go.

      It does all those things anyway. The problem is that faults are expensive, and yes, because they happen in real life, copying the memory IS faster in real life.

      It is possible to exploit the mechanism FreeBSD uses to gain performance- Simply never touch a page after it's been sent out. Or rather, wait as long as possible- say until malloc() fails.

      This would work, but it'd be hard to test and hard to get right.

      What Linus suggests is explicit notification- say a select() or poll() operation that says "these pages are now free". This works out well, and is indeed faster because there aren't any copies or page faults. It's also easier to develop.

      Of course, using COW for TCP buffers is stupid. That's why people don't use them on FreeBSD (at least, not once they've seen the profiler results)- it's never faster. They always use a static buffer and ALWAYS get the page fault when the system is under any load.

    13. Re:Wrong Side of Bed? by visgoth · · Score: 5, Funny

      Because everyone enjoys a good old fashioned jihad once in a while?

      --
      My patience is infinite, my time is not.
    14. Re:Wrong Side of Bed? by baadger · · Score: 0

      What's wrong with implementing it both ways and letting the user/distro decide at compile time?

    15. Re:Wrong Side of Bed? by MagicM · · Score: 1

      Yes. There isn't a grey area on it either- it's basic math: cost of page copy + exception + 2 * (page table update) is greater than cost of page copy + page table update.

      There you're assuming that the page copy will be necessary. In cases where the W in COW does NOT occur, isn't COW much better?

      (Please don't mod me up. I have no idea what I'm talking about.)

    16. Re:Wrong Side of Bed? by AKAImBatman · · Score: 3, Interesting

      I *think* I understand what you're saying. Basically, the problem is caused by the fact that usermode code never (or rarely, depending on your platform) releases any of the memory it has allocated. Instead, it keeps reusing the same memory pools over and over again. This becomes a problem with CoW because the kernel doesn't learn about the deallocation of memory until the usermode reallocates it for another purpose. When that reallocation happens, the read-only exception is going to be triggered. Thus there's going to be a 100% occurrence of exceptions on CoW pages.

      However, given that the "free()" routine is part of the OS in FreeBSD, wouldn't it make sense to create a smarter "free()" routine that would attempt to recognize and explicitly deallocate CoW pages?

    17. Re:Wrong Side of Bed? by stevesliva · · Score: 1
      What if the parent process forks a gazillion child processes which never write, and the parent process writes to a memory location? Do you then have to allocate memory for all gazillion child processes at once? That could suck.

      How many child processes never write to memory? What exactly is it doing if not writing to memory? Does Copy on Write only save you memory if you fork a bunch of child processes that do nothing? Or is it a matter of saving the allocation of memory until the parent process is done, thus speeding up the parent process at the time of the fork? Why not then copy on parent-write or child-execution, whichever comes first?

      --
      Who do you get to be an expert to tell you something's not obvious? The least insightful person you can find? -J Roberts
    18. Re:Wrong Side of Bed? by beaubell · · Score: 1

      I believe the issue here is that the TLB invalidation needs to happen on all processors in an SMP environment when changing RW status on a page when using CoW and Zero-Copy. It's stated in the forums that this overhead is far worse than simple copies.

    19. Re:Wrong Side of Bed? by Anonymous Coward · · Score: 0

      "I'm just not sure that Torvalds is really looking at all sides of this."

      I'm just not sure you're even capable of making that claim!!

      And you are?

    20. Re:Wrong Side of Bed? by mrsbrisby · · Score: 2, Interesting

      I'm not an expert on any of this,

      That's obvious.

      but what I do know is that when you start using up a lot of memory Linux totally sucks.

      Correction: when _you_ start using up a lot of memory Linux totally sucks. When I start using up a lot of memory, Linux acts exactly as I expect, and better than FreeBSD.

      http://bulk.fefe.de/scalable-networking.pdf

      Hrm. Looks like FreeBSD panics under load in its default configuration. So sad.

      Meanwhile, I have some systems that constantly run with a run-queue length above 100.0 and are still (albeit somewhat) responsive.

    21. Re:Wrong Side of Bed? by mrsbrisby · · Score: 2, Informative

      There you're assuming that the page copy will be necessary. In cases where the W in COW does NOT occur, isn't COW much better?

      Well, no, it's about the same actually.

      The problem is that in the naive implementation, the page copy is always necessary. A userspace implementation that takes advantage of COW is more complicated than one built on explicit notification.

    22. Re:Wrong Side of Bed? by EsbenMoseHansen · · Score: 1
      Do I have that right?

      Almost, but you missed an important point, I think. It is not really obvious, but I think they are discussing a faster method for write() and its ilk. In *that* context, he believes that the eventual page fault will outweigh the initial advantage gained on making the page readonly+generating a pagefault and making it COW+evt pagefault to clear the COW flag.


      In that context, he is saying that write() is as fast as it's going to be, if you also want it safe. Unsafe, fast methods should go the vmsplice() route, whatever that is :) He says it is a 0-copy version, so presumably it is directly userspace-to-networkcard copy of some kind.
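
      What "safe" means for write() can be shown with a short sketch, assuming a POSIX system. The function name is hypothetical and a pipe stands in for the socket. Because write() copies the data before returning, the caller may scribble on the buffer straight away without affecting what was sent.

```c
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: write a message into a pipe, overwrite the source
 * buffer, then read the message back.
 * Returns 1 if the reader still sees the original message, 0 if not,
 * and -1 on error.
 */
int write_snapshots_buffer(void)
{
    int fds[2];
    char buffer[16] = "first message";
    char received[16] = {0};
    size_t len = strlen(buffer) + 1;   /* include the terminating NUL */
    int ok = -1;

    if (pipe(fds) < 0)
        return -1;

    if (write(fds[1], buffer, len) == (ssize_t)len) {
        /* write() has returned, so the kernel holds its own copy and the
         * buffer can be reused immediately. */
        memset(buffer, 'X', sizeof buffer - 1);

        if (read(fds[0], received, len) == (ssize_t)len)
            ok = strcmp(received, "first message") == 0;
    }

    close(fds[0]);
    close(fds[1]);
    return ok;
}
```

      A zero-copy interface gives up exactly this guarantee: the buffer has to be left alone until the caller knows the data is no longer needed.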


      --
      Religion is regarded by the common people as true, by the wise as false, and by rulers as useful.
    23. Re:Wrong Side of Bed? by Nato_Uno · · Score: 5, Insightful

      He doesn't care what the FreeBSD developers are doing... until someone advocates copying their ideas into the Linux kernel. Then he cares very much.

      He's not saying "The FreeBSD people should rewrite that part of their OS," he's saying "don't put that crap into the Linux kernel."

      --

      Have fun,

      Nathan 'Nato' Uno
      http://web.unos.net/
    24. Re:Wrong Side of Bed? by Anonymous Coward · · Score: 0

      VMware server.

    25. Re:Wrong Side of Bed? by hackstraw · · Score: 3, Insightful

      One thing that concerns me about making all of these copies is that it seems like a quick and easy way to blow out your L2 cache. That could in the long run have a worse performance penalty than having to play the VM tricks with CoW.

      Right. Especially with multithreaded apps as Linus pointed out. Also the TLB misses could get expensive as well, and again the TLB misses will be more of an overhead with multithreaded apps.

      I don't believe that COW is completely evil. It exists, obviously for a reason, but I would agree with Linus on a much less harsh tone (depending on mood).

      Oh, and isn't "VM" a trick to begin with?

    26. Re:Wrong Side of Bed? by dgatwood · · Score: 2, Informative
      You pretty much have it right. I've generally disagreed with Linus about architectural issues. That's why I don't run Linux much these days....

      The biggest advantage of COW is really obvious: faster fork() performance. The fork() call requires either COW or a copy. When Linus is forced to sit there and watch for three minutes while Photoshop forks to run some simple helper (while it panic swaps to duplicate a 1.5GB address space in a machine with 2GB of RAM), we'll see if he still thinks VM tricks are bad. :-D

      COW is a trade-off between an initial performance hit and lots of smaller ones. As long as the single, big performance hit on the front isn't too large, then Linus is right. COW produces a non-zero performance hit compared to doing it all up front. On the other hand, the performance hit of not doing COW can be huge in many cases. The fork() call is just one of many. (And don't tell me that everyone should call vfork(). That isn't always an option, and even if it were, you'd just be preaching to the choir.)

      Making fork() be COW generally results in HUGE savings, since most programmers don't use vfork() for portability reasons and since 99% of fork() calls are followed immediately by an exec(). Making vfork() behave just like fork() (with COW) results in more consistent stability than just supporting the bare minimum allowed by the vfork() specification. It's a total win-win. You can't (realistically) force programmers to always conform to your ideals, but you can take advantage of COW to make performance better in the average case, albeit at a small cost in edge cases where the programmer actually did the right thing.
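
      The fork()-then-exec() pattern looks roughly like the sketch below, assuming a POSIX system with /bin/sh. The function name is hypothetical. With COW, the child touches at most a few pages before exec() throws the whole inherited address space away, which is why copying it up front would be wasted work.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: run `sh -c command` in a child process.
 * Returns the child's exit status, or -1 on failure.
 */
int run_helper(const char *command)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: replace the address space right away. */
        execl("/bin/sh", "sh", "-c", command, (char *)NULL);
        _exit(127);   /* only reached if exec failed */
    }

    /* Parent: collect the helper's exit status. */
    if (waitpid(pid, &status, 0) < 0)
        return -1;

    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```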

      Bad coding aside, though, even in cases where fork() is used without exec(), if the performance hit caused by those COW exceptions in the child process is enough to actually be a significant decrease in performance, the amount of time wasted on the initial hit copying the pages will also be sufficiently significant to be seen as a freeze. Given a choice, in modern, user-oriented computing, we generally prefer not to have the computer sitting there looking at us funny for several seconds. Thus, amortizing those seconds plus some small penalty over the life of the program is the only sensible choice except in specialized data processing environments where interactivity is no object.

      So yeah, if Linus wants Linux to forever remain a server OS, he can just keep trying to hold true to those theoretical performance ideals. For the rest of us living in the real world, the desktop is king, and amortizing performance hits across the lifetime of an application is the only mechanism that makes sense. For example, Java hanging while it does garbage collection is one of the big flaws that initially kept Java from being used much for significant apps. Slow launch times of large applications were one of the biggest complaints about Mac OS X before weak linking and lazy binding became commonplace. And so on.

      You just can't have a huge, sudden stall when you're dealing with non-uber-geek users. They assume the app has hung and kill it. On the desktop, interactivity is the name of the game, and if you don't play that game, you won't get very far.

      --

      Check out my sci-fi/humor trilogy at PatriotsBooks.

    27. Re:Wrong Side of Bed? by mrsbrisby · · Score: 2, Interesting

      Basically, the problem is caused by the fact that usermode code never releases any of the memory it has allocated.

      Oh no. That's the solution actually. :)

      The problem is in using a static buffer instead of allocating a buffer for each send operation. If you use a static buffer, you ALWAYS cause a fault. If you malloc() each time, you won't fault- at least until you reuse the pages later (when malloc() fails).

      However, given that the "free()" routine is part of the OS in FreeBSD

      No, it's not unfortunately. It's a library call that mucks up [s]brk() or munmap().

      Free _could_ be smart enough to avoid actually freeing the pages until notification occurred, but userspace would still need explicit notification (or just to wait for a while).

      The real issue is explicit notification versus page fault. The page fault is undesirable because it wastes time, memory, and cache. The page fault can be avoided by never reusing memory like I proposed above.

      OR the userspace can simply wait for notification that the pages are done. A signal could be used, but vmsplice() actually causes a fd to wake up that can receive the notification via the recvmsg() system call.

    28. Re:Wrong Side of Bed? by Predius · · Score: 1

      Blech, you did your testing with 5.1?! Had you actually checked with anyone in the FreeBSD community at the time, or read the lists, you would have seen 5.x was still very much in the teething stage and you'd have been better off with 4.x.

      That said, it'd be interesting to rerun against 4.11 and 6.1-RC to see how they're doing.

    29. Re:Wrong Side of Bed? by tearmeapart · · Score: 4, Interesting

      >> Linus wants to push the manual use of zero-copy memory sharing
      >> through the vmsplice() routine. He believes that the programmer
      >> will always know better than the system when to share memory.
      >
      > That's correct.

      No, that is not always correct.

      I am a C developer for a large multinational corporation that likes to make money. When I need to fork(), I do not have the time to think of all the memory management involved with fork(). I just want it to be done reliably, and I want it to be done fast.

      If it turns out that my code runs 10% faster on FreeBSD than on Linux, then that means that the code is probably going to go on a FreeBSD system. And if FreeBSD is not an option, then I am not going to do the optimization (because CPUs cost less than my wages).
      Also: optimization never happens anyways (or at least, not properly).

      So from my perspective:
      I want the kernel to run my code as fast as possible by default.

    30. Re:Wrong Side of Bed? by Anonymous Coward · · Score: 1, Interesting

      Correction: when _you_ start using up a lot of memory Linux totally sucks. When I start using up a lot of memory, Linux acts exactly as I expect, and better than FreeBSD.

      I'm sorry to interrupt here, Your Holiness, but instead of being snarky and flaming the BSD kid, you could've been somewhat helpful and provided an idea as to *why* that might be the case (e.g. swappiness, etc).

      Just a suggestion. And the rap on programmers for being cranky sons of bitches is totally false.

    31. Re:Wrong Side of Bed? by powerg3 · · Score: 1
      I do not know the context of the current debate, but after reading some of it, it seems it doesn't have anything to do with fork at all. I believe everyone agrees COW for fork() is good.

      Except the vegans.

      --
      Wild Eeep!
    32. Re:Wrong Side of Bed? by dgatwood · · Score: 1
      Thus there's going to be a 100% occurrence of exceptions on CoW pages.

      But that's not true in general. 99% of all fork() calls are followed by exec() and the entire space gets dumped. That's why COW is a huge win in the average case. The case of an application using fork() followed by actually doing something useful is exceptionally rare outside of the server space. In fact, Apache is about the only program I can think of that ever does this.

      However, given that the "free()" routine is part of the OS in FreeBSD, wouldn't it make sense to create a smarter "free()" routine that would attempt to recognize and explicitly deallocate CoW pages?

      No, because free() works on a much smaller than page granularity. It's very hard to turn that into anything useful unless the entire page goes free, and even then, you still have to trap into the kernel to deallocate the page, and then again to reallocate a new one.

      What would be an advantage would be if the heap had a defined structure to it which the kernel understood and could simply assign newly allocated physical pages to sections of the heap that were unused... assuming that this doesn't end up causing additional paging pressure, in which case it becomes a rather significant performance loss.

      --

      Check out my sci-fi/humor trilogy at PatriotsBooks.

    33. Re:Wrong Side of Bed? by Anonymous Coward · · Score: 1, Interesting

      The point of the fefe.de tests was mainly to expose the general *BSD grandiose claims of "stability" and "scalability" and anti-Linux claims to be zealot lies. It has successfully done so.

      BSDers have been saying "you should run version X.Y.Z" since the tests were published, but at this point it matters not because they've already been exposed as frauds. No BSDer has been willing to reproduce the tests, as it will only confirm what the marketplace has already decided ... Linux is the superior OS.

    34. Re:Wrong Side of Bed? by Lehk228 · · Score: 1

      Copying data from RAM to RAM in a modern computer should not appear to the user as a freeze.

      --
      Snowden and Manning are heroes.
    35. Re:Wrong Side of Bed? by dgatwood · · Score: 1
      What programs weigh in at hundreds of megs? Don't count data files or map files for games. The entire bin directory of a PostgreSQL install is only 20 megs, and that's a lot of stuff there.

      You have to count the data if it is loaded into memory. When you fork(), you'll be copying that, too. Completely avoiding COW means always copying the entire VMSIZE, not just the RSS. Ideal is probably to use COW for non-resident pages since there's absolutely zero performance hit for COW in those cases, and to initially copy resident pages if that won't result in paging, else use COW, since it will probably be more efficient in the average case. That said, the latter part isn't a given by any means.

      --

      Check out my sci-fi/humor trilogy at PatriotsBooks.

    36. Re:Wrong Side of Bed? by AKAImBatman · · Score: 4, Informative
      If you use a static buffer, you ALWAYS cause a fault.

      * Lightbulb goes on

      Oohhhh, I see! So something like this is the problem:
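      char buffer[1024];
      size_t read = 0;
      size_t length;

      while (read < totalSize)
      {
          /* file is assumed to be a FILE * opened elsewhere */
          length = fread(buffer, 1, sizeof buffer, file);
          if (length == 0)
              break; /* EOF or error */
          read += length;

          // Do some stuff, but don't free the buffer!
      }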
      What you're saying is that every time through the loop, there's going to be a page fault as the CoW pages are wiped away by the new copy into the same logical buffer. CoW is dependent on allocating new pages every time so that you don't ever write to the old CoW pages. Correct?

      Of course, this is where I'd really like to hear from the *BSD developers. Surely they must be aware of this issue? Do they expect programmers to throw away their buffers, or do they have a plan?
    37. Re:Wrong Side of Bed? by mrsbrisby · · Score: 2, Informative

      I'm sorry to interrupt here, Your Holiness, but instead of being snarky and flaming the BSD kid, you could've been somewhat helpful and provided an idea as to *why* that might be the case (e.g. swappiness, etc).

      Not happening. He didn't ask how to make Linux operate as he expects, better or worse, he said Linux has had a persistent problem that FreeBSD doesn't.

      I said, no way, posted someone else's report on the subject, and pointed out something in the conclusion section.

      If he wants help making reliable servers, he can ask, and I'll probably help. But that's about the end of it.

    38. Re:Wrong Side of Bed? by Anonymous Coward · · Score: 0

      It's unavoidable. Even on a "modern computer", RAM isn't much faster than it was 10 years ago. Your app is probably using 100x the memory, but if the memory is only 10x faster...

      The trouble is, compared to the rest of the industry, RAM, and the people who make it, suck.

    39. Re:Wrong Side of Bed? by mrsbrisby · · Score: 1

      What? you -expect- it to suck...

      Read the URL I posted troll.

      Linux beats FreeBSD in every benchmark of scalability thrown at it (in that report).

    40. Re:Wrong Side of Bed? by hackstraw · · Score: 1

      I'm not an expert on any of this, but what I do know is that when you start using up a lot of memory Linux totally sucks. On a 256 MB RAM machine, with about twice that amount of swap, if I run over 50% memory usage the system becomes unusable for long periods of time.

      Linux's paging support has never been that good for "desktop" or interactive use. It's fine for servers because dead pages just get swapped out.

      Now, I question the discrepancy here. "using up a lot of memory .. on a 256 MB RAM machine ..." is odd to me. RAM is cheap. 256 has not really been a standard entry level amount of RAM for almost 10 years now. Right now my web browser is using ~125 megs of real memory and 370 megs of virtual memory. Yes, the browser is a memory hog, but there was a time when 640k of RAM was OK for people. I don't think you can find a cellphone or MP3 player with less than a few megs of ram today.

    41. Re:Wrong Side of Bed? by quantum+bit · · Score: 1
      The problem is that in the naive implementation, the page copy is always necessary. A complicated implementation (in userspace) to take advantage of COW is more complicated than with explicit notification.

      It should come as no surprise that a naive implementation is suboptimal. That's why it's naive. Given that zero-copy sockets are not on by default in FreeBSD, and that you have to explicitly enable them if you want to use them, presumably you've read the documentation and seen the big warning about exactly this issue. From man zero_copy(9):

      The user should be careful not to overwrite buffers that have been written to the socket before the data has been freed by the kernel, and the copy-on-write mapping cleared. If a buffer is overwritten before it has been given up by the kernel, the data will be copied, and no savings in CPU utilization and memory bandwidth utilization will be realized.

      From an application standpoint, the best way to guarantee that the data has been sent out over the wire and freed by the kernel (for TCP-based sockets) is to set a socket buffer size (see the SO_SNDBUF socket option in the setsockopt(2) manual page) appropriate for the application and network environment and then make sure you have sent out twice as much data as the socket buffer size before reusing a buffer. For TCP, the send and receive socket buffer sizes generally directly correspond to the TCP window size.

      So it seems that the FreeBSD developers realized this long ago and perhaps aren't as moronic as Linus thinks.
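
      The reuse rule quoted from the man page above can be tracked with a small counter. The following is only a sketch of that heuristic: struct sent_buffer, note_bytes_sent() and probably_reusable() are hypothetical names invented for illustration, not part of any FreeBSD API, and the caller is assumed to know the SO_SNDBUF size in effect.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bookkeeping for one buffer handed to a zero-copy send. */
struct sent_buffer {
    size_t bytes_sent_after; /* bytes sent on the socket since this buffer was queued */
};

/* Record that n more bytes went out on the same socket. */
void note_bytes_sent(struct sent_buffer *b, size_t n) {
    assert(b != NULL);
    b->bytes_sent_after += n;
}

/* Heuristic from the quoted man page: treat the buffer as reusable once
 * at least twice the send buffer size has followed it onto the socket. */
int probably_reusable(const struct sent_buffer *b, size_t sndbuf_size) {
    assert(b != NULL);
    return b->bytes_sent_after >= 2 * sndbuf_size;
}
```

      Note that this only estimates when the kernel is done with the buffer; it is not a notification from the kernel.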
    42. Re:Wrong Side of Bed? by Anonymous Coward · · Score: 0

      When Linus is forced to sit there and watch for three minutes while Photoshop forks to run some simple helper (while it panic swaps to duplicate a 1.5GB address space in a machine with 2GB of RAM), we'll see if he still thinks VM tricks are bad.

      Perhaps I've been out of the loop for too long, but since when did Photoshop run on Linux?

    43. Re:Wrong Side of Bed? by quantum+bit · · Score: 1
      However, given that the "free()" routine is part of the OS in FreeBSD

      No, it's not unfortunately. It's a library call that mucks up [s]brk() or munmap().

      In *BSD, libc is considered part of the OS. There are a lot of interfaces that are used between libc and the kernel which aren't meant for general consumption (the threading system calls for instance).

    44. Re:Wrong Side of Bed? by BasharTeg · · Score: 1

      Yes, but there's a difference between attacking the code and attacking the FreeBSD and Mach developers personally. He's not just saying their implementation is shitty, but attacking their intelligence for not agreeing with him.

      Hey Linus, you can make your point without http://en.wikipedia.org/wiki/Ad_hominem

    45. Re:Wrong Side of Bed? by Olivier+Galibert · · Score: 2, Informative

      Except the discussion wasn't about COW on fork, but COW in a zero-copy high performance userspace-kernel-device communication system. A faster write(), essentially (and write is already quite fast, TYVM).

          OG.

    46. Re:Wrong Side of Bed? by dgatwood · · Score: 2, Interesting

      What Linus suggests is explicit notification- say a select() or poll() operation that says "these pages are now free". This works out well, and is indeed faster because there aren't any copies or page faults. It's also easier to develop.

      Problem is that unless you're talking about declaring the pages "free" by storing more data in the heap info structure, declaring the pages free would require trapping into the kernel, and that is every bit as slow as the exception on most architectures, only now you're doing it more often, since you're doing it every time a page changes from free to not free.

      Even if you do this by just adding info in the heap structure, it isn't clear that the performance hit of doing so will be worth it in the average case, since most fork() calls are followed by exec() and thus zero copies actually occur, so you're optimizing for the 1% case and causing a performance hit throughout the entire execution of the 99% case.

      Even if that performance hit is nearly zero, and even if all of the programs that use fork() never call exec(), though, Linus is -still- wrong. The four possible ways this could work are:

      • COW for active pages, create new blank page in physical RAM for unused---the time (and possible paging) spent creating the new blank pages can result in a massive stall on fork, and you're still doing COW.
      • Live copy for active pages, create new blank page in physical RAM for unused---the time spent copying the live pages can result in a massive stall on fork, and your RSS just bloated dramatically for all the unused pages.
      • Live copy for active pages, map unused pages with virtual pages---you now have a trap when you access the unused pages to actually allocate physical RAM to hold them, thus are no better off than if those unused pages had been COW.
      • COW for active pages, map unused pages with virtual pages---you still have a trap when you access the unused pages to actually allocate physical RAM to hold them, so this comes out exactly the same as using COW for everything.

      I fail to see the logic in this unless you don't care about interactivity. If we are talking about relatively small process footprints, Linus is right. For large process footprints (including stack and heap), the huge lag to copy even the used pages would be unacceptably large, however.

      --

      Check out my sci-fi/humor trilogy at PatriotsBooks.

    47. Re:Wrong Side of Bed? by Detritus · · Score: 1

      If your motherboard supports a maximum of 256 MB, adding more RAM is not "cheap". Neither is adding memory to some older systems that use uncommon types of memory.

      --
      Mea navis aericumbens anguillis abundat
    48. Re:Wrong Side of Bed? by Dahan · · Score: 0, Flamebait
      NO U

      Ah, Linux VM... where the implementation is completely changed like 100 times on the "stable" branch before they finally get it vaguely right.

    49. Re:Wrong Side of Bed? by dgatwood · · Score: 1
      *scratches head* If it's zero copy, why would there be copy on write? That's a non-sequitur.

      --

      Check out my sci-fi/humor trilogy at PatriotsBooks.

    50. Re:Wrong Side of Bed? by mrsbrisby · · Score: 2, Informative

      When I need to fork(), I do not have the time to think of all the memory management involved with fork(). I just want it to be done reliably, and I want it to be done fast.

      So what? Who's talking about fork()?

      This is about copy-on-write of zero-copy fifos and TCP. If you don't know what the rest of us are talking about, please just say so, and we'll be happy to tell you exactly what's going on.

      Maybe you'll have something to contribute at that point, or maybe you'll just learn something.

      And if FreeBSD is not an option, then I am not going to do the optimization

      I want the kernel to run my code as fast as possible by default.

      Sounds good. Use read() and write() because those operate predictably and faster than the zero-copy method on FreeBSD.

      If scalability is important to you, investigate zero-copy methods. They aren't free- on FreeBSD you either need to wait for a competent API or use a very complicated allocator. On Linux, you already have a competent API.

    51. Re:Wrong Side of Bed? by dgatwood · · Score: 1
      It runs on a BSD.

      --

      Check out my sci-fi/humor trilogy at PatriotsBooks.

    52. Re:Wrong Side of Bed? by LordNimon · · Score: 5, Informative
      I don't consider myself an expert in kernel programming, but I definitely think someone is off base if they're expecting programmers as a whole to do the right thing.

      Well, I am an expert in kernel programming, and I can tell you that Linus has little tolerance for anyone who doesn't program the way he does. That's one reason, for example, that he doesn't support debuggers. Every other OS has a kernel debugger built-in (and therefore, generally stable and full-featured), but not Linux. Even the OS/2 kernel debugger that was created 10 years ago is better than anything Linux has.

      --
      And the men who hold high places must be the ones who start
      To mold a new reality... closer to the heart
    53. Re:Wrong Side of Bed? by mrsbrisby · · Score: 5, Informative

      What you're saying is that every time through the loop, there's going to be a page fault as the CoW pages are wiped away by the new copy into the same logical buffer. CoW is dependent on allocating new pages every time so that you don't ever write to the old CoW pages. Correct?

      Exactly correct. Those frequent CoW operations are slow- the page faults are expensive. If you had instead written:

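      char *buffer;
      size_t read = 0;
      size_t length;

      while (read < totalSize)
      {
          buffer = malloc(1024);
          /* file is assumed to be a FILE * opened elsewhere */
          length = fread(buffer, 1, 1024, file);
          if (length == 0)
              break; /* EOF or error */
          read += length; // Do some stuff, but don't free the buffer!
      }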


      Then it would operate quickly on FreeBSD. The problem then becomes exactly when do you free all those malloc()s?

      On Linux, you can get a signal from the kernel- via a recvmsg() call that will tell you exactly which pages are now available to be freed- or better still, reused.

      It'll be easy to check and test correctness AND the programmer has to be aware it's going on in order to use it at all.

      Under FreeBSD the programmer can use the syscall, but never get the performance unless they know exactly what's going on.

      Of course, this is where I'd really like to hear from the *BSD developers. Surely they must be aware of this issue?

      I don't know. The article wasn't about that- I doubt Linus pays attention to what the BSD people know- in fact, I don't even think he knows for certain if FreeBSD even works this way. :)

      The point is that using CoW is stupid for this. It makes things complicated in the hard case, and in the easy case, it makes things slower.

    54. Re:Wrong Side of Bed? by mrsbrisby · · Score: 1

      So it seems that the FreeBSD developers realized this long ago and perhaps aren't as moronic as Linus thinks.

      Except that there's no API call that detects whether or not the buffer can be reused. Only by making a blocking call can you make sure that, before the thread runs again, the buffer is no longer used by the kernel.

      This is stupid. Applications that need to handle hundreds of thousands of clients simply don't block.

      If there WERE an API call that can detect the page is no longer in use, then why bother COW-ing the kernel buffer? Updating the page tables at that point is unnecessary and as a result, wasted code.

      But being as how there isn't such a call, why CoW the kernel buffer anyway? Why not simply require the blocking call or luck before reusing the buffer?
      Updating those page tables is slow, and doesn't buy very much- except that it makes the naive approach get written in the first place.

    55. Re:Wrong Side of Bed? by AuMatar · · Score: 1

      10 years?

      10 years would be 1996, when I got my first ever computer. It had a massive.... 32 MB.

      Go to dell.com and you'll see their entry level systems still only sell with 256 MB of RAM. 256 MB is a perfectly reasonable test for anything except a videogame system.

      --
      I still have more fans than freaks. WTF is wrong with you people?
    56. Re:Wrong Side of Bed? by Homestar+Breadmaker · · Score: 1

      No, "BSDers" laughed and said "gee, I wonder why your broken benchmarks that you designed for linux to look good make linux look good".

    57. Re:Wrong Side of Bed? by WindBourne · · Score: 1

      No, it will not do a copy on each iteration. It will only do it on the first write. But the real issue is that there is overhead for doing COW. Every page fault means that you must check the status. In addition, you have to handle setting it correctly.

      The thing is that COW works well iff the overhead of handling things (setting and checking the bits) is less than the IO/CPU cycles saved by NOT doing an initial copy. Linus is saying that it is not the case, and that the benchmarks are geared toward the best cases. There is a lot to be said for simplicity by moving control up above.

      Overall, he tends to be correct. But if not, then they will add back the COW.

      --
      I prefer the "u" in honour as it seems to be missing these days.
    58. Re:Wrong Side of Bed? by statusbar · · Score: 2, Informative

      Well, now you see Linus's point. If the buffer is being sent asynchronously after the write() call, and the user program writes to the buffer before the ethernet chip picks up the buffer via dma, then the buffer must be COW so that the ethernet chip can send the appropriate data.

      The real problem is that in a zero-copy world, write() returns before the data is sent, and in FreeBSD there is no way for the kernel to signal the user program that the write() is complete and it is safe to re-use the buffer.

      --jeffk++

      --
      ipv6 is my vpn
    59. Re:Wrong Side of Bed? by mrsbrisby · · Score: 2, Informative

      Problem is that unless you're talking about declaring the pages "free" by storing more data in the heap info structure, declaring the pages free would require trapping into the kernel, and that is every bit as slow as the exception on most architectures, only now you're doing it more often, since you're doing it every time a page changes from free to not free.

      No. System calls are not as slow as exceptions.

      If they are on your architecture, you're not supporting a million clients at a time on that architecture. It's unreasonable.

      And besides, the kernel can coalesce multiple free-returns, thus reducing the number of messages. After all, the pages are free once an interrupt occurs, and anything ready to go out can probably go at that point.

      Even if you do this by just adding info in the heap structure, it isn't clear that the performance hit of doing so will be worth it in the average case, since most fork() calls are followed by exec() and thus zero copies actually occur, so you're optimizing for the 1% case and causing a performance hit throughout the entire execution of the 99% case.

      You're confused. We're not talking about fork() and exec() but about COW on buffers.

      CoW on fork() and exec() is smart. An exception between the two is rare and limited to one page. When using vfork() the exception NEVER occurs.

      CoW on kernel fifo buffers or TCP socket buffers is stupid. An exception occurs at the top of each loop- it would've been faster to copy the single page each run, instead of generating a page fault on the same page over and over and over again.

    60. Re:Wrong Side of Bed? by hackstraw · · Score: 1

      If your motherboard supports a maximum of 256 MB, adding more RAM is not "cheap".

      Maybe it's the user who is cheap :)

      To be helpful, there are many webpages out there that describe vm tuning under Linux, especially "swappiness". If you have multiple drives (SCSI is better, but more expensive), you can load balance the swap space between the drives, or put the swap on the faster drive, or one that is used less for user/system IO. There are also things like the preemptive kernel that can help with interactive use, which may or may not help depending on your usage.

      Tips for computer use in 2006.

      Processors in the MHz range are usually slow nowadays.

      1 gig harddrives will not store much data.

      256 megs of ram will not run many applications.

      10base2 networks are slow for LANs, especially for large file transfers (which cannot be done from a 1 gig hd anyway), network attached storage, or remote graphical displays.

    61. Re:Wrong Side of Bed? by WindBourne · · Score: 1

      BTW, COW came in because of forks. Basically, when a process forks, it gets its "own" copy of the memory. But to keep things fast, they came up with COW, which keeps a count of how many processes use each page (0, 1, 2...). Before each write, the count is examined. If >1, then copy the page, set the new page's count to 1, and decrement the old page's count. Notice the overhead on each and every write; that is very expensive unless you are doing nothing but quick forks. That is not the case anymore (it actually used to be the case).
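
      The counting scheme described above can be mimicked in plain userspace C. This is only a toy model: struct toy_page, toy_share() and toy_write() are hypothetical names, and the check is done in software on every write purely for illustration. A real kernel typically write-protects the shared page and does this work only when the resulting fault fires.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096

/* Toy stand-in for a physical page; not a kernel structure. */
struct toy_page {
    int refcount;
    unsigned char data[TOY_PAGE_SIZE];
};

/* "fork": share the page by bumping the count instead of copying. */
struct toy_page *toy_share(struct toy_page *p) {
    p->refcount++;
    return p;
}

/* Write one byte through *slot. If the page is shared, copy it first
 * so the other holders keep seeing the old contents.
 * Returns 0 on success, -1 if the copy could not be allocated. */
int toy_write(struct toy_page **slot, size_t off, unsigned char value) {
    struct toy_page *p = *slot;
    assert(off < TOY_PAGE_SIZE);
    if (p->refcount > 1) {
        struct toy_page *copy = malloc(sizeof *copy);
        if (copy == NULL)
            return -1;
        memcpy(copy->data, p->data, TOY_PAGE_SIZE);
        copy->refcount = 1;
        p->refcount--;   /* the writer no longer references the old page */
        *slot = copy;
        p = copy;
    }
    p->data[off] = value;
    return 0;
}
```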

      --
      I prefer the "u" in honour as it seems to be missing these days.
    62. Re:Wrong Side of Bed? by synthespian · · Score: 1

      And as far as doing memory management... YES. I have yet to see a compiler do a better job at managing memory than what I can do when I write my code - and the reason is quite simple: I'm the domain expert, not the compiler. Compilers generally do a good job, but it's those specific cases that bite you over and over again.

      You might want to read this:

      "Pop quiz: Which language boasts faster raw allocation performance, the Java language, or C/C++? The answer may surprise you -- allocation in modern JVMs is far faster than the best performing malloc implementations. The common code path for new Object() in HotSpot 1.4.2 and later is approximately 10 machine instructions (data provided by Sun; see Resources), whereas the best performing malloc implementations in C require on average between 60 and 100 instructions per call"

      Java theory and practice: Urban performance legends, revisited

      --
      Main difference between the BSD license and the GPL license: one is from California and the other is from Massachusetts
    63. Re:Wrong Side of Bed? by Nato_Uno · · Score: 1

      Hey, I'm totally with you on that count - Linus is unnecessarily rude about the whole thing. For better or worse that doesn't surprise me - he's been that way from the beginning, so I expect it now.

      The only point I'm trying to make is that he doesn't really care about "what FreeBSD is doing", per se - he's being rude about what people want to put in *his* kernel. I think he couldn't care less about what FreeBSD does to *their* kernel, so long as it doesn't affect *his* kernel.

      --

      Have fun,

      Nathan 'Nato' Uno
      http://web.unos.net/
    64. Re:Wrong Side of Bed? by ivan256 · · Score: 1

      Sometimes it feels good to flame the crap out of somebody.

      Most people have a sense of humor.

    65. Re:Wrong Side of Bed? by jbolden · · Score: 1

      I agree with you (about the numbers and dates). I should mention that my 386-40 (1990) had 20 megs, and it's not unreasonable to expect /.ers to spend on their hardware.

    66. Re:Wrong Side of Bed? by dubl-u · · Score: 1

      I am a C developer for a large multinational corporation that likes to make money. When I need to fork(), I do not have the time to think of all the memory management involved with fork(). [...] I am not going to do the optimization (because CPUs cost less than my wages). Also: optimization never happens anyways (or at least, not properly).

      I'm puzzled by this. When I don't care much about optimizing, I write in high-level languages and enjoy the development speed boost they give me. When I'm doing something at a low level that's performance-critical, I write in C and try to become one with the kernel. Why would you spend a lot of time developing in C but then not bother to know what's going on with the kernel and not bother to optimize?

    67. Re:Wrong Side of Bed? by Sandor+at+the+Zoo · · Score: 1
      While that may be good for this specific problem (and I don't know if it is), I hate that approach in general.

      "Let the user decide" is usually a cop-out in UI design when the developers can't agree on the right way.

      I almost always advocate Finding the Right Way and implementing that. Then after user testing (perhaps through an actual release cycle) you look at the feedback and see if you made the right choice.

      The alternative is a billion stupid preferences saying "does this widget do this, or this?" That's laziness on the developer's part, and it makes for crappy UI and user experience.

      OK, who pushed my button? :-)

      To bring it back on topic, if, after thrashing this around in discussion, one technical approach is deemed better than the other, it would be silly to include the other as an option.

    68. Re:Wrong Side of Bed? by Anonymous Coward · · Score: 0

      And the fact that, these days, hypervisors make for better/easier kernel debugging than any in-kernel debugger? While Linus' reasons for rejecting a kernel debugger were bullshit... it has proved to be the right decision in the long-term.

    69. Re:Wrong Side of Bed? by mrsbrisby · · Score: 1

      In *BSD, libc is considered part of the OS. There are a lot of interfaces that are used between libc and the kernel which aren't meant for general consumption (the threading system calls for instance).

      So what?

      The problem remains that if the kernel doesn't provide a mechanism for solving the problem, then the problem cannot be solved until it does.

      free() isn't in a position to solve the problem because it's in libc. If it were actually a system call, then free() would be slower, BUT it would have the ability to solve this problem.

    70. Re:Wrong Side of Bed? by hackstraw · · Score: 1

      10 years would be 1996, when I got my first ever computer. It had a massive.... 32 MB.

      I guess the 256 meg estimate would be more accurate at 6-7 years, not 10.

      I rounded. 1997 was 9 years ago, and then I used 128-256 megs of RAM on personal and server boxes. More RAM on bigger boxes. Today, I have machines with up to 6 gigs of RAM, much more if you count distributed memory systems (I don't).

      Today, about 512 is what I call "entry level". That will get most people good performance. I'm a "power user" and I have 512 and 1 gig on my two personal machines. I overspeced the 1 gig when I bought it, but I bought that computer to run apps that I have never run before and thought that 1 gig sounded reasonable, but come to find out 512 would have been OK for me.

    71. Re:Wrong Side of Bed? by Anonymous Coward · · Score: 0

      On Linux, you already have a competent API.

      You'll forgive me if I don't leap right in and start using vmsplice. When vmsplice has some real world testing under its belt- and I know it won't crash on me because of some idiotic circumstance that no one thought of (and that I seem to be particularly good at finding)- then maybe I will use it. When I have some actual, real world benchmarks that I can study and that show a clear win for vmsplice- then maybe I will use it. But to use it just because you and Linus say I should? I don't think so.

      All of this aside- There was no reason to call people names and get nasty. It's bad enough when the Linux fanboys do it- when Torvalds himself does it it's just an embarrassment.

      I do development on Linux and sometimes on FreeBSD. The nice thing about FreeBSD is that despite not writing FreeBSD code for a while I can be sure no one has up and changed half the libraries for no particular reason. And even though I do less work on FreeBSD I've had a far lower percentage of crashes. Although that may just be my experience it is still my experience.

      -me

    72. Re:Wrong Side of Bed? by mrsbrisby · · Score: 4, Informative
      But that's not true in general. 99% of all fork() calls are followed by exec() and the entire space gets dumped. That's why COW is a huge win in the average case. The case of an application using fork() followed by actually doing something useful is exceptionally rare outside of the server space. In fact, Apache is about the only program I can think of that ever does this.

      This isn't about fork(); it's about zero copy buffers, not code and data pages in general.

      Consider a block like this:
      char buffer[4096];
      for (i = 0; i < len; ) {
          r = read(fd, buffer, sizeof buffer);
          if (r <= 0)
              break; /* EOF or error */
          zero_write(fd2, buffer, r);
          i += r;
      }
      Now, on the whole, if zero_write() works like write() then an awful lot of copying is going on. But if zero_write() uses the buffer for kernel space as well, it's much faster (1 less copy).

      Now the trick is returning to userspace before the buffer is completely used. In FreeBSD a page fault would occur immediately during read().

      Both FreeBSD and Linux agree that you shouldn't do this. Instead something like this:
      char *buffer;
      for (i = 0; i < len; ) {
          buffer = malloc(4096);
          r = read(fd, buffer, 4096);
          if (r <= 0)
              break; /* EOF or error */
          zero_write(fd2, buffer, r);
          i += r;
      }
      The trick at this point, is that elsewhere in your code, Linux can tell you when those malloc() buffers can be reused, whereas FreeBSD doesn't. It relies on the fact that you'll either make a blocking call on fd2 before you free buffer _OR_ you'll accept a page fault.

      But if you can be told when it will occur, you don't need to do either of these things, and as a result, you NEVER have to wait. This means your program will be simpler and go faster.
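
      To make the "be told when a buffer can be reused" idea concrete, here is a minimal userspace sketch of a buffer pool driven by explicit release notices. Everything in it is an assumption for illustration: struct buf_pool, pool_acquire() and pool_release() are hypothetical, and the notification itself (however the kernel delivers it) is outside the sketch. The point is only that with explicit notices the application never has to block or guess before touching a buffer again.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_SLOTS 4
#define SLOT_SIZE  4096

/* Hypothetical pool: a slot is "in flight" from the moment it is taken
 * for a zero-copy write until a completion notice releases it. */
struct buf_pool {
    unsigned char mem[POOL_SLOTS][SLOT_SIZE];
    int in_flight[POOL_SLOTS];
};

/* Return the index of a free slot and mark it in flight, or -1 if none. */
int pool_acquire(struct buf_pool *p) {
    for (int i = 0; i < POOL_SLOTS; i++) {
        if (!p->in_flight[i]) {
            p->in_flight[i] = 1;
            return i;
        }
    }
    return -1;
}

/* Called when the (assumed) notification says slot i is no longer in use. */
void pool_release(struct buf_pool *p, int i) {
    assert(i >= 0 && i < POOL_SLOTS);
    p->in_flight[i] = 0;
}
```

      A -1 from pool_acquire() means every buffer is still owned by the kernel, so the caller can go service other clients instead of writing into a busy page.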
    73. Re:Wrong Side of Bed? by Sangui5 · · Score: 3, Informative

      Then it would operate quickly on FreeBSD. The problem then becomes exactly when do you free all those malloc()s?

      No, it'd be slower than just copying on FreeBSD too.

      while (read < totalSize) {
          buffer = malloc(1024); // 1024 is < pagesize!
          length = fread(buffer, 1, 1024, file);
          read += length; // Do some stuff, but don't free the buffer!
      }

      This is where VM games really bite you in the ass, because you get false sharing. Even if you never reuse the buffer, this can cause 3 copies--each group of 4 (3.99ish) buffers will be on the same page, and therefore each call will cause a fault from the previous one.

      In theory the OS could allow itself to write and check for overlapping calls (and so avoid the COW fault), but note that the read() example really isn't interesting for zero-copy unless you're using hardware TCP offloading. Zero copy is more interesting for write(). The usual case is then:

      while () {
          b = malloc();
          fill_in_buffer(b);
          write(b);
      }

      and that fill_in_buffer step *must* cause a fault if sets of buffers are on the same page. To avoid COW faults you have to be really careful that you don't accidentally write to the same page as the buffer--even indirectly by malloc updating its inline data structures. That's pretty nasty to do--the easiest way is to allocate 8K at a time, and use a page-aligned chunk from the middle of it. Talk about a waste of memory.
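
      As an alternative to over-allocating and carving out the aligned middle, POSIX offers posix_memalign(). The sketch below uses it with the page size reported by sysconf(_SC_PAGESIZE); alloc_whole_pages() is a hypothetical helper name. Because the buffer starts on a page boundary and spans whole pages, the pages it occupies hold nothing else. Whether this actually avoids COW faults on a given system is an assumption that would need measuring.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/* Allocate npages whole pages starting on a page boundary.
 * Returns NULL on failure; release the result with free(). */
void *alloc_whole_pages(size_t npages) {
    long page = sysconf(_SC_PAGESIZE);
    void *p = NULL;
    if (page <= 0 || npages == 0)
        return NULL;
    if (posix_memalign(&p, (size_t)page, npages * (size_t)page) != 0)
        return NULL;
    assert((uintptr_t)p % (uintptr_t)page == 0);
    return p;
}
```

      The memory cost is still at least one full page per buffer, so the "waste of memory" complaint above stands for small buffers.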

    74. Re:Wrong Side of Bed? by koh · · Score: 1

      Disclaimer: I might not know what I'm talking about.

      What if the parent process forks a gazillion child processes which never write, and then the parent process writes to a memory location? Do you then have to allocate memory for all gazillion child processes at once? That could suck.

      I do not think you would do that, because it's not very logical. You would just duplicate the page used by the parent and let the gazillion children share the "original".

      How many child processes never write to memory?

      Don't know. Depends on you talking about heap memory or stack memory to begin with.

      What exactly is it doing if not writing to memory?

      You can do many things without ever writing to heap memory. Also note that we're talking about copying pages of memory on demand, so only the pages affected by a write are copied, and pages weighed in at 4K on x86 last time I checked. If you're only writing to stack memory then there are only a few memory pages impacted and COW should still win.

      Does Copy on Write only save you memory if you fork a bunch of child processes that do nothing?

      Yes. If all your child processes do nothing (say sleep(1000) in a tight loop), they only touch their stack (if the function call is not inlined), and only one memory page is copied. If they do nothing at all and exit immediately, exactly 0 pages are copied. If they exec() something, all the pages are dropped, but you didn't copy anything before.

      Without COW, fork() implies duplicating all memory pages of the parent process - even swapped-out heap pages that the _parent_ itself never used for days...
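
      A small sketch of the behaviour described above, using only standard POSIX calls. It shows the visible semantics (a child's write never reaches the parent); whether the kernel implements that with COW is not observable from this code, and the function name is hypothetical:

```c
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if the child's write stayed private to the child, 0 if not,
 * and -1 on error. */
static int child_write_is_private(void)
{
    size_t len = 4096;
    char *buf = malloc(len);
    if (buf == NULL)
        return -1;
    memset(buf, 'P', len);          /* parent's data */

    pid_t pid = fork();
    if (pid < 0) {
        free(buf);
        return -1;
    }
    if (pid == 0) {
        /* Child: sees the parent's bytes, then writes. On a COW kernel
         * this first write is what triggers the copy of the page. */
        int ok = (buf[0] == 'P');
        buf[0] = 'C';
        _exit(ok && buf[0] == 'C' ? 0 : 1);
    }

    int status = 0;
    if (waitpid(pid, &status, 0) < 0) {
        free(buf);
        return -1;
    }
    int result = WIFEXITED(status) && WEXITSTATUS(status) == 0
                 && buf[0] == 'P';  /* parent's page is untouched */
    free(buf);
    return result;
}
```

      A child that exits or calls exec() without writing never pays for a copy at all, which is the case the answer above is making.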

      Or is it a matter of saving the allocation of memory until the parent process is done, thus speeding up the parent process at the time of the fork?

      It is a matter of duplicating memory only when one client of the shared pool writes to it, and duplicating for this client only.

      Why not then copy on parent-write or child-execution, whichever comes first?

      Copy on parent-write is a subset of COW. If you can copy on write, you can copy on parent-write. Copy on child-exec should not happen because it does not make sense to copy the memory of the parent right before discarding it to map another process.

      So yes, COW is a good technique with fork()s. What is debated here is that Linus thinks it sucks for optimized I/O and said it on a bad day.

      --
      Karma cannot be described by words alone.
    75. Re:Wrong Side of Bed? by Bill+Hayden · · Score: 0, Offtopic

      Unless you put the malloc on the outside of the loop, or call free() at the end of the loop, you've got a serious memory leak there.

      --
      Protect your browser with the Force Safe Search add-on
    76. Re:Wrong Side of Bed? by Anonymous Coward · · Score: 0
      Why does Torvalds care? I mean, why does he care what FreeBSD is doing, or how they are doing it?


      Read the email exchange again and it will become clear why he cares and why he explained what is wrong with it. (Basically because someone said, "let's do it in Linux like they do it in FreeBSD.")
    77. Re:Wrong Side of Bed? by Jherek+Carnelian · · Score: 5, Informative

      When I need to fork(), I do not have the time to think of all the memory management involved with fork().

      This has NOTHING to do with fork(). You are used to CoW (copy-on-write for anyone else reading along) only applying to fork(), but that is not the issue under discussion at all. You, and probably 95% of the responders here, need to go RTFA.

      The issue is implementing zero-copy IO. FreeBSD's way of doing it is a setsockopt() that causes any write() on that socket to mark the buffer CoW so that it can use it exclusively for handing down to the device driver. The "magic" is that if the programmer tries to use that buffer while the device driver owns it he will get a copy. BUT, the programmer has no way of knowing when that buffer is available again.

      Linus's point is that marking a page CoW is very expensive - especially in an SMP environment, almost as expensive as just copying that page to begin with would be. He also argues that taking a page-fault to invoke the CoW to a new page, or simply to turn off the CoW attribute, is orders of magnitude more expensive than just copying it in the first place.

      So that means the CoW for sockets is only really useful if you rarely or never reuse your buffers again. And the only place that happens is in synthetic benchmarks.

      If Linus had said "Microsoft is a bunch of idiots for implementing a feature that only looks good on benchmarks" everybody would be nodding their heads in agreement. I think the reason people are not doing the same here is because they just don't understand the details.

    78. Re:Wrong Side of Bed? by quantum+bit · · Score: 1

      Except that there's no API call that detects whether or not the buffer can be reused.

      If there was, it would be non-portable and applications would then have to double their complexity to conditionally use it. With COW, zero-copy is an option you turn on once and the app doesn't have to worry about it.

      If the app is using a suitably large ring-buffer, the kernel will usually be done transmitting before it gets reused. A copy won't happen unless the NIC can't keep up, or the remote machine isn't ACKing fast enough. In that case you need to back off your transmit rate anyway, so the bottleneck isn't the extra page faults. Think of it as a self-adjusting algorithm.

      This is simply a case of the erroneous idea that the application can do a better job than the OS in deciding how to manage buffers and trying to micro-manage it.

      This is stupid. Applications that need to handle hundreds of thousands of clients simply don't block.

      Nobody is saying they have to block. The app doesn't absolutely HAVE to avoid the COW overhead at all costs. If it just uses a buffer bigger than 2x the TCP window, 99% of the time no copy is necessary, without a big mess of code complexity.

      But being as how there isn't such a call, why CoW the kernel buffer anyway?

      To catch the 1% of the time that it needs to be copied, without imposing a huge burden on application writers.

      Updating those page tables is slow, and doesn't buy very much- except that it makes the naive approach get written in the first place.

      Anyone who wants to write a naive application is free to not enable zero-copy at all, and simply use write(). The normal overhead of copying buffers isn't terribly bad on modern hardware -- only extremely high-performance apps need zero-copy in the first place.

    79. Re:Wrong Side of Bed? by Anonymous Coward · · Score: 0

      You had a 386 with 20MB of RAM? Wow, the first 486 my family owned had only 4 until I maxed it out with another 4 to play Doom 2: Hell on Earth. Then again, we did have a Packard Bell after all...

      Still, it's pretty ridiculous what won't run (or struggles to) on a GHz+ processor and 256MB these days. I've got a Tablet PC (40X processor and 32X the memory), and Pentium 4 (96X the processor, 320X the memory, a graphics card 12X as fast by itself, and a peripheral bus that obliterates an 8-bit ISA daughterboard) to compare to. The tablet, with XP, crawls practically from bootup to shutdown and the P4 with 2000 was getting a little slow until I put that extra GB of RAM in. These are both machines without malware (except what MS sell of course) too!

      And as a totally left field case, I've got an old 200MHz Pentium Pro running 98SE that captures video/audio far and away better than its big brother with ATI's 9-generations-removed equipment.

    80. Re:Wrong Side of Bed? by quantum+bit · · Score: 1

      Only by causing a blocking call can you make sure that before the thread can run again, that the buffer is no longer used by the kernel.

      This is stupid. Applications that need to handle hundreds of thousands of clients simply don't block.


      I completely missed the irony of this comment the first time around... The vmsplice() approach that Linus is talking about is exactly that -- a call that will block until the kernel is done with the previous buffer.

    81. Re:Wrong Side of Bed? by Sangui5 · · Score: 1

      Unfortunately, CPU cycles are not the be-all and end-all of performance.

      For details, see my old comment on the topic. The short version is that the "fast" JVM allocators return cold memory (the coldest, in fact) which in all likelihood is NOT in cache. So you pay about 300 instructions worth of time waiting for a cache fill. If you're under memory pressure, you may miss main memory entirely with the JVM allocator--that'll cost you, oh, 24 million clock cycles waiting for the disk to bring in your page.

      malloc() has to think harder about what memory to give you back, but it gives you hot memory, which is probably even still in L1, and nearly guaranteed to be in main memory still.

      This sort of thing hits the heart of why Linus is ranting--it is important to consider all aspects of what is going on in order to tune performance. In this case, the BSD implementation does zero-copy for a buffer, but in many cases will end up copying whole pages and incurring two page faults--worse than just copying the buffer in the first place. That is, unless you jump through hoops (tiny ones which are on fire) to avoid the COW. If you're willing to do that, you may as well use the explicit interface that Linus proposes--it is simpler than avoiding the COW, even though it is more complex than using BSD & not worrying about COW overhead.

      Now, GC is great and all for saving programmer effort, but it isn't that great for performance. I can put a medium amount of effort into using explicit heap management, and get fast results. I can put minimal effort into GC and get poor performance. Or I can put a lot more effort into playing games with the "smart" GC, and get slightly slower results. If you need the performance, you'd just better go with manual management in the first place.

    82. Re:Wrong Side of Bed? by Anonymous Coward · · Score: 0

      It is a shame his word is worth so much.

      Thanks to the stubbornness and fundamentalism of just a couple of core Linux developers:
      - Linux still sucks with hardware because binary drivers can't work (constantly changing kernel makes them hard to maintain -> most of the vendors do not bother) - it will keep sucking.
      - LSM is "perfect" and has still the same security vulnerabilities it has had from day 0.
      - Obscurity vs security in kernel vulnerability handling. (Linus especially is NOTORIOUS for it.)
      - The licensing (decision to use (L)GPL only) is so fundamentalist it drives away potential supporters, causing more harm than good in the long run.
      - Linux drivers and other kernel "modules" get all the toys they basically want. Bad drivers are very able to crash the whole kernel. Some drivers being very poor quality (Pilot for instance, just re-plug your device a couple of times in a row FAST and you've got yourself an insane kernel driver that can take it all down), this causes serious problems.
      - No security module is allowed to cause a performance hit or affect anything else - an impossible situation for life saving patches like PAX. (Although in the PAX/Execshield case the problem is decreasing fast due to development of hardware and the kernel.. It's just one example.)
      - Performance and tweaks over stability everywhere. Trial and error. The FreeBSD order of doing things seems just more reliable. Sorry. They still manage to bring in new features.

      ... You could make a longer list too. The bottom line is: he made the first version. He is a great hacker I believe, also pretty smart. But on the other hand he is a fundamentalist jerk who doesn't stop to think things over for a second. If I were in charge of Linux development, I'd fire him. Sorry.

    83. Re:Wrong Side of Bed? by Sangui5 · · Score: 1

      From the article:
      make the vmsplice call block or just return -EAGAIN if we are not ready yet

      Returning -EAGAIN would be not-blocking.
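
      To illustrate the difference between blocking and getting EAGAIN back, here is a sketch using an ordinary non-blocking pipe rather than vmsplice() itself. Inside the kernel the convention is to return -EAGAIN; from userspace that shows up as a -1 return with errno set to EAGAIN. The function name is hypothetical:

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Fill a non-blocking pipe until write() refuses more data.
 * Returns bytes accepted before EAGAIN, or -1 on any other failure. */
static long fill_until_eagain(void)
{
    int fds[2];
    if (pipe(fds) < 0)
        return -1;

    int flags = fcntl(fds[1], F_GETFL);
    if (flags < 0 || fcntl(fds[1], F_SETFL, flags | O_NONBLOCK) < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    char chunk[4096] = {0};
    long total = 0;
    long result = -1;
    for (;;) {
        ssize_t n = write(fds[1], chunk, sizeof chunk);
        if (n > 0) {
            total += n;
        } else if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
            result = total;   /* pipe is full: told to retry, not put to sleep */
            break;
        } else {
            break;            /* unexpected error */
        }
    }
    close(fds[0]);
    close(fds[1]);
    return result;
}
```

      Without O_NONBLOCK the same loop would simply sleep in write() once the pipe filled up, which is the blocking behaviour being contrasted here.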

    84. Re:Wrong Side of Bed? by Anonymous Coward · · Score: 0

      I think you're on to something. Is there any way we can convince some of the more, er, fanatical members of the Linux, excuse me, GNU/Linux community to strap some IEDs to their body and use them?

    85. Re:Wrong Side of Bed? by Heembo · · Score: 1

      > Is it really feasible to expect program developers to do manual memory management

      It's funny to read this as a long-time Java developer. Sure, I'm not writing Kernel code, frankly, quite the opposite. I'm so far up the virtual stack I have no clue what Operating System I'm working on at times. Now even though Java still has memory leaks, and you can still do bad memory things with Java, (and the moral of this post is) for the most part I do not even think about memory management anymore. The Java "Garbage Collection" way of doing things seems to work very well in production high-load environments.

      --
      Horns are really just a broken halo.
    86. Re:Wrong Side of Bed? by Anonymous Coward · · Score: 0

      And yet I've never used a single Java application that wasn't unbearably slow. Java fans always have some excuse, and just continue to crank out unbearably slow apps. It's not just slow: it's Java slow.

    87. Re:Wrong Side of Bed? by Anonymous Coward · · Score: 0

      "COW produces a non-zero performance hit"

      sounds like double speak...

    88. Re:Wrong Side of Bed? by petermgreen · · Score: 1

      on *nix it at least was standard practice to run something by using fork and then exec. i don't know if there are newer systems now or not.

      In this environment cow made a lot of sense, the most likely thing to happen after a fork was that one of the processes would be replaced very soon.

      also btw it's only the first write to each page that triggers a copy, not every single write

      --
      note: i'm known as plugwash most places but i screwd up registering that here somehow in the past and now can't register
    89. Re:Wrong Side of Bed? by Bill+Dog · · Score: 1

      Which language boasts faster raw allocation performance, the Java language, or C/C++?

      You might want to think about this: Modern JVMs are written in C/C++. That is, any memory allocator developed for the Java language could be used in C/C++. Fortunately, C/C++ supports objects as automatic variables, so I only use new or malloc for around 5-10% of my needs. I suspect it's similar for most C/C++ developers and their typical tasks. If for some reason I found myself having to do mostly just heap allocations, and tons and tons of them, then I'd look for a kick-ass new replacement to drop in.

      --
      Attention zealots and haters: 00100 00100
    90. Re:Wrong Side of Bed? by Anonymous Coward · · Score: 0

      > For details, see my old comment on the topic. The short version is that the "fast" JVM allocators return cold memory (the coldest, in fact) which in all likelihood is NOT in cache.

      Horseshit. The nursery generation is being read and written constantly, and typically lives its whole life in cache.

    91. Re:Wrong Side of Bed? by quantum+bit · · Score: 1

      Looks like the interface has changed already.

      Look at Linus's post yesterday and you'll see that in his idea of how it would be used he specifically mentioned blocking.

    92. Re:Wrong Side of Bed? by Jherek+Carnelian · · Score: 1

      If it's zero copy, why would there be copy on write? That's a non-sequitur.

      Roflcopter!

      Now don't you feel dumb for writing that huge post about something totally unrelated to the fine article?

      But you did get +5 for it, I guess all those mods didn't read it either.

    93. Re:Wrong Side of Bed? by outZider · · Score: 5, Insightful

      Here's my -1, Troll.

      Funny that we just had an article about how many Linux users and enthusiasts exclude other people by being complete dicks, and here you are, acting like a dick. Of course, I don't know you from Joe Blow, so maybe I just misunderstood your obviously angsty response.

      "That's obvious."
      "Correction: when _you_ start using up a lot of memory Linux totally sucks. When I start using up a lot of memory, Linux acts exactly as I expect, and better than FreeBSD."

      Linux acts exactly how I'd expect, too. It completely sucks when it comes to memory and process management. Linux may have a better threading kernel, but that's the only thing that seems to save it in the real world. After only six years of administering servers professionally and for my own use, it has come down to Linux on the desktop, and FreeBSD for Real Work(tm). Many large companies that depend on their data agree with me, and those who use Linux or Windows just throw more machines at the problem.

      At least Linux is free compared to Windows, right?

      --
      - oZ
      // i am here.
    94. Re:Wrong Side of Bed? by thallgren · · Score: 2, Insightful

      > it has proved to be the right decision in the long-term.

      How can you say something like this? If Linux had a debugger from the start, it could be ripped out right now if there was some gain by doing that. By not having it, you only caused developers lots of pain during the last 10-15 years, for those occasions where a debugger really is the right tool for the job.

      And yeah, I know some of Linus' theories about how to program, how he thinks asserts and invariants are bad things, I just don't agree with him.

    95. Re:Wrong Side of Bed? by Ruie · · Score: 1
      Ok, I don't know for sure about 2.6.x, but I am certain that Linux 2.4.x used COW for fork.

      What the discussion is about is a method to do zero-copy I/O - i.e. share buffers between userland and kernel.

      Apparently, one could try to make a hack where the buffer is marked COW, so that when userland starts to write into it before the I/O finishes it is copied.

      The alternative (which I find quite sensible) is to simply communicate to the application when the kernel is done with the buffer.

    96. Re:Wrong Side of Bed? by Rich0 · · Score: 2, Funny

      Only if you assume you're using linux to power a web

    97. Re:Wrong Side of Bed? by Anonymous Coward · · Score: 0

      It's been a while since I've used straight C, I'll admit, however, why not do this via a ref count like in COM or .NET (I know, wrong crowd), and I would assume Java???? It seems to me that the compiler could implement the code to free memory malloc'd and forgotten/no longer in scope/having no reference memory.

    98. Re:Wrong Side of Bed? by Alioth · · Score: 1

      Also, if the application programmer has to deal with it, it hurts portability. I maintain a Linux port of a multiplatform game. I really don't want to add yet more Linux-specific code because it's the only OS where you have to do all this extra stuff to get a decent memory footprint.

    99. Re:Wrong Side of Bed? by Rich0 · · Score: 1

      Only if you're assuming that your linux is powering a $10k webserver for some major operation, and you went ahead and spent $500 on RAM.

      If, on the other hand, it is running on a $500 desktop PC with 1GB of RAM, then copying a 500M application "in RAM" doesn't go nearly as fast, when you have 400M of other stuff in memory at the same time.

      Sure, RAM is cheap, but it isn't free. And COW works fine in many cases - especially if RAM is in more demand than CPU.

    100. Re:Wrong Side of Bed? by katsklaw · · Score: 1

      I agree completely. FreeBSD is *NOT* Linux. Therefore he should lose nothing if FreeBSD/OSX/Windows or any other non-Linux OS fails in any way.

    101. Re:Wrong Side of Bed? by Erich · · Score: 1
      In a nutshell:

      Copy-on-write for fork(): GOOD

      Copy-on-write for playing tricks with buffers during read() and write() type operations: (arguably) BAD

      If you'd read TFA, you'd see that the issue is the API for doing zero-copy data movement (like sendfile()). You can give the buffer to the kernel, and then you shouldn't mess with the buffer until the kernel is done (this is Linus's preference). OR, you can re-mark the page with the buffer in it as read-only, and then if the user DOES modify something in the page with the buffer before the kernel is finished with it, the kernel can copy the page, mark it writable, and restart the process. Once the kernel is finished with the page, the page may be freed.

      TLB shootdowns and refills are expensive. So if you do Linus's preference, you avoid that penalty. The cost is that you have to remember to ask the kernel to notify you when it is finished. Or you can do the BSD thing, which looks nicer from a programming standpoint, but the cost of the implementation is higher.

      The COW situation is exacerbated with multiple page sizes... if you are trying to get better utilization from your TLB by using large (ie, 4M) pages, modifying something in the page may be very likely, especially in a multithreaded application where most of the page may hold information completely unrelated to the buffer.
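
      A back-of-the-envelope sketch of why larger pages make this worse: the smaller the buffer's share of the page, the more unrelated data sits on a page that has been marked read-only. The function name is hypothetical and the sizes in the note are only examples:

```c
#include <stddef.h>

/* Fraction of a page taken up by the I/O buffer. Whatever is left over is
 * unrelated data whose modification would still trigger a COW fault. */
static double buffer_share_of_page(size_t buf_bytes, size_t page_bytes)
{
    if (page_bytes == 0)
        return 0.0;
    if (buf_bytes >= page_bytes)
        return 1.0;
    return (double)buf_bytes / (double)page_bytes;
}
```

      For a 4K buffer, a 4K page is entirely buffer, while on a 4M page the buffer is 1/1024 of the page (about 0.1%), so nearly every other write to that page hits data that has nothing to do with the I/O.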

      --

      -- Erich

      Slashdot reader since 1997

    102. Re:Wrong Side of Bed? by Arandir · · Score: 1

      The memory management stuff that Java, Ruby and Python people are always having cows about is NOT the same memory management stuff that Linus is talking about.

      --
      A Government Is a Body of People, Usually Notably Ungoverned
    103. Re:Wrong Side of Bed? by Anonymous Coward · · Score: 0

      ...just felt like writing something:

      perl -ne'fork() while 1;'

    104. Re:Wrong Side of Bed? by Anonymous Coward · · Score: 0
      Thank you very much for either not reading or not understanding what you are replying to.

      Better luck next time.

    105. Re:Wrong Side of Bed? by fimbulvetr · · Score: 1

      Linus is really just sacrificing the immediate good for what would inevitably become the default, which is bad programming. It's no different than not allowing triggers in SQL. Sure, sometimes they're nice, but more often than not they are used because the programmer is lazy and didn't really normalize their database. I, for one, congratulate him on not giving into the pressure. If short term fixes were implemented more often in the kernel, I suspect it'd be much worse off in stability, performance and scalability than it is now. Since linux kernel programmers are forced to do things "The Hard Way" in the kernel, it all but guarantees that they have an intimate knowledge of the systems and workings.

      By not having it, you only caused developers lots of pain during the last 10-15 years, for those occasions where a debugger really is the right tool for the job.

      If the debugger had been the right thing for the job, why not just patch it in? There are several different versions of debugging patches, and the developer should be the only one who worries about debugging. You shouldn't force everyone who uses the Linux kernel to download and maybe even unselect kernel debugging.

    106. Re:Wrong Side of Bed? by Arandir · · Score: 1

      Most people have a sense of humor.

      Calling someone an "incompetent idiot" is not humor. The FreeBSD and Mach developers aren't laughing because Torvalds didn't say anything funny.

      --
      A Government Is a Body of People, Usually Notably Ungoverned
    107. Re:Wrong Side of Bed? by Anonymous Coward · · Score: 0
      Thank you very much for either not reading or not understanding what you are replying to.

      Better luck next time, though.

    108. Re:Wrong Side of Bed? by Arandir · · Score: 1

      I don't know for sure about 2.6.x, but I am certain that Linux 2.4.x used COW for fork.

      According to Torvalds, that means linux-2.4.x must have been written by "incompetent idiots."

      --
      A Government Is a Body of People, Usually Notably Ungoverned
    109. Re:Wrong Side of Bed? by sustik · · Score: 1

      Interesting. So would the use of alloca() make a difference?

      Matyas

    110. Re:Wrong Side of Bed? by jbolden · · Score: 1

      In 1990 you were paying too much for a 486 (Intel charged a ton for the coprocessor). The 386-40 with a 3rd party coprocessor was under half the cost and almost as good as a 486-25. A year or two later (which is when I assume you bought your 486) there wasn't quite the premium. As for Ram nothing brings up performance like RAM. As for Packard Bell I had 8 slots, originally it was 8x1 then I went to 4x4+4x1.

    111. Re:Wrong Side of Bed? by Anonymous Coward · · Score: 0

      Hmmm...I think you misspelled 'crusade'.

    112. Re:Wrong Side of Bed? by Anonymous Coward · · Score: 0

      They're talking about COW for write(), not fork(). Your entire rant is completely misplaced.

    113. Re:Wrong Side of Bed? by Just+Some+Guy · · Score: 4, Insightful
      No BSDer has been willing to reproduce the tests, as it will only confirm what the marketplace has already decided ... Linux is the superior OS.

      ...but still inferior to Windows, right? I mean, we're only looking at number of installations, after all. Furthermore, McDonald's clearly has the best hamburger and Velveeta beats Hoffman's Super Sharp.

      I like Linux. I'm typing this on a Gentoo box. However, I'd never pretend that it's better in every single aspect than any other OS in existence. The BSD guys have a few tricks up their sleeve, and even Redmond manages to get things right on rare occasion.

      --
      Dewey, what part of this looks like authorities should be involved?
    114. Re:Wrong Side of Bed? by petermgreen · · Score: 1

      ok that may indeed be the case. malloc is a very general api and as such using it directly for performance sensitive code is stupid.

      with java you have NO REAL CHOICE but to allocate every object (and by extension any records you want since java doesn't have a separate record type) directly from java's heap.

      tell me which will take longer to allocate: an array of 10000 structures in a traditional language where the structures sit in the array or an array of 10000 objects in java where the objects are only referenced by the array?

      ofc there are direct bytebuffers but those have two issues,
      1: horrible api to work with compared with a language provided array of structures.
      2: if any code in the program decides to load the class for an indirect bytebuffer your performance will plummet (this is apparently something they plan to fix in mustang, time will tell).

      similarly if i want to return multiple values from a function in a traditional language i can do so either by reference parameters or by using a structure as the result type. If i want to do the same in java then i have to use an object (depending on how the code's structured it may be possible to reuse that object but that brings complexity of its own).

      and sun designed jni to give them maximum flexibility in vm implementation not maximum performance of calls to native code. This is understandable but it means that if you are moving work to native code for performance reasons it has to be done in big chunks.

      --
      note: i'm known as plugwash most places but i screwd up registering that here somehow in the past and now can't register
    115. Re:Wrong Side of Bed? by Curtman · · Score: 0, Flamebait

      You forgot:

      Floppy discs are not an acceptable storage medium.

    116. Re:Wrong Side of Bed? by katsklaw · · Score: 1

      If Linus sees something that he thinks is bad, coming from FreeBSD and into Linux .. he can always say "NO!" .. There is no need to call anyone names or even mention beyond a simple reference any product name.

      There is a HUGE difference between the phrases "I claim that Mach people (and apparently FreeBSD) are incompetent idiots." and "don't use that crap in the linux kernel!"

      Advocation is not the same as obligation.

    117. Re:Wrong Side of Bed? by Gorshkov · · Score: 1

      I don't think it's a tricky issue at all - I tend to side with Linus on this one.

      As far as the insults are concerned ...... who amongst us techies has not at some point said something like "Anybody who is still using plain b-trees when rbl trees are so OBVIOUSLY superior SHOULD BE SHOT ON SIGHT?"

      It's a hot technical debate, and I seriously doubt that any *BSD developer who *isn't* an idiot would be too badly insulted by what Linus said.

      Now - if the technical discussion devolves into just simple name-calling, then they're *all* idiots.

    118. Re:Wrong Side of Bed? by pammon · · Score: 1

      man aio_return

    119. Re:Wrong Side of Bed? by mrsbrisby · · Score: 1

      Except vmsplice() works on a fd, and one can do a select() or poll() on it.

    120. Re:Wrong Side of Bed? by mrsbrisby · · Score: 1

      If it just uses a buffer bigger than 2x the TCP window, 99% of the time no copy is necessary, without a big mess of code complexity.

      This simply isn't true. This zero-copy chunk is designed to run as part of a big select-loop. Allowing one of those fd's in the select loop to trigger the frees() means the code complexity is down- one doesn't need a ring buffer or anything, they simply malloc as needed, and when notifications come in, free the pointed-to chunk.

      Anyone who wants to write a naive application is free to not enable zero-copy at all, and simply use write(). The normal overhead of copying buffers isn't terribly bad on modern hardware -- only extremely high-performance apps need zero-copy in the first place.

      Nobody said naive application. I said naive implementation. A 2x buffer is a naive implementation- it doesn't catch the important cases and severely limits the performance of these extremely high-performance apps that this infrastructure addresses.

    121. Re:Wrong Side of Bed? by mrsbrisby · · Score: 1

      It's been a while since I've used straight C, I'll admit, however, why not do this via a ref count like in COM or .NET (I know, wrong crowd), and I would assume Java???? It seems to me that the compiler could implement the code to free memory malloc'd and forgotten/no longer in scope/having no reference memory.

      Except that the compiler doesn't have access to this information. Userspace doesn't have access to it at all- that's the problem with FreeBSD that's being pointed out.

      Linux DOES have such a mechanism so it WOULD be possible to maintain user-space reference counts.

    122. Re:Wrong Side of Bed? by fredrik70 · · Score: 1

      >"don't put that crap into the Linux kernel."

      um, COW is a very proven concept. maybe there are better ways, and yeah, of course we should try to find better ways, but COW is still a solid idea. and that's not only my opinion.

      --
      if (!signature) { throw std::runtime_error("No sig!"); }
    123. Re:Wrong Side of Bed? by mrsbrisby · · Score: 1

      In theory the OS could allow the write itself and check for overlapping calls (and so avoid the COW fault), but note that the read() example really isn't interesting for zero-copy unless you're using hardware TCP offloading. Zero copy is more interesting for write(). The usual case is then:

      TCP ``offloading'' is exactly what's going on here. The above was a demonstration of part of a sendfile loop. The idea is that it would really be triggered by a select() or poll() notification.

      To avoid COW faults you have to be really careful that you don't accidentally write to the same page as the buffer--even indirectly by malloc updating its inline data structures. That's pretty nasty to do--the easiest way is to allocate 8K at a time, and use a page-aligned chunk from the middle of it. Talk about a waste of memory.

      That's the point. How does one avoid COW faults on FreeBSD? You can't tell whether or not the kernel is done "offloading" your data yet.

      The manual for FreeBSD's zero_copy mechanism suggests using a buffer 2x the size of the TCP buffer size. It's wrong, it'll still fault, and performance will still suffer- although not as much as a fault every load of the buffer.

    124. Re:Wrong Side of Bed? by Lehk228 · · Score: 1

      what the hell application is going to fork with 500 megs of data?

      --
      Snowden and Manning are heroes.
    125. Re:Wrong Side of Bed? by mrsbrisby · · Score: 1

      I know this is almost pseudocode, but this is a memory leak. Especially if you don't ever free the buffer.

      That's the point. You have to keep track of those pages as allocated and free them later- after the operating system has had an opportunity to send the data that's in them.

    126. Re:Wrong Side of Bed? by statusbar · · Score: 1

      But that does not apply in this case where freebsd uses COW for a normal blocking write() call...

      jeff

      --
      ipv6 is my vpn
    127. Re:Wrong Side of Bed? by mrsbrisby · · Score: 1

      Interesting. So would the use of alloca() make a difference?

      No. alloca() allocates memory from the stack. The above is still hiding some details- malloc() isn't really appropriate- but some device that allocates an actual page-sized chunk is what is required.

    128. Re:Wrong Side of Bed? by mrsbrisby · · Score: 1

      Unless you put the malloc on the outside of the loop, or call free() at the end of the loop, you've got a serious memory leak there.

      Please re-read this thread. You are not following it, and what you're bringing up is redundant.

      This fragment is designed to be called from a big select() or poll() loop. It wasn't designed to appear in isolation.

      So yes, of course you have to free() the memory- the question is _when_. If you free it too soon, FreeBSD does a page fault and an extra copy thus taking longer than just using read() and write(). If you free() it too late, you waste memory.

      Linux allows userspace to receive notification of when free() needs to take place.

    129. Re:Wrong Side of Bed? by Gorshkov · · Score: 1

      If the app is using a suitably large ring-buffer, the kernel will usually be done transmitting before it gets reused. A copy won't happen unless the NIC can't keep up, or the remote machine isn't ACKing fast enough. In that case you need to back off your transmit rate anyway, so the bottleneck isn't the extra page faults. Think of it as a self-adjusting algorithm.

      I want to know what the hell kind of network hardware & link YOU have, that it can keep up with the speed of memory writes inside of an application.

      "Suitably large" is a crock, in any case. Simple math. If I can write to buffers faster than you can send it (and I guarantee that I can), then all a larger buffer will do is delay the inevitable.

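      The "simple math" in the comment above can be written out as a one-line formula. This is only a sketch: the function name and the idea of treating both rates as constant bytes per second are assumptions made for illustration, not anything measured in the thread.

```c
/* Seconds until a ring buffer of `capacity` bytes is full when the
 * application fills it at `fill_rate` bytes/s and the network drains it
 * at `drain_rate` bytes/s. Returns a negative value when the drain keeps
 * up and the buffer never fills. (Hypothetical helper, constant rates.) */
double seconds_until_full(double capacity, double fill_rate, double drain_rate)
{
    if (fill_rate <= drain_rate)
        return -1.0;
    return capacity / (fill_rate - drain_rate);
}
```

      With made-up numbers, seconds_until_full(100.0, 30.0, 10.0) gives 5.0 and doubling the capacity to 200.0 gives 10.0: a bigger buffer moves the point where the writer catches up with the in-flight data, it does not remove it.
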
    130. Re:Wrong Side of Bed? by cant_get_a_good_nick · · Score: 1

      Linus doesn't like "VM Games" despite the fact that Virtual Memory, Memory Mapped Files, Disk I/O, Write Caching, etc, etc, etc, are all already "Memory Games" and "VM Games"
      Anyone else remember the entire VM being ripped out of 2.4 "production" kernel and replaced?

    131. Re:Wrong Side of Bed? by Anonymous Coward · · Score: 1, Insightful

      That's exactly the problem with Linus' rant. He lambastes COW in general, which is silly. zero_copy(9) is sufficiently scary that no sane programmer is going to say to himself "Oh, I'll add zero copy support and hope it makes things better".

    132. Re:Wrong Side of Bed? by gnud · · Score: 1

      But they should download and maybe even unselect amateur radio?

    133. Re:Wrong Side of Bed? by mrawl · · Score: 1

      If your argument boils down to fork/exec why not advocate create_process() instead?

    134. Re:Wrong Side of Bed? by aevans · · Score: 1

      Let the language do it for the programmers, or the memory management library, or the application framework, or the VM, not the operating system. Why take away the option for the programmer to control memory? A program using a framework that utilizes a data caching library running in an application server running on top of a virtual machine running on top of the OS is today's reality. Let's not build application level controls into the hardware controls hoping that someday the heuristics for the hardware controls will be smart enough to be fast enough, even if they're never quite as fast as doing it explicitly.

    135. Re:Wrong Side of Bed? by Anonymous Coward · · Score: 0

      Linus is worried about the threat of iMac on Intel platform that will see more people migrate to OS X/FreeBSD. That's why

    136. Re:Wrong Side of Bed? by Knuckles · · Score: 1

      256 MB is a perfectly reasonable test for anything except a videogame system.

      Corporate laptops in our company regularly run this stuff concurrently:
      WinXP
      MS Word, Excel, PowerPoint 2003 (all with potentially huge files that can use 200 MB RAM or more). Project and Access too, though not that regularly.
      Lotus Notes
      MS Groove
      SpySweeper
      Symantec Antivirus
      Blackberry Desktop Manager
      IE or Firefox
      Altiris client, and assorted other services
      Misc user software

      You are fucked under a gig, and even that is not comfortable

      --
      "When I first heard Daydream Nation it quite frankly scared the living shit out of me." -- Matthew Stearns
    137. Re:Wrong Side of Bed? by aevans · · Score: 1

      Which is a better way to design a VM? Try forever to implement a pretty design that appeals to your aesthetic sensibilities, eventually hoping to iron all the inconsistencies and gaps in the design. Or try a hundred different models, refining as you go, discarding flaws, and gaining real world experience on a variety of models?

    138. Re:Wrong Side of Bed? by larry+bagina · · Score: 2, Insightful
      I think the point is should the programmer be forced to always have COW enabled or be able to choose.

      They already can choose. Kernel threads (via clone(2)) allow you to specify what (memory, files, signal handlers, etc) is cloned.

      Why fork? Because you're going to exec*(2) another program. Otherwise, you'd usually be better off using a thread.
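
      The fork-then-exec pattern mentioned above looks roughly like this. A minimal sketch: the helper name is made up, and it assumes a POSIX system with a shell at /bin/sh.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run `command` through /bin/sh in a child process and return its exit
 * status, or -1 if the child could not be created or did not exit
 * normally. (Hypothetical helper name.) */
int run_shell_command(const char *command)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: replace this process image with the shell. */
        execl("/bin/sh", "sh", "-c", command, (char *)NULL);
        _exit(127);                /* only reached if execl() failed */
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

      The child touches almost none of the parent's memory before execl() throws the address space away, which is the case where sharing pages instead of copying them at fork() time costs the least.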

      --
      Do you even lift?

      These aren't the 'roids you're looking for.

    139. Re:Wrong Side of Bed? by Anonymous Coward · · Score: 0

      Summary of parent post: "I'm an asshole."

    140. Re:Wrong Side of Bed? by Anonymous Coward · · Score: 0

      Yeah, the FreeBSD and Mach developers agree with you.

    141. Re:Wrong Side of Bed? by AuMatar · · Score: 1, Informative

      In the last 12 months I've had corporate desktops with 256 MB. It wasn't a problem.

      Of course, being a dev I was doing all my real work on the Linux workstation next to it. :) But seriously, look at what HP, Dell, Best Buy, etc are selling these days. 256 MB is still EXTREMELY common. Hell, my GAMING machine is running some fairly new games (DDO, Civ 4) just fine on 512 MB (the motherboard is dying and the second channel isn't working). A gig is still an insane amount for anyone other than a gamer.

      --
      I still have more fans than freaks. WTF is wrong with you people?
    142. Re:Wrong Side of Bed? by funpet · · Score: 0

      The programmer may not know better than the system about memory management, but the compiler probably does, and it's the compiler that is really doing the bulk of memory management in almost every case nowadays, especially with files.

    143. Re:Wrong Side of Bed? by janeil · · Score: 1
      Why does he care? Easy. He's a programmer, he's thought about the problem, it matters. I don't think he thinks of himself as "Linus, creator of Linux."

      A better question is, why is the above post modded up to 5, funny? Was a paragraph deleted or something? Where's the funny part?

    144. Re:Wrong Side of Bed? by pthisis · · Score: 2, Interesting
      The vmsplice() approach that Linus is talking about is exactly that -- a call that will block until the kernel is done with the previous buffer.

      That's certainly not the impression I get from Dave Miller's commentary about splice/tee to sockets, which discusses using poll/select/more advanced methods to see when the splice has finished and comments:


      We really can't block on this, but I guess we could consider allowing
      that for really dumb applications.

      It does indeed require some smarts in the application to field the
      events, but by definition of using this splice stuff there is explicit
      knowledge in the application of what's going on.

      This is why I'm very hesitant to say "yeah, blocking on the socket is
      OK", because to be honest it's not. As long as the socket buffer
      limits haven't been reached, we really shouldn't block so the user can
      go and do more work and create more transmit data in time to keep the
      network pipe full.


      Or Linus commenting:

      Some users may even be able to take _advantage_ of the fact that the
      buffer is "in flight" _and_ mapped into user space after it has been
      submitted. You could imagine code that actually goes on modifying the
      buffer even while it's being queued for sending. Under some strange
      circumstances that may actually be useful
      --
      rage, rage against the dying of the light
    145. Re:Wrong Side of Bed? by Anonymous Coward · · Score: 0

      Calling someone an "incompetent idiot" is not humor. The FreeBSD and Mach developers aren't laughing because Torvalds didn't say anything funny.

      I thought it was funny, and moreover, correct.

    146. Re:Wrong Side of Bed? by pthisis · · Score: 1

      Indeed, looking at the vmsplice patch you can clearly see that it supports nonblocking I/O (with the SPLICE_F_NBLOCK flag)
      http://www.ussg.iu.edu/hypermail/linux/kernel/0604.2/1399.html

      --
      rage, rage against the dying of the light
    147. Re:Wrong Side of Bed? by Knuckles · · Score: 1

      A gig is still an insane amount for anyone other than a gamer

      Well, I just told you that it's not an insane amount for the people in my company :) (more than 10,000 users with these requirements, by the way)

      --
      "When I first heard Daydream Nation it quite frankly scared the living shit out of me." -- Matthew Stearns
    148. Re:Wrong Side of Bed? by Breakfast+Pants · · Score: 1

      I don't know if this was what he was trying to say or not, but basically, the writers of the Java VM could manually use the vmsplice() routines etc. and then all java applications would automatically benefit. The question is whether it is general enough to be handled appropriately in the kernel (there are many different usage patterns), or whether in the average case it will be a net loss. If so, provide this system call for low to no overhead manual management at the application level (a level at which a java, et al application doesn't actually run).

      --

      WHO ATE MY BREAKFAST PANTS?
    149. Re:Wrong Side of Bed? by fimbulvetr · · Score: 1

      The amateur radio option supports certain hardware, while we could argue why they should or shouldn't include it until the sun goes down, the simple fact remains that that isn't an apples to apples comparison. Native debugging options, at least in this case, foster a "brute force it until it compiles and works" mentality. (And I'm not going to argue that nothing else fosters that mentality)

      Simply writing a complex program and debugging it until it works is not encouraged. You should have a good idea of how to write the code, and you should be 90% working without even attempting the first compile. Most of the rest can be solved with printks in your module, and the remainder are/might be nuances of your hardware/coding style/kernel version. If you understand the architecture intimately and your need for a debugger is minimal, chances are your code will work in multiple versions of the kernel - which is what the Linux community wants. We don't want to contact you, the maintainer every version release when your code breaks just because you don't fully understand how the kernel model works, and only got your code working as a result of a debugger.

      By debugging, and not to make a blanket statement here, generally you are saying "I don't fully understand why X is doing Y", or "Why FOO has a lock on BAR.", etc.

    150. Re:Wrong Side of Bed? by segedunum · · Score: 1

      Linux acts exactly how I'd expect, too. It completely sucks when it comes to memory and process management.

      Sorry, but if you want people to stop 'being a dick' towards you you're going to have to provide an awful lot more information as opposed to the usual forum crap of "Linux uses a lot of memory, it goes slow...." etc. etc.

      it has come down to Linux on the desktop, and FreeBSD for Real Work(tm). Many large companies that depend on their data agree with me, and those who use Linux or Windows just throw more machines at the problem.

      Well, people can quite easily say that that isn't true (and it isn't) with just as little evidence as you've presented. This is just a "Real companies depend on FreeBSD!" rant. Thanks for clearing that one up.

    151. Re:Wrong Side of Bed? by Predius · · Score: 1

      I'd be happy to run the tests, but I'd need support from the linux community as I'm not experienced enough with it to be comfortable claiming x.y.z is a reasonable setup environment.

    152. Re:Wrong Side of Bed? by Bloater · · Score: 1

      I just upgraded my parents PC from 64 MB to 256 MB. It runs like a dream with Windows XP on it. It would be quite comfortable with 128MB. I think a standard desktop operating system should be able to do word processing and web browsing, at the same time, with 32MB. How much crap do you need to store resident in core to do a few desktopish things? Really?

    153. Re:Wrong Side of Bed? by pi_rules · · Score: 1
      You pretty much have it right. I've generally disagreed with Linus about architectural issues. That's why I don't run Linux much these days....
      Is that you Andrew S. Tanenbaum?
    154. Re:Wrong Side of Bed? by Bing+Tsher+E · · Score: 1

      256 has not really been a standard entry level amount of RAM for almost 10 years now.

      Ten years ago 256 MB of RAM was an unusual amount of memory to have in a PC. Typical users had 32 or 64MB. The typical motherboard of 1996 was a Pentium I motherboard that would have four slots for 72pin SIMMs. The really expensive SIMMs were 32M. Nothing bigger than 32M SIMMs were supported in most of said motherboards. So people with lots of money had 128MB and most everybody else had 32-64.

      Maybe you weren't around yet. But, then, why are you talking about something you don't know?

    155. Re:Wrong Side of Bed? by Anonymous Coward · · Score: 0

      37OHSSV O773H

      Tip on how to read this if you are not in mrsbrisby's position: turn the screen upside-down

    156. Re:Wrong Side of Bed? by harmic · · Score: 1

      Actually, wouldn't the best way to make use of this be for the C library to worry about when the buffer is finished with, making the whole thing transparent to the application programmer.

      Using the example given by others below:

      char *buffer;

      while(more_data_to_write)
      {
          buffer = malloc(1024);
          fill_buffer(buffer);
          write(buffer);
          free(buffer);
      }

      As I understand it, the problem is that once you have returned from write, the kernel may not yet have written the buffer to disk, merely queued it for writing. If you used the same buffer again, you would immediately have a page fault, then have to copy the page. Even if you use the code sample above and allocate a new buffer each time round the loop, libc may reallocate the same memory (since you free it at the end of the loop). The notification Linus is talking about is primarily aimed at telling the application when it can free() the buffer, as the kernel is now finished with it.

      Imagine libc doing this:

      1. When getting a write() call that uses a malloc'd buffer - flags the buffer as being used by the kernel
      2. When getting a free() for a buffer that is flagged as being used by the kernel, just flags it as being free from a user perspective
      3. When getting a notification from the kernel that it is done with the buffer, finally marking the buffer as completely free and available for reallocation

      This way, the same code would work regardless of platform, just perform better on platforms that use this kind of notification. Of course the libc implementation would also need to take care of keeping such buffers in separate pages. Also there would be a lot of checking required so that libc knew whether a buffer supplied to write() was actually malloced rather than a static buffer. Well, I did not say any of this would be easy!
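
      The three libc steps listed above boil down to two flags per buffer. Here is a sketch of just that bookkeeping; every name in it is hypothetical, and no real libc or kernel interface is being described.

```c
#include <stdbool.h>

/* Hypothetical per-buffer bookkeeping for the scheme described above. */
struct tracked_buf {
    bool user_holds;   /* the application has not called free() yet */
    bool kernel_busy;  /* a write() on this buffer is still in flight */
};

/* malloc(): the application owns the buffer, the kernel has not seen it. */
void tb_on_alloc(struct tracked_buf *b)
{
    b->user_holds = true;
    b->kernel_busy = false;
}

/* Step 1: write() hands the buffer to the kernel. */
void tb_on_write(struct tracked_buf *b) { b->kernel_busy = true; }

/* Step 2: free() only records that the user is finished with it. */
void tb_on_free(struct tracked_buf *b) { b->user_holds = false; }

/* Step 3: the kernel reports that it no longer needs the buffer. */
void tb_on_kernel_done(struct tracked_buf *b) { b->kernel_busy = false; }

/* The memory may be handed out again only when neither side holds it. */
bool tb_reusable(const struct tracked_buf *b)
{
    return !b->user_holds && !b->kernel_busy;
}
```

      After tb_on_alloc, tb_on_write and tb_on_free the buffer is still not reusable; it becomes reusable only once tb_on_kernel_done has run, which is the delayed free() the comment is after.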

    157. Re:Wrong Side of Bed? by Anonymous Coward · · Score: 0

      > ...but still inferior to Windows, right?

      Windows is for most people a vastly superior system than Linux, because it is faster, has more useful applications and is easier to use.

      You may now moderate me down, but it's the truth.

    158. Re:Wrong Side of Bed? by outZider · · Score: 1

      It was more a commentary on the parent post, where the piece of evidence he provided was damn close to nil, so it was all personal experience. There was my personal experience. And you did the same thing with "(and it isn't)". Welcome to a pointless, stupid flame war. Use what you want and what works best for you. My comment was that the parent poster was being a dick, not that he was wrong. He wasn't being a dick toward me, I wasn't the parent above him. I was calling him out for being one.

      For what it's worth, no one said Linux used too much memory, or that it goes slow. I agreed with the top poster that Linux does not handle low memory situations well in the 2.6 kernel. A quick Google search finds quite a few people who think the same thing. This isn't a scientific look, though, so feel free to give me some kind of research that Linux is the second coming of Christ.

      --
      - oZ
      // i am here.
    159. Re:Wrong Side of Bed? by Anonymous Coward · · Score: 0

      I thought it was funny, and moreover, correct.

      That's because you're an anonymous coward who still thinks boogers are funny.

      p.s. And Linus still wonders why he got so many wedgies and swirlies in grade school...

    160. Re:Wrong Side of Bed? by Anonymous Coward · · Score: 0

      Wrong COW dude.

      COW isn't only used for forking, but rather in any place when you think you can get away without a copy.

      What Linus is saying is that there are 3 approaches, with differing performance results:

      (1) Copy always: When you call write, the kernel makes a copy of your data and returns. The kernel uses its copy to send to the network card and you can fill your buffer with more data. This is portable and has been optimized to hell and back.
      (2) COW: When you call write the kernel lies to you and pretends it made its own copy. If you don't write to the page until after the kernel is done with it, all is good, only a short page fault and some TLB fiddling (and interrupting every processor in the system, etc). If you do write to your buffer, the kernel takes the fault and does make a copy.
      (3) Handoff and notification: The client doesn't call write, but a special call to hand over its buffer to the kernel and net card, and promises not to write to it. The kernel notifies the client when the net card is done with the buffer, but in the meantime the app can keep working on other buffers.

      For most applications, (1) is good enough and so portable it's not even funny. For high performance the best solution is (3). His point is that there's no good reason to use (2), and that in fact it'll be slower than (1) on real-world usage.

      I can see his point even if he doesn't show numbers to back it up. Coding for (3) doesn't seem so hard after all, though the followups do make sense when they talk about having to wait not for the data to finish sending, but rather have TCP release the pages to the app when the ACK comes back, so using (3) requires a bit more work than just using COW (COW also needs this info, but it's more transparent to the app).

    161. Re:Wrong Side of Bed? by Rich0 · · Score: 1

      Ok, I get burned by accesskeys in konqueror, and somebody mods it funny?

      Ok, this is offtopic, but does ANYBODY know how to disable accesskeys in konqueror? I bump the ctrl key and then the next button I hit sends me off to who knows where...

    162. Re:Wrong Side of Bed? by Nato_Uno · · Score: 1

      I think if you read the thread you'll find that they're not discussing COW in general, the issue is specifically with COW for network sockets, which *doesn't* seem to be a particularly proven concept. Even the FreeBSD implementation of COW for network sockets comes with big warnings...

      --

      Have fun,

      Nathan 'Nato' Uno
      http://web.unos.net/
    163. Re:Wrong Side of Bed? by flyingindian · · Score: 1

      I think a lot of the readers, including the parent poster, on Slashdot (not surprisingly) missed the context in which Linus made his comments.

      He is not saying using COW is bad for the fork() system call or such stuff. He is talking about transferring data between the kernel and user space for things like sockets. In this case COW is used to prevent the user-space code from writing to the buffer while the kernel is still using the buffer but the user-space code will eventually write to the buffer. Depending on the scheduling of the user code and the kernel code, the user-space code could attempt its write while the kernel is still writing and require a copy of the page.
        In this case avoiding concurrent access to the buffer is the goal. It can be done either with COW or by simply copying the buffer in the beginning. In the COW implementation if the page ends up being copied, the whole COW trick is pure overhead. In addition with SMP systems cache coherency issues can come into play making a COW based solution to this problem even less attractive.

      And the whole thing about Linus dissing FreeBSD/Mach developers is one flippant statement in a longish technical discussion - but putting it in a Slashdot headline definitely makes for better press.

    164. Re:Wrong Side of Bed? by Anonymous Coward · · Score: 0

      Same thing "Einstein"

    165. Re:Wrong Side of Bed? by putaro · · Score: 1

      Consider a block like this, though:

      char * data = mmap(0, fileLen, PROT_READ, MAP_FILE | MAP_SHARED, fd, 0);

      zero_write(fd2, data, fileLen);

      No kernel/user space transitions, no buffer reuse, no malloc/frees (and no way to stop it in the middle of the xfer either :-)).
      I would wager that *this* is what zero_write is designed for. The code is considerably simpler than trying to manage buffers and get acks back when the buffer is ready to be reused, etc. Most apps don't really generate large amounts of data to be sent over the wire - they're getting it out of files that already exist. The one exception is when you're doing something like SSL.

    166. Re:Wrong Side of Bed? by pammon · · Score: 1
      But that does not apply in this case where freebsd uses COW for a normal blocking write() call...

      Right, but neither does Linux's new system call. Both require the programmer to design for them specifically.

    167. Re:Wrong Side of Bed? by Anonymous Coward · · Score: 0

      Dear nigger:

      Please read the article to keep up with the technology being discussed. Sincerely, the Aryan race

      ps- i will drag you behind my pickup for miles

      (i didn't post this first, cribbed from another response to your kind of dumbass)
      http://slashdot.org/comments.pl?sid=150454&cid=126 15613

    168. Re:Wrong Side of Bed? by 10Ghz · · Score: 1
      while(){
      b = malloc();
      fill_in_buffer(b);
      write(b);
      }


      I just prefer this:

      10 PRINT "Look mom, I'm a programmer!"
      20 GOTO 10
      --
      Lesbian Nazi Hookers Abducted by UFOs and Forced Into Weight Loss Programs - -all next week on Town Talk.
    169. Re:Wrong Side of Bed? by Anonymous Coward · · Score: 0

      That's pretty nasty to do--the easiest way is to allocate 8K at a time, and use a page-aligned chunk from the middle of it. Talk about a waste of memory.

      posix_memalign?
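
      For what it's worth, a page-aligned buffer from posix_memalign() might be obtained like this. A small sketch with made-up helper names; whether the allocator keeps its own bookkeeping off those pages depends on the malloc implementation, so treat that part as an assumption.

```c
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/* Allocate `pages` whole pages starting on a page boundary.
 * Returns NULL on failure; release the result with free().
 * (Hypothetical helper name.) */
void *alloc_page_aligned(size_t pages)
{
    long page_size = sysconf(_SC_PAGESIZE);
    void *buf = NULL;

    if (page_size <= 0 || pages == 0)
        return NULL;
    if (posix_memalign(&buf, (size_t)page_size,
                       pages * (size_t)page_size) != 0)
        return NULL;
    return buf;
}

/* Nonzero when `p` sits exactly on a page boundary. */
int is_page_aligned(const void *p)
{
    return ((uintptr_t)p % (uintptr_t)sysconf(_SC_PAGESIZE)) == 0;
}
```

      Because the size is a whole number of pages and the start is aligned, the buffer itself never shares a page with another buffer's data, without the allocate-8K-and-use-the-middle trick.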

    170. Re:Wrong Side of Bed? by shmget26 · · Score: 1

      Do I have that right?
      No, not at all.
      Linus's comments had nothing to do with using COW for User-Space to User-Space. In fact Linux does use that technique for user-space fork().
      He is talking about using COW to avoid copying data from User-Space to Kernel.

      I am not familiar enough with the subject to know if his claims are true, but at least I read it with enough attention not to make a Strawman out of it.
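
      The fork() case can at least be observed from user space. The sketch below (function name invented for illustration) only shows the visible semantics, namely that a write in the child never reaches the parent; the copy-on-write machinery that makes this cheap is invisible to the program.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork, let the child overwrite `value`, and return what the parent
 * still sees afterwards. Returns -1 if fork() or waitpid() fails. */
int parent_view_after_child_write(void)
{
    volatile int value = 1;
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        value = 2;                 /* changes the child's private copy only */
        _exit(value == 2 ? 0 : 1);
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return value;                  /* the parent's copy was never touched */
}
```

      The parent gets 1 back: the child's write landed on its own copy of the page.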

    171. Re:Wrong Side of Bed? by Anonymous Coward · · Score: 0

      Considering buffers write-once-nonreusable would make for better, more functional-style code ANYWAY.

      I'd rather see a move to that than encourage buffer reuse.

    172. Re:Wrong Side of Bed? by Anonymous Coward · · Score: 0

      I prefer ptr+=size instead of malloc(). Now how much faster java was again?

    173. Re:Wrong Side of Bed? by xelah · · Score: 1
      The issue is implementing zero-copy IO. FreeBSD's way of doing it is a setsockopt() that causes any write() on that socket to mark the buffer CoW so that it can use it exclusively for handing down to the device driver.


      I'm looking at the FreeBSD manual pages, and I can't find any mention of this socket option. The zero_copy man page makes no mention of it either.


      You really ought to read zero_copy(9), you appear to have some wrong ideas about how it works. You say that any write() on that socket marks the buffer CoW. It doesn't. The write must be from a page-aligned buffer, it must be at least one page long and the MTU on the interface must be at least one page. Oh, and the (non-default) ZERO_COPY_SOCKETS options must be in the kernel and the relevant sysctls must be set appropriately.


      The "magic" is that if the programmer tries to use that buffer while the device driver owns it he will get a copy. BUT, the programmer has no way of knowing when that buffer is available again.


      Untrue. Read the zero_copy manual page. The programmer can safely reuse the buffer once twice the socket buffer size worth of data has been sent to the socket.


      Look, zero copy sockets are a specialist option, for specialist applications of the kind that need to keep a few gigabit ethernet cards fully used and have CPU time to spare. They're not there to make your web browser go faster. They're for developers who really need the extra performance and who are prepared to go to some effort (and to read the manual pages...) to achieve it.
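
      The buffer-side conditions listed earlier in this comment (page-aligned buffer, at least one page long, interface MTU of at least one page) can be restated as a predicate. This is only a restatement of the comment, not FreeBSD code; the function name and parameters are hypothetical, and the kernel option and sysctl requirements are left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True when a write of `len` bytes from `buf` meets the three conditions
 * described in the comment above. (Hypothetical helper, not a FreeBSD API.) */
bool may_use_zero_copy(const void *buf, size_t len, size_t mtu,
                       size_t page_size)
{
    if (page_size == 0)
        return false;
    return (uintptr_t)buf % page_size == 0  /* buffer starts on a page */
        && len >= page_size                 /* at least one full page */
        && mtu >= page_size;                /* a page fits in one frame */
}
```

      Read this way, a 1500-byte MTU with 4096-byte pages fails the last test, which fits the point that this is an option for specialist setups.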

    174. Re:Wrong Side of Bed? by statusbar · · Score: 1

      Well, good point.

      Does FreeBSD utilize Zero-copy for data transferred via aio?

      jeff

      --
      ipv6 is my vpn
    175. Re:Wrong Side of Bed? by markiv34 · · Score: 0

      A little jihad never hurt anyone

      --
      No Black or White only shades of Gray
    176. Re:Wrong Side of Bed? by squiggleslash · · Score: 1
      I hope I'm not responding to this too late for anyone to answer. I think I roughly understand the concept, but I may be missing something major because I'm not sure I understand the examples.

      I'm guessing that what's happening goes something like this:

      - User program calls (ultimately) read(), a kernel call.
      - Kernel tells device driver to read X bytes into memory, where user has requested it.
      - Kernel marks the page(s) involved as "Funky stuff going on here"
      - Kernel immediately returns to user program, without waiting for device driver to do its thing
      - Sometime later, when device driver has returned data, kernel restores the page(s) involved so the user program can read them without causing page faults.

      Ok, except this isn't copy on write, it's "break on read" or something, unless CoW is a term generically used all over the place.

      Is the right example being used? The way I envisage the system working would be more like:

      {....}

      char buffer[MAX_BUFFER_SIZE];

      for(i=0; i<datalength; i++) {
      sprintf(buffer,"%d: %s\n", i, something[i]);
      fwrite(buffer,1,strlen(buffer),fd);
      }

      {....}
      (Yeah, it's dummy code, pretend the sprintf is actually a more complex set of data formatting commands.)

      Now, in this case, I'm seeing where someone might implement something called Copy on Write, and where it would be "useful" but fail in the way you're describing. I'm envisaging the writing part of the above looking something like this:

      - user program calls (ultimately) write() call
      - kernel tells device driver to write characters at location X
      - kernel marks relevant pages as Copy on Write, so that if user program writes more data to them, and the device driver hasn't finished, the user program doesn't interfere with what the device driver's doing.
      - kernel immediately returns to user program, before device driver has finished.
      - some time later, when device driver has finished writing data, kernel marks the relevant pages as "safe" again.

      Is this a more appropriate example, or am I missing something and it really does affect read()s in a way I'm not understanding?

      --
      You are not alone. This is not normal. None of this is normal.
    177. Re:Wrong Side of Bed? by Athanasius · · Score: 1

      also btw its every write to a new page that triggers a copy not every single write

      I assumed the poster you're replying to was referring to the need to check the reference count on every write, not what you then do if the count was larger than one. That IS a large overhead, given it's for every write, it just gets worse on those writes that turn out to be to a currently shared page.

    178. Re:Wrong Side of Bed? by mrsbrisby · · Score: 1

      You describe a special case of sendfile()- which does what you describe without the mmap() call.

      There isn't any need to mark the buffer specially if it doesn't come from a file- it's just important that it not get reused by anything but another zero-copy write operation.

    179. Re:Wrong Side of Bed? by mrsbrisby · · Score: 1

      That's more-or-less what FreeBSD does. The problem is that the user program ALWAYS attempts to reuse the buffer and so ALWAYS generates a page fault.

      So you can avoid this by never reusing buffers (until you absolutely have to), but the question becomes when CAN you reuse the buffers?

      What's really necessary is for the user program to be told when kernel space is done with the pages- and if you have this, you don't need to touch the page tables, or implement copy-on-write or anything.

    180. Re:Wrong Side of Bed? by fredrik70 · · Score: 1

      yes noticed that later.
      sorry, my mistake, that'll teach me for not rtfa!

      --
      if (!signature) { throw std::runtime_error("No sig!"); }
    181. Re:Wrong Side of Bed? by petermgreen · · Score: 1

      I assumed the poster you're replying to was referring to the need to check the reference count on every write
      but you don't!

      if there is only one reference you just tell the mmu to let the process write directly to the page.

      afaict cow works by denying a process write access to shared pages then trapping the errors that creates. pages with a refcount of 1 can still be written directly.

      --
      note: i'm known as plugwash most places but i screwd up registering that here somehow in the past and now can't register
    182. Re:Wrong Side of Bed? by twoblink · · Score: 0

      Then why don't I see a serious thrashing from him, about something like Tux, having CoW is not ok, but shoving a webserver into a kernel is ok right?

    183. Re:Wrong Side of Bed? by CTachyon · · Score: 1

      Disclaimer: I'm not a kernel programmer and honestly don't know much more than the basics of VM on the x86 architecture. However, I have extensively read Intel's 80386 Programmer's Reference Manual. I also R'dTFA.

      Copy on Write saves you real memory, cache memory, and CPU time by pretending that each forked process has a true copy of a memory segment when it in fact is looking at the original. That is, right up until a fork tries to write to that memory location, in which case an exception is handled by making an actual copy to a new location and allowing the write. [...] Do I have that right?

      No, you don't have that right. COW during fork() is staying, but that's because fork() is a much bigger operation than a read() or write(), and the kilocycles of pain caused by a TLB flush are worth it for something as big as a fork(). (As I understand, when the first write happens after a fork(), the kernel can pre-emptively "copy-ahead" and thereby coalesce multiple likely page faults into a single page fault.) However, when you're pushing around buffers that are only 4KB (one page) of memory or less, it doesn't make sense to spend many kilocycles on a page fault plus TLB flush in order to avoid ~4 kilocycles of just copying the damn memory. (However, if you're e.g. reading/receiving in a tight loop and not ever writing to the buffer, it can come out a win since the TLB flush doesn't happen on every I/O operation. This is especially likely in artificial benchmarks, which can trick a programmer into falsely concluding that COW is a net win.)
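      The trade-off above can be put into a back-of-the-envelope model. This is only a sketch. The ~4 kilocycle copy figure is the one quoted above, while TLB_FLUSH_CYCLES and PAGE_FAULT_CYCLES are placeholder assumptions standing in for "many kilocycles", not measurements of any real CPU.

```python
PAGE_COPY_CYCLES = 4_000    # "~4 kilocycles" to copy one 4KB page, as quoted above
TLB_FLUSH_CYCLES = 5_000    # ASSUMPTION: placeholder, not a measured value
PAGE_FAULT_CYCLES = 5_000   # ASSUMPTION: placeholder, not a measured value


def eager_copy_cycles(pages):
    """Cost of simply copying every page of the buffer up front."""
    return pages * PAGE_COPY_CYCLES


def cow_cycles(pages, pages_written):
    """Cost of COW: one flush to write-protect the buffer, then for each
    page that is later written, a fault, a flush and the copy anyway."""
    if not 0 <= pages_written <= pages:
        raise ValueError("pages_written must be between 0 and pages")
    setup = TLB_FLUSH_CYCLES
    per_written_page = PAGE_FAULT_CYCLES + TLB_FLUSH_CYCLES + PAGE_COPY_CYCLES
    return setup + pages_written * per_written_page


# One 4KB buffer that gets reused (written) right away: COW loses.
small_copy = eager_copy_cycles(1)
small_cow = cow_cycles(1, 1)

# A 1MB buffer (256 pages) that is never written again: COW wins.
big_copy = eager_copy_cycles(256)
big_cow = cow_cycles(256, 0)
```

      With these made-up constants the single-page case costs several times more under COW than the plain copy, while the large never-written buffer comes out far ahead, which matches the shape of the argument even though the exact numbers are invented.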

      In the case of a multiple-CPU system, you might even have to flush the TLB on multiple processors, which amplifies the damage to far beyond just copying the data. This is especially so if the program doing the I/O is multithreaded (e.g. MySQL).

      If you or anyone else reading this is not familiar with a TLB flush, here's an explanation that's as brief as I can make it:

      When the OS first boots, it sets up a mapping between VM and physical memory, called the "page table", then calls a special OS-only instruction that tells the processor where to look in physical memory for the page table.

      In a simpler world, every time the CPU was asked to read from or write to an address in VM-space, it would look it up by reading the page table entry from physical memory, doing the mapping, then performing the requested read/write on physical memory. In the real world, the CPU has a Translation Lookaside Buffer that caches the most recent page table entries; in typical use, memory latency is halved and memory bandwidth is doubled, so the TLB is a huge win. (The page table is variable in size and can be huge, so having the whole thing on the CPU is impractical.)

      The CPU provides a special OS-only instruction that must be called every time the page table changes. When that instruction is called, the CPU throws away the entire TLB, and depending on the design might have to throw away some or all of L1 and L2 cache as well. This costs thousands of cycles, both in the actual cache invalidation and in the refilling of all the emptied caches.
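      For anyone who prefers code to prose, here is a toy model of the TLB as a small cache in front of the page table. ToyMMU and its methods are hypothetical names for illustration only. It flushes the whole cache on every page-table change, as described above, and counts how often the slow page-table lookup has to happen.

```python
from collections import OrderedDict


class ToyMMU:
    """Page table plus a tiny least-recently-used TLB in front of it."""

    def __init__(self, page_table, tlb_entries=4):
        self.page_table = page_table    # virtual page number -> physical frame
        self.tlb = OrderedDict()        # cached translations, oldest first
        self.tlb_entries = tlb_entries
        self.walks = 0                  # slow lookups in the page table

    def translate(self, vpn):
        if vpn in self.tlb:             # TLB hit: no memory access needed
            self.tlb.move_to_end(vpn)
            return self.tlb[vpn]
        self.walks += 1                 # TLB miss: read the page table
        frame = self.page_table[vpn]
        self.tlb[vpn] = frame
        if len(self.tlb) > self.tlb_entries:
            self.tlb.popitem(last=False)    # evict the least recently used entry
        return frame

    def flush(self):
        """Throw away every cached translation."""
        self.tlb.clear()

    def remap(self, vpn, frame):
        """Change the page table, then flush so no stale entry survives."""
        self.page_table[vpn] = frame
        self.flush()


mmu = ToyMMU({0: 7, 1: 3})
for _ in range(100):
    mmu.translate(0)                # 1 miss, then 99 hits
walks_before_flush = mmu.walks

mmu.remap(1, 9)                     # page table changed: whole TLB is gone
frame0 = mmu.translate(0)           # misses again even though page 0 never changed
frame1 = mmu.translate(1)
walks_after_flush = mmu.walks
```

      The last two lookups show the collateral damage: remapping page 1 also forces page 0 to be looked up again.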

      Unfortunately for an OS programmer, one of the pieces of information that has to be cached in the TLB is the read-write-execute permissions of each entry in the page table. Whenever COW is used, the OS marks the COW-shared pages read-only (one TLB flush). When a program writes to a read-only page, the CPU hands control to the OS, an expensive event called a "page fault". The OS notices that the page fault is on a COW page, so it splits the VM copies into physical copies and marks the new one read-write (one page fault plus TLB flush). When the other thread writes to its copy (which is no longer shared but still read-only), the OS notices that the COW is no longer in effect and marks the old one read-write (one page fault plus TLB flush).

      This explanation completely glosses over a number of x86 details like segments, the GDT, and the LDT. Go download the manual from Intel for details.

      --
      Range Voting: preference intensity matters
    184. Re:Wrong Side of Bed? by Anonymous Coward · · Score: 0
      And of course, feel free to program one yourself and add it to the kernel.

      :/

    185. Re:Wrong Side of Bed? by Pseudonym · · Score: 1
      If short term fixes were implemented more often in the kernel, I suspect it'd be much worse off in stability, performance and scalability than it is now. Since linux kernel programmers are forced to do things "The Hard Way" in the kernel, it all but guarantees that they have an intimate knowledge of the systems and workings.

      And on the other hand, if the policy of doing it the right way was applied consistently, Linux would be a microkernel OS. But that's an old flamewar (the first flamewar about Linux, indeed!) and I don't propose to reopen it now.

      --
      sub f{($f)=@_;print"$f(q{$f});";}f(q{sub f{($f)=@_;print"$f(q{$f});";}f});
    186. Re:Wrong Side of Bed? by pammon · · Score: 1
      Does FreeBSD utilize Zero-copy for data transferred via aio?

      I wish I knew. But it does say that "Modifications of the Asynchronous I/O Control Block structure or the buffer contents after the request has been enqueued, but before the request has completed, are not allowed." That leads me to believe that the buffer is neither copied nor marked copy-on-write.

    187. Re:Wrong Side of Bed? by Athanasius · · Score: 1

      Aha! Thank you for the correction. I was thinking the constant check on any write would be more than a little stupid; I forgot that it's the MMU pulling these tricks in the first place.

    188. Re:Wrong Side of Bed? by quintesse · · Score: 1

      What the heck does debugging have to do with bad programming?? The only difference between you and me (just guessing here of course) is that you spend your time putting in loads of "print" statements to figure out where things go wrong while I fire up a debugger and spend my time stepping through code. Both of us will lose time that way and I'm not going to try to figure out who can do it quicker but I do know I positively hate having to go through code that is full of commented "debug" statements. Give me a debugger any day and I'm happy. That's just the way I code and I don't see how using a debugger would somehow make you forget about code design and patterns and whatever more.

    189. Re:Wrong Side of Bed? by GlobalEcho · · Score: 1

      Allow me to answer that one, Mr. "dubl-u".

      Write a PDE solver, Monte Carlo integration routine, or LU decomposer in a high-level language. Any high-level language. Write it again in C and note the 10x or 100x speedup without even optimizing. Optimize further, and you may get, what? 30% better, tops? So the reward for the development time invested is much greater in the transition from a high-level language to C, than it is in worrying about further optimizations once in C.

      Example: I wrote my own LU decomposer that comes within a fraction of the speed of the one in Intel's MKL, without worrying about optimizations. I find it implausible therefore that further optimization would do me a whole lotta good. I've had similar experiences with a PDE solver I've been working with.

      The GP poster probably works in investment banking, like I do, where C still has its place in valuing complex instruments. The routines naturally belong in a library, and what the hell the GP ever does with fork() I don't know. But it is true that there is a lot of practical use of C out there that doesn't really scream for optimization.

    190. Re:Wrong Side of Bed? by metallic · · Score: 1

      Mem: 515156k total, 503476k used, 11680k free

      I just got that from the top command on our Linux box that serves the majority of our websites and handles a good chunk of our email. The server is still responsive, even with close to 98% of the RAM in use. Granted, the server is only serving about 100,000 pages a day but it does suggest that you are doing something wrong on your end.

      --
      Karma: Positive. Mostly effected by cowbell.
    191. Re:Wrong Side of Bed? by Paul+Crowley · · Score: 1

      So... you care about performance enough to change operating system in return for a 10% speedup, but not enough to really think about it?

    192. Re:Wrong Side of Bed? by Anonymous Coward · · Score: 0

      Well, introduce kernel debugger and you are bound to meet some sort of Screen Of Death, eventually. It is understandably something that is hard to take...

    193. Re:Wrong Side of Bed? by anandsr · · Score: 1

      /Flame on
      If you do respond before reading the responses then I think you are in the wrong business. You may attempt to modify code without understanding all the ramifications of the code that is already there. I believe you must simply quit rather than trying to write code that will make life hell for all your users.

      > I want the kernel to run my code as fast as possible by default.
      Yes, everybody wants a free lunch, but you have to pay for it. I don't expect you to know it, but the kernel does not run your code, it just schedules it. Optimizing code is hard stuff. It is very easy to write badly performing code, just like it is easy to write obfuscated code. You can do any of the following to get your code to suck.
      1) Writing unaligned data structures (If your processor is good (ie nonintel) it will SIGBUS on you mercifully)
      2) Writing arrays without knowing your cache size.
      3) Writing multithreaded applications without the need.
      4) Using Mutexes/Semaphores/etc on large blocks of code without need.
      5) Doing unnecessary copies.
      This is just scratching the surface. So do you do 1)? I have seen so many people do it that it is not funny. /Flame off
      BTW Linus is a very entertaining troll, but he is mostly correct about what he says. And he can take things as well as he gives. He is always ready to learn if he is wrong. But you have to prove him wrong, by submitting better performing code than the crap he wrote/accepted.

    194. Re:Wrong Side of Bed? by something_wicked_thi · · Score: 1

      There is a lot of confusion about this, so let me explain exactly what is going on here.

      Your second example (with the fwrite) is exactly what's being talked about. I don't believe this technique has any use for read (because reads wouldn't require the kernel to keep a temporary copy of anything, and they either block, or return EAGAIN, so there's no real concurrent work going on).

      So this subject is about writing. The problem here is that writes are asynchronous. The writes haven't necessarily happened by the time the write call returns. The network layer still has to wait to send the data. However, the kernel cannot guarantee that userspace will not free or overwrite the data it needs to send before it can send it. Therefore, the kernel must copy it.

      However, a possible solution is to use CoW for the page(s) that contain the data to send. The userspace program may attempt to free or modify a page that contains the data (note that it doesn't have to be the send data itself, since other data could share the page with the buffer, or the buffer could be freed and reused, or it could be on the stack and a new frame is pushed on, etc). When a write happens on that page, the kernel must not allow the page to change, because it has not finished with it. Therefore, it has to allocate a new page and copy it (note that if the buffer is less than the page size, this is a net loss since the kernel wouldn't have had to copy as much if it had copied the buffer in the first place), and give the *userspace* program the copy (the original must be kept for the kernel because it is probably part of the DMA scatter/gather list that is passed to the network buffer, so its physical address is not allowed to change).

      So now let's go through the actual steps of the CoW mechanism so you can understand how slow it can be. When fwrite is called, the CoW mechanism has to mark the page read-only, which means deleting the page from the process' TLB. That means that, the next time the process accesses that page, for either a read or a write, a TLB miss will occur and the translation will have to be reloaded from the page table. If a write occurs, then the page must be copied for the userspace program. There is a possibility of having two faults if the page is read then written (the read would result in a TLB miss, and the write would result in a page fault because the page is read-only).

      So, in addition to the overhead from copying the page, there is at least one page fault *for every page in the buffer* (if the buffer spans multiple pages and the userspace program reads or writes them all, then that results in a lot of extra overhead). Furthermore, when the kernel is finished with the page, it has to mark it writable again, which means that, if the page had been reloaded into the TLB due to a read, it will have to be deleted again, thus eventually causing another page fault (if there was a page fault due to a write, then the kernel obviously never marks the page writable again as the page was copied and the userspace program is actually using a different page).

      So therefore, we had better hope that the normal case is for the kernel to finish with the page before the userspace program tries to modify it. However, as we have seen in the example code, it's very likely that the code will be written in such a way that the copy will happen, thus making CoW even slower than the copy-everything approach.

      Even barring all of that going wrong, the FreeBSD solution is poor because, apparently, it doesn't allow you to really tell when you are allowed to start modifying the page again, because you don't know when the kernel has finished with it. With Linux, the proposed idea is to allow the userspace program to request a notification when the kernel is finished with the buffer, so it can free it or reuse it at its leisure. Without that, any app on FreeBSD is simply guessing (it's a benign race condition in that, even if the race turns out poorly, the correctness of the program isn't affected, but performance is).
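      To make the notification idea concrete, here is a small simulation. It is not the vmsplice() interface, and ToyKernel and send_zero_copy are hypothetical names invented for this sketch. The "kernel" keeps a reference to the caller's buffer instead of copying it, and hands back a future that completes once it has finished reading the buffer, so the caller knows exactly when reuse is safe.

```python
from concurrent.futures import ThreadPoolExecutor


class ToyKernel:
    """Stand-in for the kernel side of a zero-copy send (hypothetical)."""

    def __init__(self):
        self._pool = ThreadPoolExecutor(max_workers=1)
        self.wire = []                  # what actually went out

    def send_zero_copy(self, buf):
        # No copy is made here: the worker reads the caller's buffer later.
        # The returned future is the "kernel is done with it" notification.
        return self._pool.submit(self._transmit, buf)

    def _transmit(self, buf):
        self.wire.append(bytes(buf))    # the buffer is only read at this point

    def close(self):
        self._pool.shutdown(wait=True)


kernel = ToyKernel()
buf = bytearray(b"first")

done = kernel.send_zero_copy(buf)
done.result()                           # wait for the completion notification
buf[:] = b"second"                      # only now is it safe to reuse the buffer

kernel.send_zero_copy(buf).result()
kernel.close()
```

      Without the call to done.result(), overwriting buf would race with the transmit, which is the guessing game described above.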

      I hope that explains everything and, if I've gotten anything wrong, I hope someone corrects me. If you have any more questions, just ask.

  2. Debian GNU/kFreeBSD by Douglas+Simmons · · Score: 1, Offtopic

    Development-wise, how much is Debian's FreeBSD port from Debian versus FreeBSD? Or are its advancements in tandem with both projects? And does either half suffer from the combination of the other?

    1. Re:Debian GNU/kFreeBSD by synthespian · · Score: 1

      Oh, it's been going on for...what 6 years or something?

      Why would you even want that? FreeBSD already has pkg_add (apt-get install), freebsd-update fetch (apt-get update), etc. With almost 15,000 ports, why complain? (And with people writing software for Linux only, some stuff you won't see on BSDs any time soon).

      When you look at a free software project, you also have to look at how the project manages people and developer resources, and Debian has a bad track record. When their number of packages grew to 16,000, the project went into grind-to-a-halt mode. Ubuntu saved the day, I guess. But it ain't Debian.

      Last I tried Ubuntu (just Hoary, or Warty, I think, I can't remember those names), it suffered from the same lack of decent documentation and disorganization Debian suffers (for instance, to install Java I had to read some guy's page on the Internet, and like, the best documentation on packaging is from a book you have to buy?) Plus, glitches in the upgrade process. And, what with the horrible PHP software for Ubuntu forums...Web forums? Not even a newsgroup? Geez...Are you kidding me?

      --
      Main difference between the BSD license and the GPL license: one is from California and the other is from Massachusetts
  3. Playing games with VM is bad. by digitaldc · · Score: 2, Funny

    Playing games with VM is bad.

    I know, I hate it when I have to listen to 26 hang up messages in my inbox only to find out someone is playing games with me. :(

    --
    He who knows best knows how little he knows. - Thomas Jefferson
  4. Tantrums by ClamIAm · · Score: 4, Funny

    Methinks we need to start tagging "tantrum" to this type of thing.

    1. Re:Tantrums by LiquidCoooled · · Score: 1

      Done...
      I tried to tag "throwing dummy out of the cot" but it wanted to split them up.

      --
      liqbase :: faster than paper
    2. Re:Tantrums by OctoberSky · · Score: 2, Informative

      How about Dvorakesque?

    3. Re:Tantrums by gEvil+(beta) · · Score: 1

      How about Dvorakesque?

      No, that's reserved for ridiculously ludicrous speculation that ultimately turns out to be true.

      --
      This guy's the limit!
    4. Re:Tantrums by VP · · Score: 1
      He-he, Linus got this to say on the kernel mailing list:

      From: Linus Torvalds [email blocked]
      Subject: Re: Linux 2.6.17-rc2
      Date: Fri, 21 Apr 2006 10:58:46 -0700 (PDT)

      I got slashdotted! Yay!

      On Thu, 20 Apr 2006, Linus Torvalds wrote:
      >
      > I claim that Mach people (and apparently FreeBSD) are incompetent idiots.

      I also claim that Slashdot people usually are smelly and eat their
      boogers, and have an IQ slightly lower than my daughters pet hamster
      (that's "hamster" without a "p", btw, for any slashdot posters out
      there. Try to follow me, ok?).

      Furthermore, I claim that anybody that hasn't noticed by now that I'm an
      opinionated bastard, and that "impolite" is my middle name, is lacking a
      few clues.

      Finally, it's clear that I'm not only the smartest person around, I'm also
      incredibly good-looking, and that my infallible charm is also second only
      to my becoming modesty.

      So there. Just to clarify.

      Linus "bow down before me, you scum" Torvalds
    5. Re:Tantrums by mccoma · · Score: 1
      No, that's reserved for ridiculously ludicrous speculation that ultimately turns out to be true.

      even a blind man will hit the dart-board, eventually, if throwing enough darts.....

  5. Did you hear that? by DaHat · · Score: 1, Insightful

    It sounded like the opening volley of the second great Unix war, only this time instead of pitting proprietary Unix vendors and systems against each other... it is two open source ones.

    It will be interesting to see what weapon the BSD crowd will retaliate with.

    1. Re:Did you hear that? by TrappedByMyself · · Score: 2, Insightful

      It will be interesting to see what weapon the BSD crowd will retaliate with.

      I would just prefer that their response is to release a stable system using their method.

      --

      Help me take back Slashdot. When did 'News for Nerds' become 'FUD and Conspiracy Theories for Extremist Nutjobs'?
    2. Re:Did you hear that? by arosas · · Score: 2, Insightful

      With a comment like that, I can only imagine the kind of temper tantrum that Theo de Raadt will throw. I mean honestly, whatever happened to common courtesy? There's no need for such comments like "incompetent idiots". I see so many people push for the advancement of OSS, only to find that it was in vain thanks to school-yard hissy fits like this.

    3. Re:Did you hear that? by Anonymous Coward · · Score: 0

      All the asshole (Theo deRaadt) has to do to respond is to point out "1 remote root exploit vs. *runs out of fingers counting* kernel exploits."

    4. Re:Did you hear that? by Anonymous Coward · · Score: 0

      Dude, Theo is busy raising cash (and doing well for all the slag he takes around places like this). He's been rather sedate as of late.

      Besides, his BSD is that of the Open, not of the Free.

    5. Re:Did you hear that? by Cid+Highwind · · Score: 2, Funny

      It will be interesting to see what weapon the BSD crowd will retaliate with.

      My guesses are they will respond something like this:

      FreeBSD: FreeBSD users will continue their campaign of random acts of elitist snobbery against Linux users.
      OpenBSD: Theo will threaten to stop work on OpenSSH unless Linus gives him $10,000 for every nasty email he sends a *BSD developer.
      NetBSD: Will stop developing the NetBSD port for Linus' microwave.

      --
      0 1 - just my two bits
    6. Re:Did you hear that? by the_humeister · · Score: 1

      Obviously a drive-by shouting match.

      "Oh Reginold! I disagree!!!"

    7. Re:Did you hear that? by Homology · · Score: 1
      It will be interesting to see what weapon the BSD crowd will retaliate with.

      What is that 'retaliate' that you talk about? One of the reasons I use OpenBSD is that I don't have to worry about the weekly stream of Linux kernel local root exploits.

    8. Re:Did you hear that? by schon · · Score: 1

      I don't have to worry about the weekly stream of Linux kernel local root exploits.

      Dude. 2001 called, they want their ad-hominem back.

    9. Re:Did you hear that? by quantum+bit · · Score: 1

      I would just prefer that their response is to release a stable system using their method.

      No need -- such a system is already out there and in wide usage.

      As opposed to some obscure method using a non-portable interface (vmsplice) that's not even in the kernel version most people use for production servers.

    10. Re:Did you hear that? by Homology · · Score: 1
      Dude. 2001 called, they want their ad-hominem back.

      Dude, have you heard about Google?

    11. Re:Did you hear that? by sp0rk173 · · Score: 1

      BAHAHAHAHHHAHA!!! NICE!

    12. Re:Did you hear that? by Serilkath_Montreal · · Score: 1
      It will be interesting to see what weapon the BSD crowd will retaliate with
      A working OS ?
      --
      malheureusement la stupidité n'est ni curable, ni mortelle.
    13. Re:Did you hear that? by nuzak · · Score: 1

      Despite your conception of it, "ad hominem" is neither hyphenated nor a synonym for "wrong".

      --
      Done with slashdot, done with nerds, getting a life.
    14. Re:Did you hear that? by MBGMorden · · Score: 1

      I think it's just become fashionable to throw around logical fallacy titles in an argument even when you don't even know what they mean. It's like they're special moves out of an anime or something.

      Person 1: "I assert that 2 + 2 does not equal 5."
      Person 2: "Oh yeah?!?! Well Red Herring!!!!"
      Person 1: "You're not making any sense."
      Person 2: "Ad hominem!?!?! Your argument holds no weight."
      Person 1: "Please stop that. We'll never reach an agreement at this rate."
      Person 2: "OMG. NEWB!!! Slippery slope!!!"
      Person 1: "I'm leaving."
      Person 2: "Pfft. Straw man if I ever saw one."

      And don't come and apply them to this post. This is what's known as a joke and as such is exempt from logical analysis.

      --
      "People who think they know everything are very annoying to those of us who do."-Mark Twain
  6. Maybe Linux needs a vacation? by Raleel · · Score: 1

    I respect him and all, and it's not like this sort of thing hasn't happened to me (calling people idiots and the like). Generally, it means that I need a break. Maybe he does too. Take the kids and the missus, go off for about a month, no computer.

    --
    -- Who is the bigger fool? The fool or the fool who follows him? --
    1. Re:Maybe Linux needs a vacation? by Neophus · · Score: 1

      Is there anyone Linus *hasn't* called an idiot? :p I seem to recall that whenever he has something to criticize, he calls somebody an idiot/moron/asshat.

      --
      Why do i have to be so lazy? :(
    2. Re:Maybe Linux needs a vacation? by Anonymous Coward · · Score: 0

      If you read LKML, Torvalds is pretty much a bombastic flaming asshat all the time.

      He'll pop in, take some extreme position about what is "right", claim that everyone who disagrees is "stupid". Then his minions crawl out of the woodwork to continue the flamewar on his behalf.

      I think most slashbots only read Linus' interviews in 'ZDNet' and therefore believe he's a reasonable individual. He probably is, as he's changed his mind many times (VM subsystem, POSIX Threads, etc), but you wouldn't know it from his mailing list interaction.

  7. Straight Gangsta' by MudButt · · Score: 0, Offtopic

    Pac, Biggie, Proof... How many gangstas gotta die before these turf wars end?

    1. Re:Straight Gangsta' by Anonymous Coward · · Score: 0

      Pak, Chooie, Unf. How many pusher robots gotta help before these robot wars end?

    2. Re:Straight Gangsta' by Anonymous Coward · · Score: 0

      Pac, Biggie, Proof... How many gangstas gotta die before these turf wars end?

      42

    3. Re:Straight Gangsta' by Anonymous Coward · · Score: 0

      The answer, my friend, is blowing in the wind...

    4. Re:Straight Gangsta' by Orrin+Bloquy · · Score: 1

      Sorry, that was me. Triple bean burrito at Taco Bell.

      --
      "Made up/misattributed quote that makes me look smart. I am on /. and I must look smart."
  8. Chiiiiiiil. by caluml · · Score: 0, Flamebait

    Linus - if you're reading this, which I know you are, and if you take my advice, which I know you do: Beer, barbeque, and your woman. Or a spliff. Whatever you like. Just relax. Breathe deeply. Yoga. Meditation. Tantric sex. Kinky sex. Whatever. Just relax man. You've done the hard work. And now just sit back, and watch Linux trounce over Windows and *BSD.

    1. Re:Chiiiiiiil. by ecklesweb · · Score: 1

      Seriously. And in what universe is anyone who can intelligently speak about (much less code around) memory and VM management an "incompetent idiot"?

    2. Re:Chiiiiiiil. by deadlinegrunt · · Score: 1

      "And in what universe is anyone who can intelligently speak about (much less code around) memory and VM management an 'incompetent idiot'?"

      According to some, like Java and C# fanboys for example, just about anyone who programs still using C...

      --
      BSD is designed. Linux is grown. C++ libs
    3. Re:Chiiiiiiil. by phayes · · Score: 1
      Linus - if you're reading this, which I know you are...
      Linus doesn't read Slashdot, he has better things to do.
      --
      Democracy is a sheep and two wolves deciding what to have for lunch. Freedom is a well armed sheep contesting the issue
    4. Re:Chiiiiiiil. by Cheerio+Boy · · Score: 1

      Linus - if you're reading this, which I know you are

      Linus Torvalds doesn't read Slashdot. He compiles kernel code until your post is brought to him out of /dev/null!

      --

      "Bah!" - Dogbert
    5. Re:Chiiiiiiil. by NutscrapeSucks · · Score: 1

      and if you take my advice, which I know you do: Beer ...

      I'm pretty sure that Linus has already taken that advice.

      --
      Whenever I hear the word 'Innovation', I reach for my pistol.
    6. Re:Chiiiiiiil. by Anonymous Coward · · Score: 0

      He does read slashdot, as proven by this:

      http://lkml.org/lkml/2006/4/21/247

      "I also claim that Slashdot people usually are smelly and eat their boogers, and have an IQ slightly lower than my daughters pet hamster"

      So there, it's official.

  9. Linus is turning into a dictator by Anonymous Coward · · Score: 1, Insightful

    Maybe it's the stress, dunno but this guy is developing a chip on his shoulder that needs to be knocked off.

    In the spirit of open source community development, he can't make statements like this and expect to be a role model for the open source community.

    1. Re:Linus is turning into a dictator by Anonymous Coward · · Score: 0

      There's this saying about NetBSD, you know, because Theo is (supposed to be) an a*hole.
      I guess this can be expanded to something like:

      [your favourite alternative to Linux] -- because Linus is one as well.

    2. Re:Linus is turning into a dictator by kryten_nl · · Score: 2, Informative

      In the spirit of open source community development, he can't make statements like this and expect to be a role model for the open source community.

      RMS, ever heard of him?

      --
      For the perfect anti-Unix, write an OS that thinks it knows what you're doing better than you do and let it be wrong.
    3. Re:Linus is turning into a dictator by Anonymous Coward · · Score: 0

      Yeah, instead we should look up to a calm, reasoned individual with a notoriety for being respectable: Theo DeRaadt!

    4. Re:Linus is turning into a dictator by Bogtha · · Score: 3, Insightful

      Dictator? Are the FreeBSD developers somehow unable to keep their implementation now that Linus deems it stupid?

      You might feel he's being a bit of an arsehole, but that doesn't mean he's a dictator. He's not stopping anybody from doing anything, he's merely sharing his opinion of a development technique on a mailing list dedicated to discussing the development of his kernel.

      --
      Bogtha Bogtha Bogtha
    5. Re:Linus is turning into a dictator by Lumpy · · Score: 5, Insightful

      No he is simply getting less tolerant of "sloppy" programming. He is one of the very very few that believes in doing it the way that gives you the best speed. Something that takes 4+ operations compared to a way of doing it with only 2 operations and you get less problems = performance gains that add up. Just because your typical machine has 4 dual core 8Ghz processors and 22 terabytes of ram does not mean you can slack off and write the whole thing without paying attention to performance.

      the BSD guys have their reasoning and if you read more info about this it is not a shot in the dark that Linus is taking but he is frustrated that after many discussions nobody cares as much as he does on the performance issues.

      Go back and read what Linus did back in the early days, it's no different today than what it was in 1990, he will call a duck a duck.

      --
      Do not look at laser with remaining good eye.
    6. Re:Linus is turning into a dictator by Anonymous Coward · · Score: 0

      Maybe it's the stress, dunno but this guy is developing a chip on his shoulder that needs to be knocked off.

      Maybe it's the fact that clueless twats are trolling the developers' mailing list and making mountains out of molehill discussions that they don't understand. Maybe the clueless twats should be kicked off the developers' mailing list so that developers can have technical discussions without morons creating a political situation where none exists.

      he can't make statements like this and expect to be a role model for the open source community.

      And just who the hell are you? What operating system kernel have you written that makes you feel that you are in a position to decide whether he has a chip on his shoulder or not, let alone that it "needs to be knocked off"??? What makes you such an authority on the matter that you can qualify his statements as "mistakes"????

      Shit! I just fed a troll, didn't I?

    7. Re:Linus is turning into a dictator by dinivin · · Score: 1

      Where has Linus ever said he expects/wants/needs to be role model for the open source community?

    8. Re:Linus is turning into a dictator by Anonymous+Writer · · Score: 1

      Maybe it's the stress, dunno but this guy is developing a chip on his shoulder that needs to be knocked off.

      I thought he stopped developing the Transmeta chip a long time ago.

    9. Re:Linus is turning into a dictator by Homology · · Score: 0, Troll
      No he is simply getting less tolerant of "sloppy" programming. He is one of the very very few that believes in doing it the way that gives you the best speed.

      To achieve that speed, quality is sacrificed (the Linux kernel has a new local root exploit averaging more than once a month). Drivers written under NDA are welcomed with open arms, and this gives the open source variant of binary blobs.

      the BSD guys have their reasoning and if you read more info about this it is not a shot in the dark that Linus is taking but he is frustrated that after many discussions nobody cares as much as he does on the performance issues.

      Linus does not care about quality, but the *BSDs do.

    10. Re:Linus is turning into a dictator by Anonymous Coward · · Score: 0

      Dude, Linux has orders of magnitude more sloppy programming than the BSD kernels. The main problem is that Linus has no concern for quality and sacrifices that for speed.

    11. Re:Linus is turning into a dictator by Anonymous Coward · · Score: 0

      Stop thinking he is some kind of omniscient god. I mean, this guy has next to zero understanding of software engineering (for instance, how long did it take to move to some sort of source control? Next to forever).

      When I use my ubuntu machine under memory load, the system stops and becomes unusable. A runaway process eats all the memory and CPU, and the system hangs. Takes minutes to recover.

      Yes, it does that. It always does. It has always been 'the linux way'.

      I don't think linus is an expert at memory management in the real (os-wide) world.

      Now, my freebsd machines (which I seldom use, and I am typing this on a XP host), *never* exhibited that kind of behavior on a 4.x branch.

      If only the same INSANE amount of work that is put into linux could be put into properly designed BSDs, it would be fantastic.

      (Yes, I know, I should get kernel version $(LATEST), and that particular problem is fixed. That is the standard answer and I am fed up with it).

    12. Re:Linus is turning into a dictator by Anonymous Coward · · Score: 1, Insightful
      How about proving Linus wrong, or shutting up and keeping a respectful distance where kernel affairs are concerned like the rest of us.

      Touchy-feely happy time doesn't belong in the most critical part of an OS. This is where you put up or shut up. There aren't a lot of people who can do this kind of work, but an army of people who want to waste time trying to glom on and "belong" to the culture. They're a liability, and it's a good thing to have the kind of attitude that shakes them off. Mind: I'm not saying the BSD developers are glomming. But if you've seen their mailing lists, you know that they're just as bullshit-free and will take this as a challenge and change FreeBSD or prove Linus wrong. They won't just whine about it like a bunch of little girls.

    13. Re:Linus is turning into a dictator by Anonymous Coward · · Score: 0

      Dude. None of the BSDs run on systems even NEAR the size of the biggest linux machines. When you're talking about TLB overhead, it isn't on a 1- or 2-processor system that it gets really expensive. It's when you have to send the TLB shootdowns around the interconnect to all other processors that might have that page mapped as well.

      THAT is expensive. Think 64, 128 or 256 processors like the big IA64 and PPC64 machines have. And yes, they do run Linux. In production. I'm sure it's easily noticeable on 8-way opterons as well.

      NetBSD still barely does SMP. FreeBSD runs on small SMPs. They just don't have the same requirements on their implementation.

    14. Re:Linus is turning into a dictator by rtaylor · · Score: 4, Insightful

      No he is simply getting less tolerant of "sloppy" programming.
      You'll forgive me for taking that with a grain of salt so long as memory over-commit remains the default mode of operation within Linux.

      --
      Rod Taylor
    15. Re:Linus is turning into a dictator by TummyX · · Score: 1


      No he is simply getting less tolerant of "sloppy" programming. He is one of the very very few that believes in doing it the way that gives you the best speed.


      Uh. Programming for "best speed" usually results in sloppy programming. Have you ever seen the code Linus writes? Ick. Sloppy would be a polite way of putting it. He needs to take a course in software engineering. Does his code work? Yes. Is it easy to maintain? No. Is it sloppy? Yes.

    16. Re:Linus is turning into a dictator by PietjeJantje · · Score: 1

      >No he is simply getting less tolerant of "sloppy" programming.

      Some argue the entire Linux kernel is a monolithic hack (Tanenbaum).
      Some argue that FreeBSD is a much cleaner, more secure system.
      Combined, some even argue that Linux is the celebration of sloppy programming.

      I'm no expert as they are, but they're all incredibly smart people, and they all have credible arguments (e.g. the above), but only Linus is calling the others incompetent.

    17. Re:Linus is turning into a dictator by Anonymous Coward · · Score: 0

      This is the same guy who derides RMS for sticking to "the right way" as he sees it, yes? So why not deride him when he sticks to the idea of doing it the "right way"?

      I'm wondering if someone in *BSD has reverse-engineered another friend's system.

    18. Re:Linus is turning into a dictator by Anonymous Coward · · Score: 0

      No he is simply getting less tolerant of "sloppy" programming. He is one of the very very few that believes in doing it the way that gives you the best speed.

      The fact that you think sloppy programming results in slow speed shows that you don't know what you're talking about. Cutting corners and using hacks to speed things up is sloppy. Linux may be fast, but it's the definition of sloppy programming.

    19. Re:Linus is turning into a dictator by Laser+Lou · · Score: 2, Insightful

      Linus has long been called a "Benevolent Dictator for Life". I guess this supports the idea that, with all dictatorships, you get more than what you bargained for.

      --
      No data, no cry
    20. Re:Linus is turning into a dictator by Just+Some+Guy · · Score: 1
      Something that takes 4+ operations compared to a way of doing it with only 2 operations and you get less problems = performance gains that add up.

      And sometimes someone comes up with a way that takes 128 operations but boosts throughput dramatically. For instance, the "new" anticipatory IO scheduler deliberately adds delays after reads before seeking. Although that would actually cripple certain pathological cases, it actually increases normal workload throughput by a large amount.

      Amateurs write complex algorithms. Advanced programmers replace them with simple ones. Experts replace those with complex ones. Simple is nice, sometimes, but isn't always the clear winner.

      --
      Dewey, what part of this looks like authorities should be involved?
    21. Re:Linus is turning into a dictator by Cyno · · Score: 1

      And CFQ is better for some workloads because it's not specialized..

      "It tries to optimise for physical disks by avoiding head movements if possible - one downside to this is that it probably give highly erratic performance on database or storage systems." - link

      I think keep it simple stupid should be common sense by now. But sometimes it takes calling people idiots to get them to think about it.

      I don't know who's right in this case, but I like the way Linus called those devs idiots. That makes me smile. People should drop their egos and get back to debating.. the gloves are off, Fight!

    22. Re:Linus is turning into a dictator by Cyno · · Score: 1

      FUD

      sorry for feeding..

    23. Re:Linus is turning into a dictator by Jherek+Carnelian · · Score: 1

      You'll forgive me for taking that with a grain of salt so long as memory over-commit remains the default mode of operation within Linux.

      The default mode for memory over-commit is that obvious over-commits are disallowed.
      This behaviour is less tolerant than simply allowing any and all over-commits.

      This seems to match the original poster's contention that Linus is getting less tolerant rather
      than completely intolerant of "sloppy programming."
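
      For anyone who wants to check which mode a given box is actually in, here is a minimal Python sketch. It assumes a Linux system that exposes /proc/sys/vm/overcommit_memory; the function name and the one-line mode descriptions are illustrative paraphrases, not text from the kernel.

```python
# Rough meanings of the three documented values of vm.overcommit_memory.
OVERCOMMIT_MODES = {
    0: "heuristic: obvious over-commits are refused",
    1: "always: every allocation request is granted",
    2: "never: commits are capped by swap plus a ratio of RAM",
}

def read_overcommit_mode(path="/proc/sys/vm/overcommit_memory"):
    """Return the current overcommit mode as an int, or None if unavailable."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        # Not Linux, /proc not mounted, or unexpected file contents.
        return None

mode = read_overcommit_mode()
print(mode, OVERCOMMIT_MODES.get(mode, "unknown (not Linux, or /proc not mounted)"))
```

      A value of 0 corresponds to the "obvious over-commits are disallowed" behaviour described above.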

    24. Re:Linus is turning into a dictator by katsklaw · · Score: 1

      Any coders tolerance for sloppy coding has no right outside his/her own project(s).

    25. Re:Linus is turning into a dictator by mrsbrisby · · Score: 1

      You'll forgive me for taking that with a grain of salt so long as memory over-commit remains the default mode of operation within Linux.

      That works under the assumption that VM overcommit is bad.

      The PostgreSQL people seem to think so- lots of people seem to think so.

      But believe it or not, lots of people think it's a good thing, and structure their system so that they can take advantage of the good things that VM overcommit buys them- it prevents obvious deadlocks (when paired with the OOMK), and improves system stability (when using resource limits and inittab).

      Unfortunately, lots of developers seem to think that the system administrator will always be selecting resource limits- even if they don't use them when developing (and as a result, cannot recommend reasonable ones). Lots of developers also shun inittab, and like to pretend that their program can run as long as init can- and any reason why it doesn't isn't their problem or fault.

      Lots of administrators even go to great lengths- using things like monit- to simulate the behavior that inittab and the oomk already provide.

      So don't make a baseless argument without justification- You might say that overcommit isn't good for default Debian, Fedora, and other LSB-ish systems that come off the pipeline, or you might have some other reason, but to suggest that it's mind-blowingly stupid means only one of two things:

      a) _YOU_ and YOU alone are right, or
      b) everyone is right but you.

      And you'll forgive us if we hope that you don't think it's this way, that you really honestly do have something to contribute besides everyone but you is right- no justification- no clarification- and etc.

      Because if you really do think this way, then we have to point out that you're completely and totally wrong, and the world and state of things is not quite so simple.

      Do understand that the difference between Linux and FreeBSD zero-copy interfaces is that Linux gives explicit notification when a page is no longer reserved for zero-copy, and FreeBSD doesn't. FreeBSD guards against this with CoW, or with a blocking system call. Linux avoids the whole CoW thing because the EXACT SAME HOOPS need to be jumped through on each interface, except the Linux interface isn't guesswork.

      Now, with that said, who exactly would think making a program guess is a good idea? That's exactly what Linus thinks FreeBSD is doing- it's not like the FreeBSD people like CoW- they even suggest ways to avoid it in the zero_copy manual page. They just aren't quite clever enough to think of adding explicit notification.

      Now once we have explicit notification, doing CoW just wastes time- it doesn't buy anything at all anymore. So why bother doing it? Once explicit notification is there, what good does CoW do?

      It guards against naive implementations. Those aren't right anyway, and it's better to make the user fail earlier on in development than randomly later.

    26. Re:Linus is turning into a dictator by Anonymous Coward · · Score: 0

      Why is it when Theo de Raadt voices his opinion on other programmers he's consistently shouted down and called a hot head and arsehole by /.ers but when Linus Torvalds does the exact same thing people say he's just intolerant of sloppy programming? I'm not a system programmer but I have great respect for those in the free software community who *are* highly skilled engineers and I don't think it's fair or appropriate for anyone in a leadership position to be so belittling and blatantly disrespectful to other developers just because you disagree with their technical choices.

    27. Re:Linus is turning into a dictator by Anonymous Coward · · Score: 0

      You seriously want to engage in a discussion about "sloppy" programming when Linux allows VM over commit to kill a process that might be in the middle of a transaction so other processes can no longer wait for the transaction to complete?

      Seems like a kettle should be staring back at you - it happens to be the color of black...

      This discussion should be about achieving zero-copy on data paths - COW was a fork/exec optimization for the longest time. People have been extending the idea to data paths to see how far they can go to achieve zero-copy. By the way, Linus used to not believe zero-copy mattered since no real world application could benefit from it. Perhaps those old threads show that a fair bit of learning still needs to be done in the Linux community as a whole.

    28. Re:Linus is turning into a dictator by Anonymous Coward · · Score: 0

      No - the issue basically is, Linus usually prefers performance, FreeBSD developers usually prefer strict behavior; this is also true in this case.

      I don't see how you can characterize preferring more strict/correct behavior as "sloppy programming".

    29. Re:Linus is turning into a dictator by Anonymous Coward · · Score: 0

      I guess it was lucky that the comment was posted on the Linux Kernel Mailing List and was in a thread discussing how Linux should handle the situation. It's not like Linus went off to the FreeBSD mailing list to flame them on it.

      BTW, it should be "coder's", not "coders".

    30. Re:Linus is turning into a dictator by runderwo · · Score: 1

      You know, you can call setrlimit() yourself if you want to self-limit your memory usage. There is no need to rely on the system administrator to do this.
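
      A minimal sketch of that idea, written in Python, whose standard resource module wraps the same setrlimit() call. It assumes a Unix-like system where RLIMIT_AS is enforced (Linux does this); the function name and the sizes are made up for illustration. The cap is applied in a forked child so the calling process is left alone.

```python
import os
import resource

def try_alloc_under_cap(cap_bytes, request_bytes):
    """Fork a child, cap its address space, then attempt one allocation.

    Returns "refused" if the allocation raised MemoryError, "allocated" if it
    succeeded, or "error" if the limit could not be applied.
    """
    pid = os.fork()
    if pid == 0:
        code = 2  # reported as "error" unless one of the branches below runs
        try:
            _, hard = resource.getrlimit(resource.RLIMIT_AS)
            # The soft limit may not exceed the hard limit.
            if hard != resource.RLIM_INFINITY and hard < cap_bytes:
                cap_bytes = hard
            resource.setrlimit(resource.RLIMIT_AS, (cap_bytes, hard))
            try:
                bytearray(request_bytes)
                code = 0
            except MemoryError:
                code = 1
        finally:
            # Never fall back into the parent's code path from the child.
            os._exit(code)
    _, status = os.waitpid(pid, 0)
    if not os.WIFEXITED(status):
        return "error"
    return {0: "allocated", 1: "refused"}.get(os.WEXITSTATUS(status), "error")

# Cap the child at 512 MiB and ask for 1 GiB.
print(try_alloc_under_cap(512 << 20, 1 << 30))
```

      The point is that the program sets its own ceiling and gets a clean allocation failure it can handle, rather than relying on the administrator or on the OOM killer.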

  10. Given the respective quality of the Linux and *BSD by Anonymous Coward · · Score: 5, Funny

    kernels, methinks it's just sour grapes because Linus can't compete in that area.

  11. Linus needs a cup of 'nice' by guysmilee · · Score: 1, Insightful

    What is with this guy slamming on people. I am sure he's bright and all, but would it hurt him to try and be constructive instead of just being rude? He could simply provide an example of what problems will occur and then stop.

    1. Re:Linus needs a cup of 'nice' by fak3r · · Score: 1

      You must be new here.

    2. Re:Linus needs a cup of 'nice' by Anonymous Coward · · Score: 0

      What is with this guy slamming on people.

      Maybe he had an ass burger for lunch?

      (Posting AC because the mods, having asperger's, have no sense of humor :-/ )

    3. Re:Linus needs a cup of 'nice' by Jesapoo · · Score: 1

      He should be constructive?

      Ever heard of this thing called "linux"? Yeah, that's kinda his thing... You know. Being constructive.

    4. Re:Linus needs a cup of 'nice' by Anonymous Coward · · Score: 0

      um.. EVERYONE has the equivalent of aspergers in a typed conversation since you can't see the face or body of the typist or hear the intonations of their voice.

  12. Just finished reading "Thud!", sorry... by Anonymous Coward · · Score: 2, Funny

    Is that my COW?
    it goes "incompetent idiots."
    It is a Torvalds.
    That is not my COW.

    1. Re:Just finished reading "Thud!", sorry... by wzrd2002 · · Score: 1

      the cow!! where? where?? here?!

    2. Re:Just finished reading "Thud!", sorry... by geekoid · · Score: 1

      very good. +1 geek point.

      --
      The Kruger Dunning explains most post on /. http://en.wikipedia.org/wiki/Dunning%E2%80%93Kruger_effect
  13. stupid linux by Anonymous Coward · · Score: 0

    If Linux is so much better, then everyone will use it over Mach, right?

  14. Incompetent idiots by LoonyMike · · Score: 0

    Won't take long before Linus starts throwing chairs.
    Is it just me, or are "powerful" people never humble?

  15. Re:Maybe Linus needs a vacation? by Raleel · · Score: 1

    gah, i meant Linus in the subject.. jeeez

    --
    -- Who is the bigger fool? The fool or the fool who follows him? --
  16. Next on FOX by Anonymous Coward · · Score: 1, Funny

    When OS developers attack!
     
    Is it just me, or does the hype outstrip the events?

  17. I call bullshit! by OctoberSky · · Score: 3, Funny

    As a Slashdot user there is no way in hell you have 26 messages on your phone machine. Maybe 3 messages, but they're probably your mom calling to ask you when you're coming out of the basement, your friend inviting you to stand in line for tickets to the latest Sci-Fi flick and the Pizza guy confirming your order of 2 large and a 2 liter of Mt. Dew on a Friday night.

    1. Re:I call bullshit! by Afecks · · Score: 1

      Damn now I want pizza!

    2. Re:I call bullshit! by Anonymous+Writer · · Score: 1

      As a Slashdot user there is no way in hell you have 26 messages on your phone machine.

      There are those rare occasions when someone somewhere bloody enters the wrong fax number and the blasted machine repeatedly autodials my phone.

    3. Re:I call bullshit! by Anonymous Coward · · Score: 0

      "Hi, it's Butch. I just wanted to say that I had a great time last night. Call me back." ...
      "Hi, Butch again. Forgot to leave my number. It's 555-7843. Call me back." ...
      "Hi, Butch here. Area code 530. Sorry, forgot to mention it."

    4. Re:I call bullshit! by cprior · · Score: 1

      (It's friday night, I am here to read the "Funny" stuff, so here's a contribution of mine, too)

      There was this single occasion when my friend and co-worker in a youth hostel enquired per fax for sport event tickets -- with many, many agencies. And gave the number of the main telephone line as fax-reply-number...

      You know, this was the time when the landline still had this huuuge audible-all-over-the-place bells attached to it.

      But it was already the time when ticket agencies had dial-through-the-night fax servers...

    5. Re:I call bullshit! by Anonymous Coward · · Score: 0

      and the Pizza guy confirming your order of 2 large and a 2 liter of Mt. Dew on a Friday night.

      You tapped my phone, didn't you? So, how's that job at AT&T workin out for you?

    6. Re:I call bullshit! by Anonymous Coward · · Score: 0

      It's Friday?

  18. He says they're smart! by jmilezy · · Score: 0

    "I claim that Mach people (and apparently FreeBSD) are incompetent idiots." I'd imagine that a competent idiot is one who's smart at being stupid. An incompetent idiot is someone who's not smart at being stupid. That would make them simply smart.

  19. Re:Linux trounce over ... *BSD by Anonymous Coward · · Score: 0

    Not likely. Oh, yeah. BSD is dead...

  20. Hmm, where have I heard this recently? by Anonymous Coward · · Score: 0
    I had to check; no, it's not a dupe, just a recurring theme.

    Are the *BSD people are nicer? Or at least more tactful?

    1. Re:Hmm, where have I heard this recently? by dow · · Score: 3, Insightful

      Are the *BSD people are nicer? Or at least more tactful?

      No. Thats why there is more than one BSD. Issues come up, and booom crash goes the fork. Pity.

    2. Re:Hmm, where have I heard this recently? by kashani · · Score: 1

      No. It's just that no one cares what they have to say any more. :-)

      kashani

      --
      - Why is the ninja... so deadly?
    3. Re:Hmm, where have I heard this recently? by compass46 · · Score: 1

      Are the *BSD people are nicer? Or at least more tactful?

      Trying to make useful generalizations about "*BSD people" is pointless. It's too diverse a group. I used to make fun of Gentoo due to the fanboys on /. until I actually met a few of their devs at Linux World Expo. Again, another large diverse group.

      In addition, every project has its share of friendly people and total assholes. There are people from my project I like talking to and others I don't. There are people from other projects I like talking to and others from the same project I don't.

      No. Thats why there is more than one BSD. Issues come up, and booom crash goes the fork. Pity.

      And that's why there are eleventy-billion Linux distros. ;)

    4. Re:Hmm, where have I heard this recently? by hdparm · · Score: 1

      Isn't it funny how the kernel feature in question (COW makes forking fast) has a much greater influence on the OS than just app performance? Linus is right, again.

  21. Wait... by Anonymous Coward · · Score: 0

    If Mach people are idiots, and Apple uses Mach - doesn't that make Apple (and in turn its users) a bunch of idiots too? But don't worry, Apple can redeem itself once it switches to the NT Kernel.

  22. Wrong side of compiler by StarKruzr · · Score: 5, Funny

    I think Linus has gotten to the point where he just really enjoys trolling. Like, this was OBVIOUSLY uncalled-for, and he's usually such a laid-back guy. Maybe he's read too much Slashdot. I don't know.

    --

    +++ATH0
    1. Re:Wrong side of compiler by nuzak · · Score: 5, Interesting

      Actually he's been into boorish behavior from day 1 when it comes to microkernels. Namecalling between him and Tanenbaum (admittedly Tanenbaum is a bit haughty and provoking), and his slanderous accusations against microkernel researchers in general (a quote I can't find at the moment, but he basically accuses them all, as one big class, of academic fraud to procure grant money).

      The only microkernel Linus knows jack about is Mach, an ancient piece of crap, which indeed is what Linus calls it. It's unfortunate real-world systems were saddled with it, and it's got real performance issues, but Linus carries on about it like Mach ran over his dog or something.

      He conveniently ignores or chooses to remain ignorant of the fact that L4Linux is typically faster than Linux itself. To say nothing of the real-world success of QNX. And even L4Linux is pretty old by today's standards.

      This is all pretty typical behavior of Linus: bluster now, bone up and learn, and implement it later. He did so with SMP (saying famously that the way to do it was one Big F**ing Lock, then learning that no, this wasn't such a great idea after all). Then he went on a tirade about Sun's /dev/poll before learning that yes, they actually didn't cheat and they did it smarter, and Linux followed.

      Ultimately, Linus and Linux come around. Sometimes he just has to vent.

      --
      Done with slashdot, done with nerds, getting a life.
    2. Re:Wrong side of compiler by Anonymous Coward · · Score: 0, Troll

      The "real-world" system which is saddled with Mach is, of course, OS X. Because Linus makes an effort to insult Mach, it almost makes me think that he's simply jealous of OS X.

      Jealous that in OS X, things just work ... that things look pretty ... and that they're smooth. Everything that Linux on the desktop wants to be.

    3. Re:Wrong side of compiler by nuzak · · Score: 3, Informative

      Linus was slagging off Mach long before OSX was around. OSF/1 was based on Mach. The sun doesn't really revolve around Apple.

      --
      Done with slashdot, done with nerds, getting a life.
    4. Re:Wrong side of compiler by Anonymous Coward · · Score: 0

      > Linus was slagging off Mach long before OSX was around. OSF/1 was based on Mach.

      Ah, but OS X is derived from NEXTSTEP, which was based on Mach long before Linus started slagging off microkernels.

      > The sun doesn't really revolve around Apple.

      Oh... then does the Apple revolve around Sun??

    5. Re:Wrong side of compiler by arivanov · · Score: 5, Insightful

      More likely he had some really bad acid the previous night.

      After all he did more than 6 revisions of the Linux VM using CopyOnWrite before this latest fad.

      Possibly more.

      Off the top of my head that is at least 1 in the 1.2 tree, 1 in the 2.0 tree, 1 in the 2.2 tree, 2 in the 2.4 tree and more than 2 in the 2.6 tree, all of them CopyOnWrite, and at least some of them hailed as the next best thing after hot bread.

      As far as the technical point goes, he is possibly correct for x86, where COW goes through the fault mechanism and causes some TLB and cache abuse, which is really bad on modern CPUs. I am not sure about other architectures, because IIRC (I may be wrong) the memory mapper hardware on the old Sparc was designed for COW in the first place.
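
      For readers who have not watched fork-style COW from userland, here is a minimal Python sketch of the semantics. It assumes a Unix-like system with os.fork(); the function name is made up. It only shows the visible result (the child's write never reaches the parent); the page fault and the copy themselves happen inside the kernel and cannot be observed from this code.

```python
import os

def fork_isolation_demo():
    """Write to a buffer in a forked child and compare with the parent's view."""
    buf = bytearray(b"original")
    read_end, write_end = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: after fork it has its own logical copy of buf. On systems that
        # implement fork with COW, the underlying page is copied on this write.
        os.close(read_end)
        try:
            buf[:] = b"modified"
            os.write(write_end, bytes(buf))
        finally:
            os._exit(0)
    # Parent: read what the child saw, then compare with our own buffer.
    os.close(write_end)
    child_view = os.read(read_end, 64)
    os.close(read_end)
    os.waitpid(pid, 0)
    return bytes(buf), child_view

parent_view, child_view = fork_isolation_demo()
print(parent_view, child_view)
```

      The parent still sees its original bytes while the child reports the modified ones, which is the contract COW has to preserve regardless of how expensive the fault path is on a given architecture.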

      Anyway, before calling somebody else an idiot for something you have happily done for 10+ years till yesterday it may be nice if you look at yourself in the mirror. Because I never remember any branch of FreeBSD reaching the point where you can do a find /usr -exec cat {} > /dev/null \; to hang the system. That is 2.6.16 at your service (from rc4 onward) on at least two x86 subarchitectures where I had the time to test it. That is besides the unkillable processes in [S] state on an nfs flock in 2.6.14 (yep, that is a gem which no other unix has managed so far), besides the OOM idiocies in 2.6.10, besides deliberately making it absolutely impossible to backtrack any more interesting patch to a previous kernel without employing a team of kernel developers because the VM and locking is not compatible across any kernel version since 2.6.9 and even when it is something else is changed like the tty layer, besides.... Aarghh.....

      --
      Baker's Law: Misery no longer loves company. Nowadays it insists on it
      http://www.sigsegv.cx/
    6. Re:Wrong side of compiler by rubycodez · · Score: 2, Informative

      what the? L4Linux has to run on top another REAL kernel, usually Linux. QNX is a realtime operating system, not a general purpose desktop or server one. And where's the spicy name-calling? I see the minix way being called "brain-damaged", heheh, but maybe I missed a good personal zinger? Mach probably bugs Linus because it's still being used, even for relatively (in comparison to Linux) new projects & newer commercially sold OSes.

    7. Re:Wrong side of compiler by bfields · · Score: 5, Funny
      I think Linus has gotten to the point where he just really enjoys trolling.
      Could be:
      I got slashdotted! Yay!

      On Thu, 20 Apr 2006, Linus Torvalds wrote:
      >
      > I claim that Mach people (and apparently FreeBSD) are incompetent idiots.

      I also claim that Slashdot people usually are smelly and eat their
      boogers, and have an IQ slightly lower than my daughters pet hamster
      (that's "hamster" without a "p", btw, for any slashdot posters out
      there. Try to follow me, ok?).

      Furthermore, I claim that anybody that hasn't noticed by now that I'm an
      opinionated bastard, and that "impolite" is my middle name, is lacking a
      few clues.

      Finally, it's clear that I'm not only the smartest person around, I'm also
      incredibly good-looking, and that my infallible charm is also second only
      to my becoming modesty.

      So there. Just to clarify.

      Linus "bow down before me, you scum" Torvalds
    8. Re:Wrong side of compiler by Anonymous Coward · · Score: 2, Insightful

      I just don't see what the big deal is with Linus's comment. Of course I'm an OpenBSD person, and can hardly consider a mere 'incompetent idiots' to be a serious disparagement.

    9. Re:Wrong side of compiler by morcego · · Score: 1

      OSF/1 was based on Mach.

      Ok, you lost me. Which point of view are you defending here ? From this single statement, I would guess Linus'.

      --
      morcego
    10. Re:Wrong side of compiler by Anonymous Coward · · Score: 0

      Actually, you're mistaken with the Tanenbaum comment. It's Andy who's the big ass who's done nothing but criticize Linus and Linux, despite his own greatest claim to fame being Minix the pseudo-OS. Linus gave Andy the nod for his book, and Tanenbaum should have accepted it graciously and then shut up if he had nothing good to say.

    11. Re:Wrong side of compiler by WgT2 · · Score: 2, Insightful

      I'm not too surprised, as he seems to be somewhat of a visionary: he sees things as he thinks they should be... and explodes when they aren't. ;)

      What I don't get is why he chose to use incompetent to describe a group of people who are not implementing something he is just now implementing himself.

    12. Re:Wrong side of compiler by Whiney+Mac+Fanboy · · Score: 1

      The sun doesn't really revolve around Apple.

      Blasphemer!

      (heh)

      --
      There are shills on slashdot. Apparently, I'm one of them.
    13. Re:Wrong side of compiler by Anonymous Coward · · Score: 0

      He has always been like that. Very entertaining. You all should learn Finnish just to see his trolling in the sfnet.* groups in the good old 90's. Google Groups has all of the posts archived.

    14. Re:Wrong side of compiler by Anonymous Coward · · Score: 0

      Actually, FreeBSD 6.x has serious issues with its lockd and unkillable processes are the norm while running it. The workaround is to not use rpc.lockd.

    15. Re:Wrong side of compiler by drinkypoo · · Score: 1

      There IS a QNX desktop. It's rather speedy, but there's naturally not as much software for it as whatever else, so after the initial excitement pretty much everyone got over it. Regardless, it's Unixlike and POSIXish.

      --
      "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
    16. Re:Wrong side of compiler by kv9 · · Score: 4, Funny
      Finally, it's clear that I'm not only the smartest person around, I'm also incredibly good-looking, and that my infallible charm is also second only to my becoming modesty.

      i know what's the last book that linus read. do you?

    17. Re:Wrong side of compiler by rubycodez · · Score: 1

      yup, it's cute and great for running a couple apps, but QNX can't do its guaranteed service thing if too much is happening (like any other realtime OS), get too many windows going while burning a DVD or even writing a file and you're screwed. That's why realtime OS aren't sold as general user/desktop or server OS, for those we can be interrupt driven for things that *have* to get done, and if there's too much of the rest being asked to get done we just let those other thirty running apps run slow as mud 8D

    18. Re:Wrong side of compiler by Logic+and+Reason · · Score: 1

      He conveniently ignores or chooses to remain ignorant of the fact that L4Linux is typically faster than Linux itself.

      I was under the impression that L4Linux has a 5-10% performance overhead versus native Linux. This is still pretty good given that L4Linux is pretty much just Linux running on top of the L4 microkernel, and it's certainly much better than (say) MkLinux, but "faster than Linux itself"? Unlikely.

    19. Re:Wrong side of compiler by AnalystX · · Score: 1

      I might have missed someone else posting this, but then again, this should probably be a post with a score of 5 for visibility.

      From the kerneltrap.org (article) list of messages:

      From: Linus Torvalds [email blocked]
      Subject: Re: Linux 2.6.17-rc2
      Date: Fri, 21 Apr 2006 10:58:46 -0700 (PDT)

      I got slashdotted! Yay!

      On Thu, 20 Apr 2006, Linus Torvalds wrote:
      >
      > I claim that Mach people (and apparently FreeBSD) are incompetent idiots.

      I also claim that Slashdot people usually are smelly and eat their
      boogers, and have an IQ slightly lower than my daughters pet hamster
      (that's "hamster" without a "p", btw, for any slashdot posters out
      there. Try to follow me, ok?).

      Furthermore, I claim that anybody that hasn't noticed by now that I'm an
      opinionated bastard, and that "impolite" is my middle name, is lacking a
      few clues.

      Finally, it's clear that I'm not only the smartest person around, I'm also
      incredibly good-looking, and that my infallible charm is also second only
      to my becoming modesty.

      So there. Just to clarify.

                      Linus "bow down before me, you scum" Torvalds

    20. Re:Wrong side of compiler by nuzak · · Score: 1

      Linus's, obliquely. Mach is a mangy dog. Linus however seems to believe erroneously that all microkernels are Mach. I'll give him credit for being smart enough that he knows better, but it doesn't serve his rhetoric for him to admit it.

      --
      Done with slashdot, done with nerds, getting a life.
    21. Re:Wrong side of compiler by nuzak · · Score: 4, Informative

      > what the? L4Linux has to run on top another REAL kernel, usually Linux.

      You're quite mistaken. L4Linux runs Linux in usermode on top of the L4 kernel.

      http://os.inf.tu-dresden.de/L4/LinuxOnL4/

      --
      Done with slashdot, done with nerds, getting a life.
    22. Re:Wrong side of compiler by Gentlewhisper · · Score: 1

      >>Oh... then does the Apple revolve around Sun??

      Let's see... apples on Earth, Earth revolves around Sun.

      I harbour to say, yes.

    23. Re:Wrong side of compiler by slashdot_commentator · · Score: 1
      And even L4Linux is pretty old by today's standards.

      L4Linux is old??? Hell, L4 itself didn't stabilize ("standardize spec", whatever you call it) until a few years ago.

      My question is: what the heck is newer AND working?

      --
      There is no America. There is no democracy. There is only IBM and AT&T and DuPont, Dow, General Electric, and Exxon
    24. Re:Wrong side of compiler by Anonymous Coward · · Score: 0

      They're not talking about COW based forks. You idiot. Maybe you should read the thread
      next time. Thanks for regurgitating that load of crap, though.

      Tell me something though, what kind of COW implementation would you use, that doesn't
      "go through the fault mechanism"?

    25. Re:Wrong side of compiler by PygmySurfer · · Score: 1

      I might have missed someone else posting this, but then again, this should probably be a post with a score of 5 for visibility.

      Did you just claim your post should be modded +5? Now that's ballsy :)

    26. Re:Wrong side of compiler by AnalystX · · Score: 1

      I said, "this should probably be a post with a score of 5" not "my post should have a score of 5." The pronoun "this" refers to "a post." Even if you were to pick an antecedent, it would refer to the first "this" which has the context of "someone else." I was hoping that if "a post" with this information did already exist, someone would mod it up so that it would be more visible.

    27. Re:Wrong side of compiler by jericho4.0 · · Score: 1

      Who cares about linux on the desktop? We're talking about kernels, a very different thing from whatever desktop you're using.

      --
      "A language that doesn't affect the way you think about programming, is not worth knowing" - Alan Perlis
    28. Re:Wrong side of compiler by makomk · · Score: 1

      This garbage got +5, Insightful? As at least one hopelessly-underrated AC has already pointed out, it's not about copy-on-write fork()s or any of the other existing copy-on-write stuff related to mmap, it's about using it to do various magic related to send() and the like - something he's always been against.

    29. Re:Wrong side of compiler by StarKruzr · · Score: 1

      I'm sure things would Just Work for Linux too if the kernel and distro developers had only one small set of hardware platforms to support.

      --

      +++ATH0
    30. Re:Wrong side of compiler by quintesse · · Score: 1

      Tried it once and it's an incredibly difficult language to learn, even if the written language is phonetic, which is probably also the only easy thing about Finnish ;-) No flames please, dear Finnish people, nothing against you nor your language, just my personal frustration with being unable to learn :-)

  23. Kernel Brawl by malus · · Score: 1

    East Coast kernel-fu vs. West Coast kernel-fu.... FIGHT!

    1. Re:Kernel Brawl by Anonymous Coward · · Score: 0

      West coast till I die !

    2. Re:Kernel Brawl by Anonymous Coward · · Score: 0
    3. Re:Kernel Brawl by Anonymous Coward · · Score: 0

      That can be arranged.

    4. Re:Kernel Brawl by Arandir · · Score: 1

      Alternately, it could also be South Bay kernel-fu (Santa Clara) versus East Bay kernel-fu (Berkeley)...

      --
      A Government Is a Body of People, Usually Notably Ungoverned
  24. If I only had a brain by swordfish666 · · Score: 2, Funny

    This is so tech I don't even understand what they are talking about yet I am very "Intellectually Curious".

    --
    I like-a do-the cha-cha.
  25. Netcraft confirms it by nubbie · · Score: 0, Troll

    Netcraft confirms linux is dying. /FreeBSD user. //who am I kidding, uses linix too

    --
    'Go for the eyes, Boo, go for the eyes, aaarrrrrrrr!' -- Minsc
  26. Microsofts answer to that by Anonymous Coward · · Score: 0

    "Windows Vista(tm) will use a patented algorithm for copy on write, called SuperCOW(tm)"

    1. Re:Microsofts answer to that by keithmo · · Score: 2, Funny

      No, in Longhorn, it's called COW-tipping(tm).

    2. Re:Microsofts answer to that by Anonymous Coward · · Score: 0

      Or if you shell out for the Ultimate edition, Glass COW-tipping(tm)

    3. Re:Microsofts answer to that by Gorshkov · · Score: 0, Offtopic

      careful .... if the COWs with guns don't get you, the chickens in choppers will CERTAINLY dust your arse

  27. Yeah, that's a bad idea. It's been tried. by Animats · · Score: 2, Informative
    This is an old idea, and it's been tried before. I think it was first tried by Jerry Popek at UCLA in the 1980s, and it was tried in Mach.

    The basic idea is to fake some memory to memory copying operations by using the virtual memory hardware. More specifically, the idea is that when you do a big "write", the space just written becomes read-only to the writing process, rather than being actually copied. When the write is complete, read-only mode is turned off. This eliminates one copy.

    The trouble with this is that when you manipulate the page table to do that, you have to do some cache invalidation. That usually results in cache misses, which outweigh the cost of the copy. So this usually is a lose. Linus points out that it looks good on benchmarks, because benchmarks typically aren't using data for anything and thus don't experience the cache misses.

    Actually, copying is a relatively cheap operation in modern CPUs unless the copy is huge, since most of the work is done in the caches. The mania for "zero copy" complicates systems considerably, makes them less reliable, and, in the end, usually doesn't speed up real work by much.

    Some of this mania comes from Microsoft FUD. At one time, Microsoft was claiming that an "enterprise OS" must be able to serve web pages from inside the kernel. This led to more Linux interest in "zero copy" approaches to be "competitive".

    1. Re:Yeah, that's a bad idea. It's been tried. by n8_f · · Score: 2, Insightful
      I don't know about the rest of your post, but your explanation of CoW is confusing and inaccurate:

      The basic idea is to fake some memory to memory copying operations by using the virtual memory hardware. More specifically, the idea is that when you do a big "write", the space just written becomes read-only to the writing process, rather than being actually copied. When the write is complete, read-only mode is turned off. This eliminates one copy.

      The way CoW works is that when a process copies something already in memory, the kernel has the MMU map those same memory pages to a new location in the process' address space and mark them as read-only, after which the kernel returns the address of the "copied" memory to the process. When any of the processes using that memory try to write to it, the MMU generates an exception (because the pages are marked read-only). The kernel intercepts the exception, allocates additional memory and copies the pages being written to into it, has the MMU remap that process' address space to point to those pages, and then proceeds with the write.
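
      The sequence above can be sketched as a toy model. This is not how any real kernel is written; Frame, AddressSpace and cow_copy are made-up names, and the "fault" is just a branch in write(). It only shows the bookkeeping: share the frame, mark the mappings read-only, copy on the first write.

```python
# Toy model of copy-on-write. Frame, AddressSpace and cow_copy are
# hypothetical names for illustration, not kernel APIs.

class Frame:
    """A physical page: some bytes plus a count of mappings that share it."""
    def __init__(self, data):
        self.data = bytearray(data)
        self.refs = 1


class AddressSpace:
    def __init__(self):
        self.pages = {}   # virtual page number -> [Frame, writable flag]
        self.faults = 0   # simulated write faults handled so far

    def map_new(self, vpn, data):
        self.pages[vpn] = [Frame(data), True]

    def cow_copy(self, src_vpn, dst_vpn):
        """'Copy' a page by sharing its frame and marking both mappings read-only."""
        entry = self.pages[src_vpn]
        entry[1] = False
        entry[0].refs += 1
        self.pages[dst_vpn] = [entry[0], False]

    def read(self, vpn):
        return bytes(self.pages[vpn][0].data)

    def write(self, vpn, offset, value):
        entry = self.pages[vpn]
        if not entry[1]:                  # read-only mapping: simulated fault
            self.faults += 1
            frame = entry[0]
            if frame.refs > 1:            # still shared: make a private copy
                frame.refs -= 1
                entry[0] = Frame(frame.data)
            entry[1] = True               # sole owner now, writes are allowed
        entry[0].data[offset] = value


space = AddressSpace()
space.map_new(0, b"hello")
space.cow_copy(0, 1)                      # no bytes are copied here
shared_before = space.pages[0][0] is space.pages[1][0]
space.write(1, 0, ord("j"))               # first write triggers the real copy
print(shared_before, space.read(0), space.read(1), space.faults)
```

      Note what the model leaves out: on real hardware, changing the writable flag also means invalidating TLB entries, possibly on several CPUs, which is the cost the rest of this thread is arguing about.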

    2. Re:Yeah, that's a bad idea. It's been tried. by Anonymous Coward · · Score: 0

      Virtual Memory Hardware? Words are coming out of your mouth, but I don't think you have the slightest idea what you're talking about.

      It is rather coy how you managed to slam Microsoft and at the same time associate BSD as being influenced by Microsoft group think. Two points for that.

      Have you considered a job in political speech writing?

    3. Re:Yeah, that's a bad idea. It's been tried. by IgnoramusMaximus · · Score: 1

      I dunno about that, to me "zero copy" was always associated with not copying, essentially unchanged all the way, data buffers between various layers of the system, i.e userspace -> libs -> kernel -> ip filters -> drivers -> network hardware, rather than general-purpose memory.

    4. Re:Yeah, that's a bad idea. It's been tried. by mcgroarty · · Score: 2, Informative

      n8_f is correct that the parent post is full of bunk. In addition to what n8_f said, the cost isn't cache coherency, it's the additional copy you end up doing -anyway- if the data does get modified (which is likely when CoW is used for an I/O buffer) on top of messing with the page tables. Messing with the page tables is especially expensive when you need to synchronize this across multiple processors. Given that most new systems have multiple cores, CoW loses even more benefit in any case where the CoW event is likely.

    5. Re:Yeah, that's a bad idea. It's been tried. by Anonymous Coward · · Score: 0

      This is why fork() sucks. It assumes too much about what the programmer wants to do. Linus is complaining about the case where you want an exact clone of the process. What if I'm a 3GB process and I just want to do a system() call? I can't because fork() starts with the assumption that I want to clone my 3GB process.. and there isn't enough memory for that.
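
      For the "I just want to run a command" case there are spawn-style calls that do not ask for a clone of the caller. A minimal sketch using Python's os.posix_spawn; run_without_fork is a made-up helper name, and whether the C library underneath really avoids duplicating the parent's page tables depends on the platform (as far as I know, glibc on Linux uses a vfork-like clone for this), so treat that part as an assumption.

```python
import os
import sys


def run_without_fork(argv):
    """Start argv[0] as a child process and return its exit status.

    Hypothetical helper: it asks the OS to spawn the program directly
    instead of calling fork() and then exec() from Python.
    """
    pid = os.posix_spawn(argv[0], argv, os.environ)
    _, status = os.waitpid(pid, 0)
    return os.waitstatus_to_exitcode(status)   # requires Python 3.9+


# Run a trivial child that exits with a recognizable status.
exit_code = run_without_fork([sys.executable, "-c", "raise SystemExit(3)"])
print(exit_code)
```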

    6. Re:Yeah, that's a bad idea. It's been tried. by podperson · · Score: 1

      Actually, copying is a relatively cheap operation in modern CPUs unless the copy is huge, since most of the work is done in the caches. The mania for "zero copy" complicates systems considerably, makes them less reliable, and, in the end, usually doesn't speed up real work by much.

      Let's see -- Linus's argument is that "we mustn't rely on our great hardware for performance and we should code the right way" -- fair enough. Here, you're defending his approach by saying that "even though copying may *seem* inefficient, it isn't, because we have caches (i.e. great hardware) that make it fast."

      Intuitively speaking, a zero copy approach makes sense. You ask for a duplicate copy of some information that (most likely) you won't want to change, so instead of copying it we point you at the original. If you decide you do actually want to change it, a page fault occurs, and we run off, make a copy, and give that to you instead.

      Now a reliability argument is essentially specious. Every approach we're talking about here is superficially simple and underneath very complex. In one case we're relying on all kinds of cache magic to work properly; cache implementations are really quite complex, but they work reliably. In the other we're relying on all kinds of memory manager magic to work properly ... ditto for them.

      It's really quite disingenuous to label the other side "complex and unreliable" when both sides are complex (and, as far as we can tell, reliable). In the end it's the performance that counts, and here we have an argument that the zero copy approach either "isn't faster in the real world, only benchmarks" -- ok then create a better benchmark OR "is only a little bit faster". Well, a little bit faster is still faster.

    7. Re:Yeah, that's a bad idea. It's been tried. by Animats · · Score: 1
      The trick that I thought was being proposed is this old one:
      • User process makes "write" system call on a page-aligned buffer.
      • Kernel marks buffer pages read-only and locked in memory, but does not copy them.
      • Write operation returns control to user before data has been copied.
      • Kernel does I/O operation directly from user buffer pages.
      • I/O operation completes. Kernel marks buffer pages read/write again.

      This eliminates one copy operation without changing the semantics of "write" as seen by the user.

      This is distinct from copy-on-write when forking, a more common feature.
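
      The five steps above can be mimicked in a few lines. This is only a toy under stated assumptions: UserBuffer and ToyKernel are invented names, the "page protection" is a Python flag, and the "page fault" is an exception. It shows the ordering of the steps, nothing about their cost.

```python
class UserBuffer:
    """Stands in for a page-aligned user buffer (hypothetical)."""
    def __init__(self, data):
        self.data = bytearray(data)
        self.read_only = False

    def store(self, index, value):
        # A real CPU would raise a page fault here; the toy raises an exception.
        if self.read_only:
            raise PermissionError("buffer is read-only while I/O is in flight")
        self.data[index] = value


class ToyKernel:
    def __init__(self):
        self.pending = []         # (buffer, view) pairs waiting for the device
        self.device_output = []   # bytes that have "reached the device"

    def write(self, buf):
        buf.read_only = True                        # step 2: protect, do not copy
        self.pending.append((buf, memoryview(buf.data)))
        return len(buf.data)                        # step 3: return right away

    def complete_io(self):
        for buf, view in self.pending:
            self.device_output.append(bytes(view))  # step 4: I/O from user pages
            view.release()
            buf.read_only = False                   # step 5: writable again
        self.pending.clear()


kernel = ToyKernel()
buf = UserBuffer(b"payload")
kernel.write(buf)                                   # step 1: the "write" call
try:
    buf.store(0, ord("X"))                          # too early: I/O still pending
    faulted = False
except PermissionError:
    faulted = True
kernel.complete_io()
buf.store(0, ord("X"))                              # fine once the I/O is done
print(faulted, kernel.device_output, bytes(buf.data))
```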

    8. Re:Yeah, that's a bad idea. It's been tried. by 0xABADC0DA · · Score: 2, Interesting
      Torvalds:
      I claim that Mach people (and apparently FreeBSD) are incompetent idiots. Playing games with VM is bad. memory copies are _also_ bad, but quite frankly, memory copies often have _less_ downside than VM games,

      I totally agree with that, I just go one step further and say that Torvalds is also a total idiot: VM games are bad, so use copying instead because that's less bad. But copying is also bad, so why do it at all? Neither is a good solution.

      The problem is that linux and bsd are using "virtual memory" to protect processes from each other, but it is designed to run programs that use more memory than is available. Does it sound right that to protect one process from another you are going to use hundreds of thousands of descriptors for each 4k that all say the same thing? It's pretty stupid actually. 4k is too fine-grained for virtual memory these days as disks have grown. It's both too small and too large for process separation.

      The better solution is to use vm for virtual memory and run all code in the same memory space, but only run code that cannot access memory illegally (ie no pointer arithmetic, only references). This code could be written in Java, or libmo, or D, or maybe other 'safe' languages and run at much faster speeds than they do now as traditional linux processes. The code could be straight C that is JIT recompiled/checked to prevent illegal accesses. That's right, I claim that an average Java program would run faster in such a system than a C program does under a linux/bsd-like system.

      Linus is right, there is massive overhead from doing vm games -- like what is done in linux for instance to separate processes. Did you even wonder why you can't use more than about 80% of the physical memory simultaneously (ie walk an array of 80% physical mem size and see what happens)? That's right, the kernel is using that much as overhead and about 7% of that is page tables for *physical memory*. It takes ~1200 cycles just to enter a system call because of using vm for process separation vs maybe 5 using a single memory space. Unix kernels do not give fine-grained access to anything because it's simply not possible with process separation based on vm to do so, not in practice.
    9. Re:Yeah, that's a bad idea. It's been tried. by 0xABADC0DA · · Score: 1

      Parent poster was correct in this case, Linus is talking about using COW for data passed to system calls like write. Ie so you call write with a 1 gig buffer and instead of copying 1 gig of memory before writing it out to disk/network/etc it just makes the 1 gig read only (or copy on write) and then it can just write it out, in-place.

      So with this syscall glibc could for instance make a fast_write call that has the same api, but it tells the kernel that the memory can be made read-only until the kernel is finished with it.

    10. Re:Yeah, that's a bad idea. It's been tried. by AKAImBatman · · Score: 1

      The better solution is to use vm for virtual memory and run all code in the same memory space, but only run code that cannot access memory illegally (ie no pointer arithmetic, only references). This code could be written in Java, or libmo, or D, or maybe other 'safe' languages and run at much faster speeds than they do now as traditional linux processes.

      Ah, finally someone who understands! I've been promoting this sort of OS design for a long time. The advantages of such a scheme are just incredible, including the ability to align the paging system with actual chunks of memory (e.g. Objects, buffers, etc.) rather than 4K pages that may or may not contain information you actually want to be paged.

      The only problem is that it's not going to happen anytime within the next decade. Far too much software is written in C/C++ to force the issue of "safe" memory handling. :(

    11. Re:Yeah, that's a bad idea. It's been tried. by Anonymous Coward · · Score: 0

      I'll bite.

      >>you are going to use hundreds of thousands of descriptors for each 4k that all say the same thing

      4MB pages have been available since the pentium. For other architectures, page sizes are generally variable powers of two (to a reasonable extent). Typically the overhead (descriptors that say the same thing?) isn't all that nasty even with 4k pages since you don't *have* to map the full 4MB that a complete 2 level page table would need.
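
      For anyone who wants the arithmetic behind that 4MB figure, here is a quick worked check. It assumes classic 32-bit x86 paging without PAE: 4 KiB pages, 4-byte page table entries and a 4 GiB address space. The variable names are mine.

```python
PAGE_SIZE = 4 * 1024           # 4 KiB pages
PTE_SIZE = 4                   # bytes per page table entry (32-bit x86, no PAE)
ADDRESS_SPACE = 4 * 1024 ** 3  # 4 GiB of virtual address space

entries_per_table = PAGE_SIZE // PTE_SIZE        # entries that fit in one table
pages = ADDRESS_SPACE // PAGE_SIZE               # pages needed to map everything
full_tables_bytes = pages * PTE_SIZE             # second-level tables, fully populated
overhead = PTE_SIZE / PAGE_SIZE                  # table bytes per mapped byte

print(entries_per_table, pages, full_tables_bytes // (1024 * 1024), overhead)
```

      So a fully populated set of second-level tables is 4 MiB (plus one 4 KiB page directory), and the per-page cost of an entry is about 0.1% of the memory it maps.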

      >>I claim that an average Java program would run faster in such a system than a C program does under a linux/bsd-like system.

      So you're taking less of a hit doing JIT than managing separate address spaces? How's that work?

      >>It takes ~1200 cycles just to enter a system call because of using vm for process separation
      How's that? There's no context switch. The kernel is mapped into every user process.

      I sincerely hope you don't believe what you're saying.

    12. Re:Yeah, that's a bad idea. It's been tried. by Jherek+Carnelian · · Score: 1

      n8_f is correct that the parent post is full of bunk.

      No, the OP is correct, just obtuse and a little sloppy, probably because he knows what he is talking about and didn't bother to explain enough background info for non-experts to easily come up to speed.

      In addition to what n8_f said, the cost isn't cache coherency ...
      Messing with the page tables is especially expensive when you need to synchronize this across multiple processors.


      You are talking about the same thing the OP meant when he said "cache coherency." You both are talking about the TLB which is a cache of page table entries, although not what most people think of when talking about cache.

    13. Re:Yeah, that's a bad idea. It's been tried. by n8_f · · Score: 1
      Ahh, okay, so this is preventing a copy from userspace to the kernel, before the kernel does something like send the data through a network socket? In that case, I don't know how Xnu handles it. Naively, I would think that there would be size threshold below which CoW isn't worth it, but above which it provides performance benefits.

      I think my confusion arose from "the space just written becomes read-only to the writing process." Shouldn't it be "the space just written from becomes read-only to the writing process?" Also, mentioning the kernel/userspace difference is helpful. I didn't read Linus' e-mail exchange until afterwards, but that bit of context was still missing. I should have probably picked it up, but then I'm not really a kernel guy.

      Anyway, thanks for the clarification.

    14. Re:Yeah, that's a bad idea. It's been tried. by 0xABADC0DA · · Score: 1

      >Typically the overhead (descriptors that say the same thing?) isn't all that nasty even with 4k pages since you don't *have* to map the full 4MB that a complete 2 level page table would need.

      All physical memory in the system is in use at all times with a good os, if only for caching data. Thus it has a page table entry. Pretty much all linux distros use 4k pages. So yes, you don't need a page table entry for memory you don't physically have, big deal.

      If you really don't think this is 'not that nasty' explain why an app can't use more than ~80% of the physical memory on a typical linux distro? It's trivial to write a program that keeps growing an array and touching the pages. Once you get near 80% it's dead from swapping even with nothing else running at all.
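
      A sketch of that kind of test program, kept deliberately tiny so it is safe to run. touch_pages is a made-up name; the experiment described above would call it with sizes approaching physical RAM, which this example does not do and which I make no claim about the outcome of.

```python
import mmap


def touch_pages(num_bytes):
    """Map num_bytes of anonymous memory and write one byte into every page.

    Writing to a page forces the OS to actually back it with memory;
    merely mapping the region usually does not.
    """
    region = mmap.mmap(-1, num_bytes)   # -1 means an anonymous mapping
    touched = 0
    for offset in range(0, num_bytes, mmap.PAGESIZE):
        region[offset] = 1
        touched += 1
    region.close()
    return touched


# 64 pages only; scale this up at your own risk.
pages_touched = touch_pages(64 * mmap.PAGESIZE)
print(pages_touched)
```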

      >>It takes ~1200 cycles just to enter a system call because of using vm for process separation
      >How's that? There's no context switch. The kernel is mapped into every user process.

      I know this from actually timing it (ie sysenter), of course it varies slightly depending mostly on the processor. Duh. With a single memory space this equates to a function call to a known address, so is basically instant.

      >So you're taking less of a hit doing JIT than managing separate address spaces? How's that work?

      Not all safe code has to be JIT compiled, for example it can be compiled by a trusted compiler as in libmo/dis. Second, JITs work very well. Java raw numerics code is sometimes equal to C speed despite its limitations like no unsigned types, array bounds, no pointer arithmetic, etc. Java heap is significantly faster than malloc/free. Now take off all the time for tlb flushes, interrupts, multiple-copied memory, page table magic, etc in a system like linux. Replace the heap with a vm-optimized garbage collector (using dirty bit, etc) and it gets even faster.

      >I sincerely hope you don't believe what you're saying.

      Yeah and I hope what I'm saying isn't the case, because it's freakin sad a) that uninformed people like you AC exist and b) that we're stuck with this crap we call unix / nt / osx because of inertia.

      Unfortunately it is, and we are.

    15. Re:Yeah, that's a bad idea. It's been tried. by Gorshkov · · Score: 1

      It's really quite disingenuous to label the other side "complex and unreliable" when both sides are complex (and, as far as we can tell, reliable). In the end it's the performance that counts, and here we have an argument that the zero copy approach either "isn't faster in the real world, only benchmarks" -- ok then create a better benchmark OR "is only a little bit faster". Well, a little bit faster is still faster.

      Actually, I think you're being a bit disingenuous here. For COW in this context, it's not a bad benchmark OR only a bit faster - it's a bad benchmark AND (I would guess) easily an order of magnitude slower.

      Not quite the same thing.

    16. Re:Yeah, that's a bad idea. It's been tried. by Lord+Crc · · Score: 1
      The better solution is to use vm for virtual memory and run all code in the same memory space, but only run code that cannot access memory illegally (ie no pointer arithmetic, only references). This code could be written in Java, or libmo, or D, or maybe other 'safe' languages and run at much faster speeds than they do now as traditional linux processes.

      Sounds a bit like Singularity, although they state that the primary motivation behind Singularity is dependability, not performance. It has some interesting features that, from what I can see, eliminate a lot of copying. Taken from an overview document:
      "The Exchange Heap, which underlies efficient communication in Singularity, holds data passed between processes. ... A process accesses a [memory] region through a structure called an allocation. ... More than one allocation may share read-only access to an underlying [memory] region. Moreover, the allocations can have different base and bounds, which provide distinct views into the underlying data. For example, protocol processing code in a network stack can strip the encapsulated protocol headers off a packet without copying it."

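      The "different base and bounds over the same region" idea in that quote has a rough everyday analogue in Python's memoryview. This is only an analogy, not Singularity's Exchange Heap; the packet layout and HEADER_LEN are invented for the example.

```python
HEADER_LEN = 4   # hypothetical fixed header size for the toy packet

packet = bytearray(b"HDR:" + b"payload")
whole = memoryview(packet)
payload = whole[HEADER_LEN:]      # new base and bounds, same underlying bytes

packet[HEADER_LEN] = ord("P")     # change the region through the original name
print(bytes(payload))             # the view sees the change, so nothing was copied
```

      Slicing the view "strips" the header by adjusting where the view starts; the bytes themselves stay where they are.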

  28. Solaris... by Anonymous Coward · · Score: 1, Insightful

    Solaris 10/11 and openSolaris are looking better each and everyday

    1. Re:Solaris... by Anonymous Coward · · Score: 0

      Yeah right, Solaris is nothing but trouble. Grass is always greener huh?

      Go ahead, try Solaris for a while. Maybe buy a Sun machine while you're at it. It's an expensive way to learn but you'll be back on Linux/BSD/PC hardware in no time.

    2. Re:Solaris... by pooly7 · · Score: 1

      Then how do you explain that Euronext, and its platform liffe.connect (who runs e-CboT, Tiffe and all derivatives products of Euronext) is switching from Sun Solaris to Linux? http://www.wallstreetandtech.com/showArticle.jhtml?articleID=185301308

  29. Linus, my advice to you is simple... by Anonymous Coward · · Score: 4, Funny

    .. this will help you keep yourself calm.

    1. Re:Linus, my advice to you is simple... by Anonymous Coward · · Score: 0

      More calming links for Linus (or LFL) are located here

  30. gosh by koekepeer · · Score: 1

    the oracle has spoken and we are struck with awe...

    linus is just a guy, quite opinionated, and quite harsh in his words at times. are we gonna see this kind of slashdot story everytime he misbehaves somewhat? (i hope not)

    1. Re:gosh by vux984 · · Score: 1

      linus is just a guy, quite opinionated, and quite harsh in his words at times. are we gonna see this kind of slashdot story everytime he misbehaves somewhat?

      Yes.

      Linus is to slashdot what, say, Tom Cruise is to tabloids.

      So, yeah, just as everytime Cruise sleeps with someone, has dinner with someone, or suggests he might eat a placenta gets headlines in the rags, slashdot dutifully follows its "celebrities" around to make much ado about the minutia that go on in their lives.

  31. lovely community you have there by Anonymous Coward · · Score: 0

    ah, the joys of a friendly, community driven operating system. I'm 100% sure they'll never, ever become all spiteful over choices that other people want to make in software design.

    I think torvalds should spend more time hanging out with steve ballmer, and learn how to properly go ape shit on people.

  32. Just what we need..... by devphaeton · · Score: 1

    ....more rivalry between the BSD folks and the Linux folks. Using phrases like "incompetent idiots" lifts this out of 'friendly sibling rivalry' towards Holy War territory.

    I dig Linus, he's a smart, capable and funny guy, but this kinda disappointed me. He's better than that.

    --


    do() || do_not(); // try();
    1. Re:Just what we need..... by Anonymous Coward · · Score: 0

      Maybe Linus is looking to give Theo de Raadt a run for the title.

    2. Re:Just what we need..... by Homology · · Score: 1
      ....more rivalry between the BSD folks and the Linux folks. Using phrases like "incompetent idiots" lifts this out of 'friendly sibling rivalry' towards Holy War territory.

      The *BSD developers in general think that most Linux developers do not care about quality, only about "performance". The constant stream of Linux kernel root exploits is one example of lack of quality.

    3. Re:Just what we need..... by Anonymous Coward · · Score: 0

      I dig Linus, he's a smart, capable and funny guy, but this kinda disappointed me. He's better than that.

      He's a dick and he's always been a dick. Being a good developer doesn't make him any "better" than a dick.

      I know plenty of good developers who are happy to explain in detail why they disagree with you by listing the pros and cons of different solutions and explaining why their solution is most fit to meet the project goals. They may not have written linux, but if I were writing a new operating system I'd choose to work with them over Linus in a heartbeat.

  33. Yes, it's simple by Anonymous Coward · · Score: 0

    We have a saying in Dutch that goes 'what doesn't come out of the length, must come out of the width', meaning, in this case, that when you want increased performance, you should stop playing games with VM page tables. This seems obvious, but it is really only obvious in an era when memory is cheap relative to CPU power. And of course the reverse is also true; when you want to save memory, you will deliver on performance and you can start playing games with VM tables, shared libraries, and zipped memory to boot if you like, but the saying never stops being true; memory and performance are on opposite sides of a scale.

  34. Geez Linus... by sardaukar_siet · · Score: 1

    ... lighten up, will 'ya? We all know you're a genius that wrote his own OS at age 19, can't work with Gnome because you're such a power user that Gnome is "for idiots" and NOW you shoot yourself in the foot with arguments not fully covered in all angles and using harsh language towards developers that have crafted something over the years, *democratically* (asbestos suit on), that is stable and with a very altruistic licensing policy (1. don't claim it's yours ; 2. don't blame us if it breaks something ... basically).
    Maybe some redmeat-reduction on your diet will help keep your aggression levels down a notch. :)

    1. Re:Geez Linus... by PenGun · · Score: 0, Flamebait

      Gnome and KDE _are_ for idiots. Why do you need all that useless cruft just to manage a windowing scheme?

          PenGun
        Do Wnat Now ??? ... Standards and Practices !

  35. Bitter much? by krewemaynard · · Score: 1

    Pot, meet kettle...

    --
    I saw it on Slashdot, it must be true!
  36. AMEN, MOD PARENT UP by Demerol · · Score: 0, Troll

    Mod that up and spin the meat!!!

  37. Sweet by bogie · · Score: 4, Funny

    It's been a while since we had a huge linux vs BSD flame feast.

    I'll start.

    BSD user: Linux is a confusing mess of programs and is less stable than BSD.

    Linux user: Your still here? I thought you were dead by now?

    --
    If you wanna get rich, you know that payback is a bitch
    1. Re:Sweet by jonnythan · · Score: 1

      My sig is a little volley all in itself ;)

    2. Re:Sweet by podperson · · Score: 1

      Your still here?

      So now you're accussing Linux advocates of being illiterate?

    3. Re:Sweet by Anonymous Coward · · Score: 0

      No, we're just observing that you misspelled "accusing."

    4. Re:Sweet by TCM · · Score: 2, Interesting

      Windows: Where do you want to go today?
      Linux: Where do you want to be tomorrow?
      BSD: Are you guys coming or what?!

      --
      Of course it runs NetBSD. BTC: 1NT7QvbetmANwaMzhpVL6
    5. Re:Sweet by C_Kode · · Score: 1

      I see it more as...

      Linux: Forward thinking people. (ie I want tomorrow better than today)
      BSD: If it ain't broken, don't fix it. (ie. Lisp worked great yesterday, and it works great today too so F your Ruby/Python/{language of choice})
      Windows: Just like Grandma used to make. (ie people don't like change)

  38. That's what the calls were... by shotfeel · · Score: 1

    You missed the key factor -they are hang-up calls from someone playing a joke.

    He didn't say they were legitimate messages from real people who actually wanted to talk to him.

  39. Re:I call bullshit! - You got me! by digitaldc · · Score: 1

    thier probably your mom calling to ask you when your coming out of the basement, your friend inviting you to stand in line for tickets to the latest Sci-Fi flick and the Pizza guy confirming your order of 2 large and a 2 liter of Mt. Dew on a Friday night

    You were almost right, but it was my abusive step-dad to tell me to come out of my attic room and clean up his beer cans, my online friend from Japan inviting me to come to Tokyo for a Manga festival, and the Thai delivery service confirming my order of Green Curry, Tea and Tom Yum Gai on a Saturday night.

    --
    He who knows best knows how little he knows. - Thomas Jefferson
  40. Fess up by DaveV1.0 · · Score: 1

    Who pissed in Linus' Wheaties?

    Or did someone dress his plush tux up with an "I love Windows" sign and BSD daemon horns?

    --
    There is no "-1 offended" or "-1 you don't agree with me" mod options for a reason.
    1. Re:Fess up by funkyhat · · Score: 1

      Guilty as charged... ;)

    2. Re:Fess up by Anonymous Coward · · Score: 0

      Maybe someone pissed in your wheaties because we all know you love that FAG!!!

  41. Re:Maybe Linus needs a vacation? by shotfeel · · Score: 1

    I was wondering if you were projecting when saying that Linus needed to take a break....

  42. snob by Anonymous Coward · · Score: 0

    pfft...another linux snob

  43. Linus sometimes calls people idiots by cpu_fusion · · Score: 4, Insightful

    And in other news...
    Grass is green;
    Oil is overpriced;
    Absolute power corrupts absolutely.

    1. Re:Linus sometimes calls people idiots by zoloto · · Score: 1

      And it absolutely rocks too!

    2. Re:Linus sometimes calls people idiots by corblix · · Score: 1
      Oil is overpriced;

      Yeah, yeah, you're being funny. But seriously, many of us think oil is underpriced.

      Pollution is an externality for both producers and consumers of oil. If that were changed so that people were responsible for fixing (or paying for fixing) the damage they caused, then the price of oil would probably be significantly higher than it is now.

    3. Re:Linus sometimes calls people idiots by BluedemonX · · Score: 0

      RE: Yeah, yeah, you're being funny. But seriously, many of us think oil is underpriced.

      Oh, hey, thanks for intruding on the discussion. Whereabouts in Iran are you from?

      --

      --- Jump!! Fire!! Bullet time!! - Lego version of the Matrix
  44. OMG Linus is pissed!!!! by endrue · · Score: 1

    Stop the presses!!!!!!!
    Seriously folks, is this really that exciting?

    - Andrew

    --
    I meta-moderate because I care.
  45. He's just a developer by krewemaynard · · Score: 3, Insightful

    Here we go again, imposing "role model" status. Linus is just a guy. Sometimes he gets his buttons pushed, sometimes he's doing the pushing. BFD. Maybe you'd be a little pissy too if Slashdot posted a story every time you did or said something. Linus Prefers Gas-X, Says Bean-o Is For Douchebags. Who cares? (BTW, Linus didn't really say that, I made it up. Don't wanna get the Bean-o people on his case too.)

    As far as this whole VM thing goes, time and testing will tell the true story. Meanwhile, maybe we could try NOT deifying Linus (any more)?

    --
    I saw it on Slashdot, it must be true!
  46. GEEK WARS BEGIN!!! NEWS AT 11!! by Electric+Eye · · Score: 3, Funny

    Our top story tonight, uber geek Linus Torvalds unleashed a scathing indictment of some other geeks, claiming they are skating on thin ice by using Virtual Memory calls to improve performance. The words sparked outrage in the dark rooms of college geek programmers from Berkeley to Berlin. The angry geek mobs said they're going to launch a flame war from their computers "to teach Linus a lesson."

    In the words of George Takei "Hoooooooly geeeez!" This is news??

    1. Re:GEEK WARS BEGIN!!! NEWS AT 11!! by maxume · · Score: 1

      Begun, this nerd war has.
      Spastic, it will be.

      --
      Nerd rage is the funniest rage.
    2. Re:GEEK WARS BEGIN!!! NEWS AT 11!! by Omnifarious · · Score: 1

      I think it is, but mostly because it's a really interesting insight into a problem I care about. This is Slashdot, News for Nerds after all. :-)

  47. More Details by sabat · · Score: 1, Troll

    IT IS OFFICIAL; WIRED NEWS CONFIRMS: LINUX IS SUPERIOR TO *BSD
    *BSD is Dying, Says Respected Journal

    Linux advocates have long insisted that open-source development results in better and more secure software. Now they have statistics to back up their claims.

    According to a four-year analysis of the 5.7 million lines of Linux source code conducted by five Stanford University computer science researchers, the Linux kernel programming code is better and more secure than the programming code of FreeBSD.

    The report, set to be released on Tuesday, states that the 2.6 Linux production kernel, shipped with software from Red Hat, Novell and other major Linux software vendors, contains 985 bugs in 5.7 million lines of code, well below the average for FreeBSD software. FreeBSD, by comparison, contains about 40 million lines of code, with new bugs found on a frequent basis.

    FreeBSD software typically has 20 to 30 bugs for every 1,000 lines of code, according to Carnegie Mellon University's CyLab Sustainable Computing Consortium. This would be equivalent to 114,000 to 171,000 bugs in 5.7 million lines of code.

    The study identified 0.17 bugs per 1,000 lines of code in the Linux kernel. Of the 985 bugs identified, 627 were in critical parts of the kernel. Another 569 could cause a system crash, 100 were security holes, and 33 of the bugs could result in less-than-optimal system performance.

    Seth Hallem, CEO of Coverity, a provider of source-code analysis, noted that the majority of the bugs documented in the study have already been fixed by members of the Linux development community.

    "Our findings show that Linux contains an extremely low defect rate and is evidence of the strong security of Linux," said Hallem. "Many security holes in software are the result of software bugs that can be eliminated with good programming processes."

    The Linux source-code analysis project started in 2000 at the Stanford University Computer Science Research Center as part of a large research initiative to improve core software engineering processes in the software industry.

    The initiative now continues at Coverity, a software engineering startup that now employs the five researchers who conducted the study. Coverity said it intends to start providing Linux bug analysis reports on a regular basis and will make a summary of the results freely available to the Linux development community.

    "This is a benefit to the Linux development community, and we appreciate Coverity's efforts to help us improve the security and stability of Linux," said Andrew Morton, lead Linux kernel maintainer. Morton said developers have already addressed the top-priority bugs uncovered in the study.

    --
    I, for one, welcome our new Antichrist overlord.
    1. Re:More Details by Anonymous Coward · · Score: 0
      Informative?

      more like 0 Offtopic.

      http://scan.coverity.com/ concludes:
      FreeBSD: 626
      Linux: 628
      Either way, I'd be more scared of running:
      X.org: 713
      OpenOffice.org: 1024
    2. Re:More Details by Anonymous Coward · · Score: 0

      I read that article and I didn't see any mention of FreeBSD. http://www.wired.com/news/linux/0,1411,66022,00.html If you were trying to be satirical, well, you failed.

    3. Re:More Details by Homestar+Breadmaker · · Score: 1

      You'll notice that linux is just a kernel, and freebsd is a kernel, plus the entire set of system utilities and libraries. So yeah, those numbers do suggest that linux is lower quality.

    4. Re:More Details by Gorshkov · · Score: 1

      IT IS OFFICIAL; MIGRATION OF CANADA GEESE IS A MYTH

      Readers and editors at Bird Watcher's Digest have been insisting for years that the Canada Goose migrates south for the winter, and north for the summer.

      According to an exhaustive statistical analysis conducted at the University of Montreal and Florida State University, this has now been shown to not be the case.

      The study, to be released on Tuesday jointly by Agriculture Canada and the US Wildlife service, shows that the real reason for the seasonal movements of the Geese is simple and deceptively straightforward.

      It seems they're just following the old folks that feed them.

  48. Just say what you mean by lymond01 · · Score: 3, Insightful

    "I claim that Mach people (and apparently FreeBSD) are incompetent idiots."

    Linus, who's becoming more outspoken as he ages, needs to find that line between anonymous forum geek and software spokesperson...and then not cross it. Calling anyone an incompetent idiot is both non-constructive if you're hoping to improve a situation, and just plain unfriendly in an area where cooperation amongst developers is so crucial (open source).

    1. Re:Just say what you mean by Pantero+Blanco · · Score: 1

      Especially since, in the area everyone involved is in, there aren't many (if any) "incompetent idiots". "Incompetent idiots" couldn't put together an OS on anything close to the level of *BSD _or_ Linux.

      It's like one world-class athlete who just came out close in a competition with another calling the other a "no-talent weakling".

    2. Re:Just say what you mean by Arandir · · Score: 1

      Mod parent up! There is no excuse for Torvalds's rude, boorish behavior. To run with your analogy, Linus isn't acting like a world-class athlete, he's acting like a WWF goon showing off on camera.

      --
      A Government Is a Body of People, Usually Notably Ungoverned
  49. Idiocy by Piroca · · Score: 1


    I claim that Mach people (and apparently FreeBSD) are incompetent idiots

    From where you cab conclude who's actually an idiot.

    1. Re:Idiocy by Anonymous Coward · · Score: 0
      From where you cab conclude who's actually an idiot.

      Does parent mean self?

      Or do you hab a code id your doze?

  50. Obligatory Simpsons by Anonymous Coward · · Score: 0

    Don't have a COW, man

  51. Obligatory Simpsons reference by Admiral+Burrito · · Score: 2, Funny

    Linus Torvalds: "Don't have a COW, man!"

  52. The Universe In Which Spock Has A Beard? by tlambert · · Score: 4, Funny

    ``And in what universe is anyone who can intelligently speak about (much less code around) memory and VM management [be called] an "incompetent idiot"?''

    The Universe In Which Spock Has A Beard?

    -- Terry

  53. This coming from a man who had by Anonymous Coward · · Score: 0

    how many stable releases lately?

  54. Harsh words? by Anonymous Coward · · Score: 0

    If you think those were harsh words, then you should thank god that he doesn't use autoconf.

    Just imagine...

  55. Paging Theo... Theo de Raadt please pickup the... by Temkin · · Score: 1



    I'm somewhat disappointed in Linus' tone. Having said that... This just begs a response from Theo...

    Time to go make some popcorn. :)

  56. Wasn't there an article a few days ago? by Anonymous Coward · · Score: 0

    Wasn't there an article a couple days ago about the snobbery and snubbery experienced by newbies to Linux when dealing with the 'community'? When the FOUNDER of Linux would talk that way about fellow Linux Pros then I guess the attitude trickle down is not so surprising. 433t 4 41FE!

    -AC

  57. The Real issue here by brendanoconnor · · Score: 0, Troll

    I think we are missing the real issue here. BSD is dead, so really for Linus to even bring it up is to beat a dead horse.

  58. The best thing about being a professional . by Shohat · · Score: 3, Interesting

    Linus is a gifted engineer, let him be rude. Aside from Linus being rude, there is no actual story here.
    I used to own a restaurant and also an office supplies shop. It was quite interesting and made me some money, but I hated the fact that the most important factor in my life was pleasing (customers) or fighting (suppliers) other people. I had to constantly think about what to say and how to behave.
    I am no longer a business owner, and now I work with a rather gifted bunch of engineers, and frankly it gives me great pleasure to know that neither I nor the people I work with really care about being polite, clean-shaven, well-spoken or good-looking. I can be rude if I want to, they can be rude if they want to, and we all get along very well.

    1. Re:The best thing about being a professional . by Jayzz · · Score: 1
      Linus is a gifted engineer, let him be rude.

      And we wonder why geeks don't have many friends...
    2. Re:The best thing about being a professional . by Anonymous Coward · · Score: 0

      Great. More people thinking that it's okay to have the social skills of a retarded yeti, making the rest of us look bad. Thanks, jackass.

  59. good approach by ArbitraryConstant · · Score: 1

    In practice I think the FreeBSD approach probably does have speed advantages in most cases, and the fact that it's transparent to the userspace developer would seemingly be a big advantage.

    But Linus makes a good case, and I'm glad to see him taking a more conservative approach after the troubles with getting some decent stability out of 2.6. write() and sendmsg() aren't that slow, and the new way will be faster and less fragile. It provides opportunities for optimization for those willing to do platform specific stuff, and it provides a reasonably fast portable way for everyone else. This is nothing new, there's plenty of platform-specific system calls on Linux, like epoll(), and the BSDs do it too with stuff like kqueue().

    This sort of thing matters almost exclusively to people doing really deep performance tuning, and for them it's better to present a simple API with large rewards for tuning, instead of transparently doing something weird to an existing API that will break in the field without you noticing and requires really weird usage to get the best performance.

    --
    I rarely criticize things I don't care about.
    1. Re:good approach by mrsbrisby · · Score: 3, Informative

      In practice I think the FreeBSD approach probably does have speed advantages in most cases, and the fact that it's transparent to the userspace developer would seemingly be a big advantage.

      No, it has a speed advantage over read()/write() provided you are aware of exactly how it works. The fact that it's transparent to userspace is a bad thing, because it means you end up with code written a certain way and nobody will ever understand why.

      Reusing the pages causes the speed benefit to go away- and in fact it'll be slower than read()/write().

      This sort of thing matters almost exclusively to people doing really deep performance tuning, and for them it's better to present a simple API with large rewards for tuning, instead of transparently doing something weird to an existing API that will break in the field without you noticing and requires really weird usage to get the best performance.

      I agree completely. Unfortunately, the FreeBSD API is inadequate. It's not faster in practice unless you do something really really weird (waste memory). The big difference is the Linux implementation gives explicit notification and the FreeBSD API doesn't.

      FreeBSD doesn't provide an API to ask if the pages are still in use. That'd probably make their approach usable, but at that point, why bother updating the page tables?

      Once you're there, why bother statpage() to check to see if the page is in use? Why not have the kernel send the pages that are available via a file descriptor so you can poll() or select() on it?

      At this point, you're at the Linux implementation.

      That's it. That's why it's better.
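
      The "explicit notification over a file descriptor" idea can be illustrated with a toy model. This is only a sketch under loud assumptions: an ordinary pipe stands in for the kernel's notification descriptor, the consumer is simulated in the same process, and the function name and the one-byte buffer index format are hypothetical. It is not vmsplice() and moves no pages around.

```python
import os
import select


def send_with_notification(buffers):
    """Toy model: hand buffers to a consumer and learn via a pipe when each is reusable.

    Hypothetical protocol: the consumer writes one byte per released buffer,
    holding that buffer's index (so at most 256 buffers in this sketch).
    """
    notify_r, notify_w = os.pipe()
    released = []
    try:
        # Simulated consumer: "finishes" with every buffer and reports its index.
        for index, _buf in enumerate(buffers):
            os.write(notify_w, bytes([index]))
        os.close(notify_w)
        notify_w = -1

        # Producer side: wait on the descriptor instead of guessing page state.
        while True:
            ready, _, _ = select.select([notify_r], [], [], 1.0)
            if not ready:
                break  # timed out, nothing more was released
            data = os.read(notify_r, 64)
            if not data:
                break  # write end closed, no further notifications
            released.extend(data)  # iterating bytes yields the integer indexes
    finally:
        os.close(notify_r)
        if notify_w != -1:
            os.close(notify_w)
    return released
```

      The point of the sketch is only that the producer is told which buffers are free and never has to touch page tables to find out.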

    2. Re:good approach by Gorshkov · · Score: 1

      This sort of thing matters almost exclusively to people doing really deep performance tuning, and for them it's better to present a simple API with large rewards for tuning, instead of transparently doing something weird to an existing API that will break in the field without you noticing and requires really weird usage to get the best performance.

      That sort of depends on what you mean by "deep performance tuning"

      I'm a typical programmer .... lazy as hell.

      Right now, I'm in the middle of developing a network application - think of it as an on-line jukebox. I have streaming audio out (multiple destinations), chat messages that have to be distributed to other users in the system, etc.

      It's not a particularly high-performance type system ... but because of the VOLUME of data I have to push out, a "little tweak" like this could make the difference between me being able to host it on my current server (Athlon 2800+, 2 gig memory) and having to shell out a few K for a multi-cpu, shit-load-of-memory system.

      It's a hobby system right now, and will *probably* wind up staying that way - with a few tens of users, who cares?

      But that one simple API notification, and the potential performance improvements, could mean the difference between me being able to host 10/20/25 people on my system and being able to host 100/200/250 people on my system. With the amount of I/O I'm doing (and the application is 90% I/O driven), that's significant when you try to scale up.

      And all I'd have to do to take advantage of it is put a simple wrapper around the malloc() calls where I allocate my network buffers, and my read/write routines.

      Not exactly what I'd call deep performance tuning.

    3. Re:good approach by pthisis · · Score: 1

      In practice I think the FreeBSD approach probably does have speed advantages in most cases, and the fact that it's transparent to the userspace developer would seemingly be a big advantage.

      Why would you say this? In practice, on every real OS that's tried it, real-world apps become much slower with the COW approach when zero-copy is enabled than if they use the old read/write semantics. And in practice, even the guy who proposed it on the Linux kernel mailing list said (in response to "That's a huge mistake, and anybody that does it that way (FreeBSD) is totally incompetent."):
      Yea, we're not using it either.

      It's a pain in the ass and even if you get it right performance gains are minimal. Because in the real world, people don't allocate new memory for every read/write pair, they use a static buffer. Which means they get the COW fault on every read, unless they do a fairly arcane restructuring of their buffer use.

      And without that, it is much slower than just doing a real read/write pair and forgetting zero-copy.
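
      The static-buffer problem is easy to see in a toy simulation. This is a minimal sketch, not a measurement: it assumes the worst case where the kernel is still holding the page read-only every time the application refills it, and the function name is made up for illustration.

```python
def count_cow_faults(writes, reuse_buffer):
    """Count simulated COW faults for a fill-buffer / write-out loop.

    Toy model: each write() leaves the page read-only in the simulated
    kernel, and refilling a page that is still read-only costs one fault.
    Assumes the kernel never releases a page before the next refill.
    """
    faults = 0
    pinned = set()  # pages the simulated kernel still holds read-only
    for i in range(writes):
        # A static buffer is always page 0; otherwise use a fresh page each time.
        page = 0 if reuse_buffer else i
        if page in pinned:
            faults += 1           # refill hits a read-only page: fault plus copy
            pinned.discard(page)  # the application now owns a private copy
        pinned.add(page)          # write(): the kernel marks the page read-only
    return faults
```

      In this model, count_cow_faults(1000, True) faults on every refill after the first, while count_cow_faults(1000, False) never faults but burns a fresh page per write, which is the "arcane restructuring" trade-off.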

      Linus was overly harsh in his commentary, but ultimately the response to "What about marking the pages Read-Only while it's being used by the kernel and if the user tries to write into them letting the VM dup the page with the COW code?"

      should be something like Dave Miller's one-liner:
      That's historically how you kill performance.

      It's been done a million times, not once with success, and so unless someone can actually produce a system that uses it and performs well in real (end-application) benchmarks and not contrived "here's how it looks if I never take a fault" benchmarks then all the evidence supports using an explicit (splice/tee) approach.

      And, in reality, coding applications to use splice/tee explicitly is easier on the end programmer than coding them to use a COW zero-copy read/write efficiently (ie without incurring so many COW faults that it's actually slower than not using a zero-copy approach).

      So you can either do a COW approach, put a lot more burden on the programmer to get passable efficiency, and even then have it only be a moderate gain.

      Or you can do an explicit splice approach, have it be easier on the programmer, and have it perform much better.

      It's a no-brainer.

      --
      rage, rage against the dying of the light
  60. MOD PARENT WAY UP!!! by Anonymous Coward · · Score: 0

    Mod grandparent down!

    Captcha word tiresome. How appropriate.

  61. Because Linus makes GREAT decisions... by Anonymous Coward · · Score: 0

    ...like Transmeta. Good one, Linus.

  62. I'm more concerned about the headline... by enitime · · Score: 2
    I mean what has the world come to when the submitter fails to make fun of an acronym like COW?

    Off the top of my head:

    Linux to BSD: "Don't have a COW, Man!"
    Linus UnMOOved by COW.
    Penguin: Demon COW Dog.

    1. Re:I'm more concerned about the headline... by Anonymous Coward · · Score: 0

      You stole my job! But you gave me a laugh, so what the heck. Mod up!

  63. Just imagine... by Anonymous+Writer · · Score: 1

    ... Linus, Ballmer, and Jobs locked in a room, playing musical chairs. Someone's going to come out with a concussion.

    1. Re:Just imagine... by RangerRick98 · · Score: 1

      Linus, Ballmer, and Jobs locked in a room, playing musical chairs. Someone's going to come out with a concussion.


      Dude, what are you thinking?!? You're going to lock Ballmer in a room with chairs???
      --
      "You're older than you've ever been, and now you're even older."
    2. Re:Just imagine... by Lord+Bitman · · Score: 1

      Well, you almost got it... maybe that's worth half credit?

      --
      -- 'The' Lord and Master Bitman On High, Master Of All
    3. Re:Just imagine... by Anonymous+Writer · · Score: 1

      You're going to lock Ballmer in a room with chairs???

      And thus the concussion.

  64. Linus taking notes from Theo? by rhavenn · · Score: 1

    Has Linus been taking notes from Theo?

    1. Re:Linus taking notes from Theo? by Miniluv · · Score: 1

      Amusingly, Theo today posted to bugtraq and made a point of pointing out how nice and polite he was being (and he was!).

    2. Re:Linus taking notes from Theo? by whitehatlurker · · Score: 1
      Not quite polite, but I do agree with Theo:

      "I will try to be as nice as anyone has ever seen me be:
      PAM is completely and utterly broken and cannot be fixed.
      "

      It is as polite as anyone has seen him being.

      --
      .. paranoid crackpot leftover from the days of Amiga.
  65. Lesson on Tact.. by corellon13 · · Score: 3, Insightful

    Linus, you may be right and you may be very smart, but you should try a little tact. Here's a good definition for it that I learned from a drill sergeant: "Tact is the ability to tell someone to go to hell and look forward to the trip."

    Being nice and respectful doesn't mean you can't tell it like it is.

    --
    Do what is right and let the consequence follow
    1. Re:Lesson on Tact.. by Anonymous Coward · · Score: 0

      "have them look forward to the trip".

  66. Linus can kiss my arse by Anonymous Coward · · Score: 0

    People who live in glass houses should not throw stones

  67. Fork by Schraegstrichpunkt · · Score: 1

    I predict a major fork() of Linux within 5 years.

    1. Re:Fork by Anonymous Coward · · Score: 0

      How about one where getting sound and accelerated graphics to work isn't a fucking chore that takes 5 hours of leafing through thousands of Google search results to figure out? God forbid more than 1 application running wants to use sound at the same time!! We bash Microsoft all the time but we never make an effort to make our OS as easy to use as Windows, do we?

      Yeah yeah, I know, anything suggesting that Linux is difficult to use, or that it's hard to get it to work with certain hardware combinations is obviously a "Troll" right?

    2. Re:Fork by C_Kode · · Score: 1
      Dude, forking is the old model. Today we use threads. ;)
      if ( kernel->isDick(kernelMaster) == true ) {
          myNonDickKernel = noDickThread.newThread(kernel);
      }
    3. Re:Fork by belg4mit · · Score: 1

      It's a troll because this is about the kernel, not the platform ("GNU/Linux"), you tool.

      --
      Were that I say, pancakes?
    4. Re:Fork by Schraegstrichpunkt · · Score: 1

      You don't need the "== true" part. :-P

    5. Re:Fork by Schraegstrichpunkt · · Score: 1

      Nearly all hardware that is properly documented works fine out of the box. Like most of the bluescreens in the latest versions of Windows, it's the damn hardware manufacturers' fault!

  68. Linus is growing old by imbaczek · · Score: 0, Offtopic

    As much as I like Linus, he's starting to be an old fart whose only excuse for throwing insults around seems to be the fact that he's right.

    I hope *I'm* wrong on this one ^^

    1. Re:Linus is growing old by Miniluv · · Score: 0, Offtopic

      What other reason does he need? Are you trying to say that it'd be ok if he was PMS'ing and wrong, since at least PMS is a good reason to insult someone?

  69. hell yes by Anonymous Coward · · Score: 0

    Woohee, that's telling them sumbitches about sumthing!

    Or whatever. Er, which one represents Micro$oft?

  70. NetBSD has "page loaning"... it's better. by Anonymous Coward · · Score: 1, Informative

    http://netbsd.org/Documentation/kernel/uvm.html

    http://www.netbsd.org/Releases/formal-1.6/NetBSD-1.6.html

    Zero copy by avoiding *both* the FreeBSD copy on write, AND the Linux vmsplice().

    Instead, one piece of code "loans" the pages to another. The page disappears from the first one's address space (or is marked r/o), and appears in the second one's address space. When the second one is done with it, it hands the pages back.

    This avoids *all* copies, including the one that Linux still has. The only cost is that the original user can't write to the pages while the other one is accessing them.

    But see the release notes on "page loaning". This is true *zero* copy for pipes and
    tcp/udp data. No copies. Ever.

  71. Flame war article by Anonymous Coward · · Score: 1, Insightful

    Why not focus on the real content instead of writing tabloid-style articles?

  72. He has the stones to back it up by Gothmolly · · Score: 3, Insightful

    Linus has frequently called people idiots, and ignored patches, and done stuff his own way for a very long time now. He's quite successful at it. Perhaps what most people need to realize is that he is that good, that he can. The average read-Slashdot-during-work-while-coding Slashdotter is not in his league, so decrying his ad hominem attacks, or "I would do X instead" arguments, just doesn't hold much water.

    --
    I want to delete my account but Slashdot doesn't allow it.
    1. Re:He has the stones to back it up by nblender · · Score: 1
      Is that kind of like when Linus accused the NetBSD folks of being idiots for insisting on sync'ing errno values across platforms? I think *BSD has been ignoring his little tantrums since then.

    2. Re:He has the stones to back it up by Gothmolly · · Score: 1

      I bet you googled feverishly for another "Linus tells *BSD to die in a fire" quote, just so you could seem insightful.

      But you don't address the most recent incident. So your post is irrelevant.

      --
      I want to delete my account but Slashdot doesn't allow it.
    3. Re:He has the stones to back it up by kronocide · · Score: 1

      The average read-Slashdot-during-work-while-coding Slashdotter is not in his league, so decrying his ad hominem attacks, or "I would do X instead" arguments, just doesn't hold much water.

      Except the thing with ad hominem attacks is exactly that if you make them you are wrong regardless of whatever else you may be right about. It's a bit like shooting yourself in the head, rhetorically speaking.

    4. Re:He has the stones to back it up by nblender · · Score: 1

      Then I would have posted the link now, wouldn't I? I remember it because I was involved in it.

  73. You stupid COW! by codehead78 · · Score: 2, Funny

    You'll scream! I'll vmsplice ya, it's gonna hurt.

  74. Re: Torvalds Has Harsh Words For FreeBSD Devs by jefp · · Score: 0, Flamebait

    >Mach people (and apparently FreeBSD) are incompetent idiots.

    No *you're* a towel.

  75. Be specific by microbee · · Score: 1

    Linus is not saying COW is bad. He is saying that in this specific case it's not necessarily a gain. In fact, he is right that if your "COW hit ratio" (the fraction of shared pages that actually get written and fault) is not low enough, it's a loss due to the setup and fault handling cost. At process fork time COW is a win because the ratio is low enough.
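
    To put rough numbers on "low enough", here is a back-of-the-envelope cost model. It is only a sketch: the linear cost formula and every number used with it are assumptions for illustration, not measurements from any kernel, and the function names are made up.

```python
def cow_cost(copy_cost, setup_cost, fault_cost, fault_ratio):
    """Expected per-page cost of COW: always pay setup, sometimes pay fault plus copy."""
    return setup_cost + fault_ratio * (fault_cost + copy_cost)


def cow_break_even(copy_cost, setup_cost, fault_cost):
    """Fault ratio below which COW is cheaper than just copying (clamped to 0..1).

    Solves setup + p * (fault + copy) = copy for p.
    Assumes fault_cost + copy_cost is greater than zero.
    """
    ratio = (copy_cost - setup_cost) / (fault_cost + copy_cost)
    return max(0.0, min(1.0, ratio))
```

    With made-up costs of 1000 cycles to copy a page, 200 to set up the mapping and 3000 to take a fault, the break-even ratio is (1000 - 200) / (3000 + 1000) = 0.2. Under those assumptions COW only wins if fewer than one page in five is ever written, and a bigger cache that makes the plain copy cheaper pushes that threshold lower still.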

  76. Wow by Anonymous Coward · · Score: 0

    An article from Kernel Trap, who probably didn't pay to get it posted.

    Smacks of the old days of /., before it turned into a chamber pot for the Big Names.

    More of this sort of article, please.

  77. They've been saying it because it's a valid point by Nugget · · Score: 1

    It's a fair point to comment that the evaluation of server stability compared a stable release of Linux against a development branch of FreeBSD that was not intended for production systems. One might make the case that it was fraudulent to compare the versions that they did, since it's unreasonable to expect stability or scalability out of a development branch that was undergoing significant overhaul and change at the time.

    It was not until FreeBSD 5.21 that FreeBSD 5 was recommended for production systems.

  78. more like spork by dino213b · · Score: 1

    spork = fatter

    I can appreciate Linus' devotion to lean code. I have seen the results. My workstation (at work, no less, duh) in 1999 was so incredibly slow and my company was bitten in the pockets by the ram price surges that Windows would barely run on it. The company that I worked for used a web-based system, so a gui web browser would be vital to my needs. Without waiting for a miracle (that never came) I fixed the problem by installing linux on the machine.

    The difference in performance between windows and linux on that machine was night and day. Even with only 32 megs of ram, the machine was responsive enough for me to be able to do much more than surf the company's internal web management site. Linux worked where windows 95, 98, nt failed miserably. I ran xfree86 on it, latest version of Mozilla at the time, gaim (was gaim even around back then? surely so), and psdoom (everyone must have it!). Windows versions of these programs would drain so much memory, it wasn't funny.

    Needless to say, I can appreciate any angry rants revolving around computer performance. The larger the operating system and the apps get, the more every bit counts.

    1. Re:more like spork by Arandir · · Score: 1

      You're arguing that Linux has more performance than Windows. Pretty much everyone in the known universe agrees with that. However, the performance differences between Linux and FreeBSD are insignificant. Artificial antiseptic benchmarks might be able to tell the difference, but real world usage can't.

      --
      A Government Is a Body of People, Usually Notably Ungoverned
    2. Re:more like spork by dino213b · · Score: 1

      Hm, no, that is not my argument at all. My argument was that efficiency is important, at least to some users who don't feel like upgrading hardware senselessly. The deeper you get into electrical engineering, the more invisible the dividing line between hardware and software gets. I merely invoked the Windows vs Linux performance issue as proof that efficiency is important.

      However, you are right to read that my point was unclear -- just as it is unclear how the master point will affect us all in the long term. Further studies are needed, and I made an early biased decision to side with Linus for some odd reason. Do you think Linus tried a similar model in the past and saw disastrous results? Without speculating on those possibilities, his response is very fiery and I bought it.

      Time will tell how this issue develops. Thanks for reminding me that I, like many others, polarize on beliefs instead of tested facts.

      -cheers!

  79. Linus claims Slashdotters are idiots by microbee · · Score: 1

    From a recent LKML email:

    I got slashdotted! Yay!

    On Thu, 20 Apr 2006, Linus Torvalds wrote:
    >
    > I claim that Mach people (and apparently FreeBSD) are incompetent idiots.

    I also claim that Slashdot people usually are smelly and eat their boogers, and have an IQ slightly lower than my daughters pet hamster (that's "hamster" without a "p", btw, for any slashdot posters out there. Try to follow me, ok?).

    Furthermore, I claim that anybody that hasn't noticed by now that I'm an opinionated bastard, and that "impolite" is my middle name, is lacking a few clues.

    Finally, it's clear that I'm not only the smartest person around, I'm also incredibly good-looking, and that my infallible charm is also second only to my becoming modesty.

    So there. Just to clarify.

    Linus "bow down before me, you scum" Torvalds

  80. Funny... by horovitz · · Score: 1

    My views: The expense of a page fault seems to vary quite a bit among architectures and processors, and also with the entire system setup and the assumptions built into the VM system. x86 architectures aren't too good at it, and Linux is quite x86-minded. Sure, there aren't many other relevant architectures these days.

    Also, VM systems aren't used today the way they were when they were designed. Back in 1988 or 1990 (or before), a typical Unix/BSD system would often page all the time, because RAM was never enough, and it paged in small memory portions. Today RAM is still not always sufficient, but when it isn't, usually one or a few large processes get paged out, typically ones doing something with media files or other interactive stuff.

    For me, COW was quite an efficient overall approach. Maybe there are some more efficient but less general approaches around, and maybe a general-purpose VM strategy would be different if it were designed today with all the current hardware setups in mind (and not 20 years ago). This doesn't make someone else an idiot.

    h.

  81. But, but... by dtfinch · · Score: 0

    Copy On Write sounds so cool.

  82. How is this different than AIO? by pammon · · Score: 2, Informative
    Can someone please explain to me how this new proposal is different from the aio_* functions (asynchronous I/O) that appeared in FreeBSD?

    For example, aio_write() writes to the file descriptor, allows you to poll for success a la select, and tells you not to modify the buffer before it's done (but doesn't try to stop you with copy-on-write).

    This sounds exactly like what Linus wants.

  83. fools by prurientknave · · Score: 1

    I hate how all the fools have signed on solely to criticize the language in a conversation between linus and some freebsd devs. If all the uninvolved and clueless fanboys would just stay out of it, we'd have a more useful discussion on this topic.

  84. Re:Theo is turning into a dictator by CyberNigma · · Score: 1

    During that first sentence there I thought you were talking about Theo de Raadt.

  85. The merit of COW ? by Serilkath_Montreal · · Score: 0

    The merit of COWs is milk and the occasional fucking by the farmer, that's all.
    Maybe I should have posted this one anonymously...

    PS : BTW FreeBSD's better than Linux. (Duck).

    rePS : I definitely should have posted anonymously.

    --
    Unfortunately, stupidity is neither curable nor fatal.
  86. Funny by dnaumov · · Score: 1, Insightful
    "Linus: Playing games with VM is bad."

    Funny you should say that Linus, seeing how much of a fucking disaster was changing the VM in the middle of the 2.4 kernel branch that is supposed to be STABLE.
  87. Baiting Mac Users? by buckhead_buddy · · Score: 1
    DeHat wrote:
    It sounded like the opening volley of the second great Unix war, only this time instead of pitting proprietary Unix vendors and systems against each other... it is two open source ones. It will be interesting to see what weapon the BSD crowd will retaliate with.
    Unfortunately, Linus's targets (BSD and Mach) are the foundation of Darwin a.k.a. Mac OS X. I argue that this was the wrong target to throw unqualified and technically irrelevant insults toward. Rather than disparaging the concepts, he insults the developers and their work directly, which is easy to extend to an insult of the users and adopters of that work.

    If noticed, the provocation will harden many non-technical Mac faithful against Linux in ways that can not be amended and atoned later. If one side's methods trump the other's, the technical geeks will see the truth and forget the provocations. Most of the Mac faithful will not see these technical resolutions but they'll remember only the venom that Linus spouted here and now.

    Does this have any impact on acceptance of Linux? It's questionable but I'd say it does. People and markets imitate Macs, and Mac users therefore have a much bigger impact on market and user acceptance than their size indicates. Linux has a huge, rapidly evolving developer community, but not a solid user-only community. Digging yourself into any sort of hole with the potential user-only group is bad no matter how shallow the depth seems now.

    I just hope Linus' little tirade blows over without making headlines on the mac evangelist web sites first.

    1. Re:Baiting Mac Users? by PenGun · · Score: 1

      There is sooo much wrong with the Darwinization of FreeBSD that someone is screwing up pretty big time. The memory problems that make OSX pretty much useless as a server are just one example. A few direct attacks are prolly needed.

          PenGun
        Do What Now ??? ... Standards and Practices !

  88. Nope. by Ayanami+Rei · · Score: 1

    It's not COW in general that he has a problem with. COW when using rarely modified pages in userspace is fine. Linux relies on this for lightweight kernel threading behavior (cheap forks)... and a bunch of other stuff.

    The issue at question here is transient userspace buffers for use in streaming type operations. The FreeBSD approach is to use COW semantics from userspace so that the fast case (when userspace doesn't attempt to write its own buffer) is fast, but if it does, it magically gets mapped to a new PTE with its own copy.

    Linus thinks (and he's probably right) that if the user space code intended to touch a buffer that may be in flight, it might as well just do the copy anyway and use write() instead of playing with the VM so that the kernel can do things behind the userspace's back. He thinks that the extra TLB flush delay on top of the actual copy, amortized over the expected rate of occurrences, is greater than the cost of just always doing the copy.
    And that's probably a good guess considering that if the code is "streamlined" to modify a buffer right after "submission" that it was going to have to wait for a copy to be made at some point... or to wait for DMA to finish.
    Plus TLB flush gets VERY EXPENSIVE when you go SMP. And with all the multi-core/hyperthreading/NUMA stuff we're seeing up the wazoo out there, that's a very valid point.

    So in summary... Linus wants: write() always copies... splice() lets the kernel see your page (don't touch it if you don't want to change it at the "wrong time").
    Which is different than COW page buffer/mmaps/anonymous memory semantics. Different purpose completely.

    --
    THIS THING CAN TURN ON A DIME, MACROSSZERO STYLE ALSO FUCK BETA, ~NYORON
  89. Incompetent Idiot by Arandir · · Score: 1

    If Linus isn't an incompetent idiot, how come he didn't have a vmsplice() function in the kernel until now?

    --
    A Government Is a Body of People, Usually Notably Ungoverned
  90. MOD UP by Ayanami+Rei · · Score: 1

    Thank you. Finally someone gets the big picture and the whole argument here.

    --
    THIS THING CAN TURN ON A DIME, MACROSSZERO STYLE ALSO FUCK BETA, ~NYORON
    1. Re:MOD UP by Anonymous Coward · · Score: 0

      The thing is that Linus appears to expand the discussion to COW in general, and doesn't accept the possibility that under very specific circumstances it can work well. He tosses sufficient "generally" and "often"s in there, but then comes across as saying that having the capability to do it with much work is bad. In general the FreeBSD zero_copy stuff is useless and often using it will result in broken code. This is the reason that zero copy is not part of the default kernel.

    2. Re:MOD UP by Ayanami+Rei · · Score: 1

      I'm very sure that wasn't Linus's intention to imply. Something as audacious as suggesting COW in general is flawed when it is a critical component of the current VM would have garnered very different reactions on the LKML where people are savvy about such things.

      His comments are being taken out of context. You have to read the whole emails to get a feeling for what he's trying to say. It annoys me that people also feel that Linus is criticizing FreeBSD coders in general; that barb was directed at those who thought that use of the semantics was a good thing (FreeBSD contributors and/or persons suggesting Linux should use it too). Not so much ad hominem as an unsubstantiated (but not unfounded) claim.

      --
      THIS THING CAN TURN ON A DIME, MACROSSZERO STYLE ALSO FUCK BETA, ~NYORON
  91. RTFA, please. Or at least my summary here. by ColonelPanic · · Score: 5, Informative

    The complaint is not about general copy-on-write, it's about BSD's ZERO_COPY_SOCKET feature vs. vmsplice().

    Basic explanation: Suppose that a program is doing a lot of output to a file or socket. The program can generate data faster
    than the kernel can consume it, say. So what should the kernel do with the buffer it receives from the user on each write()?
    There are three options.

    1) Copy its content immediately elsewhere, so that on return to User Mode, the buffer remains writable and writes are safe.

    2) Change the access rights of the page containing the buffer, so that no copy need be made unless User Mode attempts
          to modify its content before the kernel has completed the write(). If the user attempts to write, it either gets
          permission to do so (because the kernel is done) or it gets a writable copy.

    3) Let User Mode promise to not modify the buffer's content until told that it's safe to do so, leaving it writable in
          the meantime.

    The default behavior is (1); BSD's zero copy socket feature is (2), and the point of Torvalds' complaint; vmsplice() is (3).
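
    The three options above can be mimicked in user space with a toy model. This is only a sketch: the class names are made up for illustration, a bytearray stands in for the user buffer, and the "page fault" in option (2) is just an if-statement, so it says nothing about real kernel costs.

```python
"""Toy user-space model of the three write() strategies; not kernel code."""


class CopyWrite:
    """Option 1: the 'kernel' snapshots the bytes immediately."""

    def __init__(self, buf: bytearray):
        self.queued = bytes(buf)            # eager copy


class SharedWrite:
    """Option 3: the 'kernel' keeps a reference; the caller promises not to write."""

    def __init__(self, buf: bytearray):
        self.queued = memoryview(buf)       # no copy, same memory


class CowBuffer:
    """Option 2: a write while data is in flight triggers a copy first."""

    def __init__(self, data: bytes):
        self._data = bytearray(data)
        self._in_flight = None              # view held by the 'kernel', if any
        self.copies = 0

    def submit(self) -> memoryview:
        self._in_flight = memoryview(self._data)
        return self._in_flight

    def complete(self) -> None:
        self._in_flight = None              # 'kernel' is done; writes are free again

    def write(self, offset: int, value: int) -> None:
        if self._in_flight is not None:     # stands in for the page fault
            self._data = bytearray(self._data)  # user side gets a private copy
            self._in_flight = None
            self.copies += 1
        self._data[offset] = value

    def read(self) -> bytes:
        return bytes(self._data)
```

    Writing to the buffer after a SharedWrite changes what is queued, which is exactly the promise the caller makes under (3); CopyWrite and CowBuffer both keep the queued bytes intact, at the price of an eager or a deferred copy.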

    --
    "Skill shows through where genius wears thin." -Wittgenstein || Religion: uniting aviation and architecture.
  92. A problem of read()/write() semantics, not VM? by KonoWatakushi · · Score: 2, Insightful

    I can certainly see the value in explicit notification of page usage, but I have to wonder if this isn't attacking the problem at the wrong level. It seems that these problems are caused by the semantics of read() and write() calls, requiring data to be read/written to arbitrarily aligned userspace buffers.

    Zero copy can definitely make things complex, and in the current implementations, the value is arguable. (and being argued...) Still, memory copies have an associated cost. While they may be better than COW with explicit notification, it is still a performance hack, and represents a non-optimal way of dealing with data transfers. (It could be the easiest and best hack to be made, I can't say. In any case, Linus is acting like a git with his name calling here.)

    Perhaps more consideration should be given to the API instead. Using zero copy is obviously a good goal, and it is primarily hindered by the ancient API and protocols. Something where the buffer management is explicit, and the devices themselves actually own them. (After all, they are the only entities which know what the buffer requirements are.) Arranging it so that the user applications have access to the actual network buffers would be far preferable to playing any of these "games".

    Unfortunately, Ethernet and the IP protocols are not particularly conducive to such an optimal implementation. With enough intelligence in the network adapters though, many of the issues should be manageable, and allow for a good zero copy implementation with a suitable API. It may be more trouble for the application, but if you need the performance, it is a small price to pay.

  93. Wrong. by Ayanami+Rei · · Score: 1

    This article had NOTHING TO DO with forking and shared memory between processes. That is very COW and will never change.
    And really there is very little COW that happens on forking. Generally processes share inherited mmaps of library files and programs across fork that are read-only anyway. Buffer cache is buffer cache and forking just copies pointers. Pages which are mmap shared get the COW semantics and that's a known quantity.
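
    For reference, the fork() semantics that nobody here disputes can be observed from user space. A minimal sketch, assuming a Unix system with os.fork(); the function name is made up. Note that it only shows the visible result (the child's write stays private); whether the kernel got there by copy-on-write cannot be seen from here, and the Python interpreter touches plenty of pages on its own.

```python
import os


def fork_private_write(data: bytearray) -> tuple:
    """Fork, let the child modify `data`, and return (child_view, parent_view)."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:                          # child process
        try:
            os.close(read_fd)
            data[0:1] = b"X"              # lands in the child's private memory
            os.write(write_fd, bytes(data))
        finally:
            os._exit(0)                   # never return into the caller's code
    os.close(write_fd)
    with os.fdopen(read_fd, "rb") as pipe:
        child_view = pipe.read()          # what the child saw after its write
    os.waitpid(pid, 0)
    return child_view, bytes(data)        # parent's buffer is untouched
```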

    This is entirely about streaming and zero copy networking (where user-space gives buffers to the kernel or vice versa). This is a tricky area and the semantics are difficult.

    --
    THIS THING CAN TURN ON A DIME, MACROSSZERO STYLE ALSO FUCK BETA, ~NYORON
  94. Not about fork by butlerm · · Score: 2, Informative

    The dispute is not about fork(). It is about techniques to avoid copying the contents of I/O buffers from user space to kernel space - aka "zero copy" writes.

    Linus (minus the ad hominem characterizations) is arguing that the FreeBSD method of VM based copy on write is a poor performer under real world loads, due to the cost of handling the page faults.

    He says that an effective zero copy I/O system requires more explicit coordination between the application and the kernel.

  95. Windaws 95+1/2 was fast inside of 66 MHz Pentuim. by Anonymous Coward · · Score: 0

    Option A) 80% i686 ASM + 20% C.
    Option B) 20% i686 ASM + 80% C.

    Test 1) Applications where the performance of COW is very superior.
    Test 2) Applications where the performance of vmsplice is very superior.
    Test 3) Applications where the performance of merge_ideas_of(COW,vmsplice) is very superior.
    Test 4) Applications where the performance of NO-COW,NO-vmsplice is very superior.

    Please, don't call them 1D10T5 !!!

  96. Mod down didn't read and/nor comprehend TFA by Ayanami+Rei · · Score: 1

    This is about streaming.
    fork() never entered the discussion.
    Everyone uses COW on fork(). That's never been an issue.

    The "issues" you bring up about linux systems with slow startup times or whatever you wrote 4 paragraphs about are nonexistent. And copying memory pages certainly wouldn't result in a freeze in any case. Apparently you've never used a decent Unix system so how the hell would you know anyway. Please go take a class in OS design while you're at it, then you can contribute your ill-informed non sequiturs to the discussion.

    This is about big-boy networking stuff. You go play with your toy OS.

    --
    THIS THING CAN TURN ON A DIME, MACROSSZERO STYLE ALSO FUCK BETA, ~NYORON
  97. for people like myself who has no idea by mapkinase · · Score: 2, Informative

    ...what COW means.

    Maybe people like myself should just stay away from this thread...

    --
    I do not believe in karma. "Funny"=-6. Do good and forbid evil. Yours, Oft-Offtopic Flamebaiting Troll.
  98. It was written in the stars by Anonymous+Bullard · · Score: 0
    Linus Benedict Torvalds. That's where these tongue-in-cheek claims for his benevolent dictatorship over Linux (and the coming World Domination) originally stem from.

    --

    Should invading one's peaceful neighbours be opposed, or rewarded with trade deals?

  99. What harsh words? by Inoshiro · · Score: 5, Insightful

    Andy went out and said that he thought the Linux approach was wrong, and archaic, and that people should go and wait for GNU.

    Linus said that he felt this was wrong, and that being a prof is no excuse for Minix being the mess it was (and Minix was a mess in the late 1980s/early 1990s). He also apologized if he came off as too harsh for his writing about how people should be able to throw away an old design in favour of a new one anyway, etc.

    It was very polite compared to some of the non-Andy/Linux replies.

    --
    Internet Explorer (n): Another bug -- that is, a feature that can't be turned off -- in Windows.
    1. Re:What harsh words? by Bing+Tsher+E · · Score: 1

      Minix was (and still mostly is) an academic exercise. It's an OS wrapped around a textbook for teaching Operating System concepts. It was never intended as 'production' code. At the time, it was one of the few 'real' operating systems for which the full source was readily available for people to learn from. Linus owes Tanenbaum and Minix a lot more than most casual observers of the 'conflict' acknowledge.

    2. Re:What harsh words? by StarKruzr · · Score: 1

      I don't get it. Linux does things the Minix Way (i.e. monolithic kernels). Tanenbaum STILL claims this was the wrong thing to do?

      --

      +++ATH0
    3. Re:What harsh words? by Anonymous Coward · · Score: 1, Informative

      > I don't get it. Linux does things the Minix Way (i.e. monolithic kernels).
      > Tanenbaum STILL claims this was the wrong thing to do?

      No-- Minix is microkernel-based. That's why Tanenbaum said Linux was a step backward.

      Bear in mind, he was looking from the angle of its design as a NEW operating system kernel, and not a re-implementation (clone) of a time-tested one.

    4. Re:What harsh words? by mycall · · Score: 1

      This isn't true. The $100 laptop is using the new Minix 3

  100. Not full of bunk. by Ayanami+Rei · · Score: 1

    The fake memory copy is still stupid... but for different reasons. It looked good on paper, but it didn't even work very well on a single-CPU system. The Mach case was not for the kind of implicit COW you get for forking and file sharing and stuff, but for "sending" a userspace buffer to another thread.

    But yes, the COW overhead still comes into play which invalidates it all. But he also mentions that really the "copy" if it was explicit instead is in fact very light since it's handled by the caches and doesn't involve a TLB flush as the sizes are small. If the sizes are large, then you're doing something fundamentally wrong and perhaps you should be using SHM and semaphores explicitly.

    Anyway his comment about Microsoft is pretty accurate. They were the first people to start pushing the idea of exposing the file buffers directly to network DMA to get speed. Windows NT is actually a good platform for this what with IO completion ports and other nice backend stuff. But IIS would be the last platform I would want to deploy something like that on (especially in version 4/5).

    But having so many layers of caches and coherency protocols makes trying to circumvent all of it dicey unless you OOB communicate your intentions. You also have to balance accounting overhead vs. slow method with no overhead -- common fast case vs. fast case "prediction misses" in all types of I/O handling.

    Linus is making a claim of bad balancing in fast case vs. prediction miss and overemphasizing overhead costs in the case of the COW sendfile/zero_copy semantics.
    Mach was making a similar mistake... hence the grandparents' comment. This parallel was insufficiently shown.

    --
    THIS THING CAN TURN ON A DIME, MACROSSZERO STYLE ALSO FUCK BETA, ~NYORON
  101. Either Photoshop or POSIX is wrong. by Inoshiro · · Score: 1

    "When Linus is forced to sit there and watch for three minutes while Photoshop forks to run some simple helper ("

    He will ask the question why the Photoshop programmers chose to use a fork and then exec, instead of a call that creates a new process without duplicating the parent memory space. If the Photoshop programmers did not have this choice, we have to ask what is wrong with POSIX that there is not a call to create a new process with some initial values in its argv/envp from a parent process without duplicating its memory space.
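
    POSIX does in fact specify such a call, posix_spawn(), which starts a new program from argv/envp in one step. A minimal sketch using Python's os.posix_spawn wrapper; the helper name is made up, and how a given libc implements the call internally (and therefore how much of the parent it really avoids duplicating) is an implementation detail this sketch does not show.

```python
import os
import sys


def spawn_helper(argv: list) -> int:
    """Start argv[0] with posix_spawn() and return its exit status (-1 if killed)."""
    # argv[0] must be a full path; posix_spawn() does not search PATH.
    pid = os.posix_spawn(argv[0], argv, dict(os.environ))
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1


if __name__ == "__main__":
    # Run a trivial helper: a second interpreter that exits with status 0.
    print(spawn_helper([sys.executable, "-c", "pass"]))
```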

    Conceptually, COW fork tricks worked well when CPU and memory didn't have as much of a von Neumann bottleneck as they now have. Machines in the 1980s had maybe a 2x or 3x divider on memory to CPU. I recently benchmarked several machines for a thesis where the memory bus was 6x slower than the CPU, assuming peak conditions. The moment you get above cache size, performance falls through the floor; you don't want to dick around handling page faults in that situation, you want it to run as smoothly as possible. Linus is right about this.

    --
    Internet Explorer (n): Another bug -- that is, a feature that can't be turned off -- in Windows.
  102. RTFA, dammit! by XNormal · · Score: 1

    He is not talking about COW in general. Linux *does* use COW for fork.

    He is referring to a specific "optimization" of zero-copy I/O where instead of copying the user buffers to temporary kernel buffers the I/O is performed directly from the user buffers and the pages are COWed to make sure the program doesn't change them after the write call but before the data has actually been transmitted to the network, written to disk, etc.

    Using COW in this case may sound clever but the fact is that linear memory copy is extremely fast and updating the page table and flushing the TLB cache across multiple CPUs is very slow.

    It seems that the FreeBSD developers were "too clever" in this case. I am still somewhat surprised at the words he used to express his opinion of them. Linus is usually pretty diplomatic.

    --
    Stop worrying about the risks of nuclear power and start worrying about the risks of not using nuclear power.
    1. Re:RTFA, dammit! by Anonymous Coward · · Score: 0

      You idiot. Do you think he garnered all the information he posted based on the Slashdot summary? He didn't fully understand the issue, even after reading the article. If you had Read The Fucking Thread (dammit), you would know that. There is even a nice post by him that demonstrates the problem in code after he figured it out.

      But you go right on flaming there. I'm sure it's great for your heartburn.

  103. Contribute first, bitch second by chx1975 · · Score: 1

    Following this rule, Linus has enough bitching rights on any OSS software that even his grandchildren will have some.

  104. Can't we all just get along? by Anonymous Coward · · Score: 0

    MSFT is the beast lets not have a Linux BSD fight!

  105. Shocked... by DoctorDyna · · Score: 1
    I'm surprised he didn't interject some wonderful tidbit about ubuntu using gnome. We all know how much Linus loves KDE. kubuntu you say? psh, that's second page nonsense!

    Oh, and then there is that article a while back about "linux users being snobs" or some such.. I think maybe the longer you spend around *nix the more introverted and judgemental you get.

    just a theory.

    --
    Windows has more viruses because linux has more virus coders.
  106. Interesting forum by iplayfast · · Score: 1

    This story was interesting, but not especially newsworthy. However the folks on slashdot explaining/arguing and counter-arguing made this one of the most interesting forums on slashdot that I've seen in a LONG while.

    I don't know about the rest of you, but I like the news that's really for nerds. I don't have a big interest in Google, patents, American politics etc.

    But a story about Linus having a COW over a COW (pun intended) makes my day, especially when there is a worthwhile argument about it.

  107. ? Huh? by Ayanami+Rei · · Score: 1

    PTE overhead is 1K per 1M of (actively) mapped memory. This is the sum of used (not allocated) user pages, memory maps/buffer cache, and kernel buffers.
    Generally speaking a linux system tends to want to push the overall usage towards the physical memory limit. If a system has 2G of physical RAM, the linux system generally will end up with about 2M worth of PTEs and directories after running for awhile.
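
    The 1K-per-1M figure follows from simple arithmetic if one assumes 4 KiB pages and 4-byte page table entries (classic 32-bit x86 without PAE; both constants are assumptions, and page directories are ignored):

```python
PAGE_SIZE = 4 * 1024   # assumed: 4 KiB pages
PTE_SIZE = 4           # assumed: 4-byte entries (32-bit x86, no PAE)


def pte_bytes(mapped_bytes: int) -> int:
    """Bytes of last-level page table entries needed to map `mapped_bytes`."""
    pages = -(-mapped_bytes // PAGE_SIZE)   # ceiling division
    return pages * PTE_SIZE


if __name__ == "__main__":
    print(pte_bytes(1 << 20))   # 1 MiB mapped
    print(pte_bytes(2 << 30))   # 2 GiB mapped
```

    Under those assumptions 1 MiB of mappings needs 256 entries, i.e. 1 KiB, and 2 GiB needs 2 MiB.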

    The reason why the system slows down when you trounce all over 80% of memory is because a very large portion of that memory space is dedicated to file buffers and the data segment of dormant daemons and such. By trouncing all over active memory you evict many process spaces to swap and free read-only shared memory... forcing daemons to reload libraries and such when they wake up and start page faulting. Also the kernel uses heuristics to determine when a program is misbehaving and writing to abnormally large "working sets" and it can evict you before you allocate and write to the theoretical limit you should be able to. The kernel reserves overhead of possibly usable memory for DMA destinations and other processes, swapin space, whatever.

    Sounds like you needed more swap anyway, if that was the case. I've been able to exceed 100% easily... it's slow, of course. :-)

    --
    THIS THING CAN TURN ON A DIME, MACROSSZERO STYLE ALSO FUCK BETA, ~NYORON
  108. Garbage Collection. by hackwrench · · Score: 1

    Ideally, if time and resources allow it, I would think that one would make the copy when one has some free time between the time the memory gets shared and the time the memory is attempted to be written to. It isn't garbage collection in the strictest sense, but the same concept is involved.

  109. HW developments make "zero copy" obsolete by Anonymous Coward · · Score: 0

    Very nice summary !

    I think the main problem underlying this situation is that page size on x86 has been 4K since .. like .. forever, i.e. since Linus bought his first 80386.

    However, in the meantime, both RAM size and RAM bandwidth have increased by 2..3 orders of magnitude, i.e. from 4M to 4G (RAM size) or from 16MB/s (250ns cycle time, 32 bit) to 8GB/s (bandwidth).

    Unfortunately, latency has not improved nearly as much, maybe from 300ns to 100ns for a page miss.

    Copying 4K of Ram on recent x86 hardware, if properly done, will take on the order of 1us, even if both source and destination are not in the cache, or if you include the cost of writing back the write-allocated destination from the cache.

    If the data to be written by the user app has just been handled, which would likely be the case, and the socket buffer is reused from a pool, you may even be copying from cache to cache, in perhaps 300ns for a 4K transfer.

    On the other hand, incurring a couple of page misses due to flushed TLBs and cache lines takes around 100ns each, even on single-CPU systems, which are rapidly being phased out and replaced by multiprocessor systems even in home PCs.
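
    A back-of-the-envelope check of those figures, as a sketch only: it assumes the copy is purely bandwidth-bound and counts both the read and the write traffic, and it uses the 8GB/s and 100ns numbers quoted above rather than anything measured.

```python
def copy_time_ns(nbytes: int, bandwidth_bytes_per_s: float) -> float:
    """Time to stream-copy nbytes, counting read plus write traffic."""
    return 2 * nbytes / bandwidth_bytes_per_s * 1e9


def misses_to_match_copy(nbytes: int, bandwidth_bytes_per_s: float,
                         miss_ns: float) -> float:
    """How many misses of `miss_ns` each cost as much as one copy."""
    return copy_time_ns(nbytes, bandwidth_bytes_per_s) / miss_ns


if __name__ == "__main__":
    print(copy_time_ns(4096, 8e9))                  # uncached 4K page copy, in ns
    print(misses_to_match_copy(4096, 8e9, 100.0))   # misses that equal that copy
```

    With these inputs an uncached 4K copy comes out at roughly 1us, about the same as ten 100ns misses.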

    Even a substantial amount of data piled up between the user process and the kernel (and this is doubled without "zero copy"), say 1 MB, will not noticeably affect machines with >1 GB of RAM, so I think the performance arguments seem very convincing, and even more so if future developments in hardware are taken into consideration.

  110. An explanation by sjames · · Score: 4, Informative

    There seem to be a LOT of misconceptions about the discussion of vmsplice() vs COW vs copy. This has nothing to do with conserving memory and everything to do with high performance I/O. If your app just needs to send a couple small files from A to B, you probably don't care about this at all.

    A little background is needed on the terminology and mechanisms of I/O for any of this to make sense. For an example, let's say your app is a very busy web server sending dynamic (but trivial to compute) pages out.

    The oldest and simplest method is copy. The app calls write(int sock, char *buffer, int length) on a socket. The kernel copies the contents of buffer from userspace memory into a kernel space buffer and at least queues the data to the TCP stack before returning.

    COW is an attempt to avoid the cost of copying the outgoing data. In that case, the kernel bumps the reference count on the physical pages that make up buffer (since now kernel and application are both interested in them) and marks the pages as COW. That is, the virtual memory addresses are set as read only and a flag bit is set (more or less). The latter is done so the kernel needn't worry about them again. By the time the write call returns, the app is able to immediately write to that memory (sorta) without worry.

    When that write happens, the app takes a page fault (writing to a read-only page). The kernel sees that the pages are COW, copies the data to a new physical page, and maps the page in read/write. Then it returns from the fault. OTOH, if the kernel finished with the page first (the data goes on to the wire), it re-marks the page(s) so the app can access them without a copy.

    The hope is that often enough, the app WON'T try to write to the pages while they're busy and so the cost of that copy is saved. If that hope comes through often enough it MIGHT be vaguely useful. I say MIGHT since there is a significant cost just for marking the pages (the CPU's TLB must be flushed for the change to take effect). If the faults happen, it's a BIG loss since handling a fault takes thousands of CPU cycles.
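
    That trade-off can be written down as a small expected-cost model. This is a sketch under simplifying assumptions: every write pays the marking cost, a faulting write additionally pays the fault plus the copy, and the cycle counts in the example are invented placeholders, not measurements.

```python
def cow_expected_cost(mark_cost: float, fault_cost: float,
                      copy_cost: float, p_fault: float) -> float:
    """Average cost per write() under COW, given the probability of a fault."""
    return mark_cost + p_fault * (fault_cost + copy_cost)


def break_even_fault_rate(mark_cost: float, fault_cost: float,
                          copy_cost: float) -> float:
    """Fault probability at which COW costs the same as always copying."""
    if mark_cost >= copy_cost:
        return 0.0                      # marking alone already costs a copy
    return (copy_cost - mark_cost) / (fault_cost + copy_cost)


if __name__ == "__main__":
    # Hypothetical cycle counts: marking 500, fault 3000, copy 1000.
    print(break_even_fault_rate(500, 3000, 1000))
```

    With those placeholder numbers, COW only wins if fewer than one write in eight faults; a cheaper copy or a costlier TLB flush pushes that threshold down further.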

    So, for it to have any chance to help, the application programmer must already know enough to TRY to avoid writing to the same buffer again until it gets to the wire. Unfortunately, it can never be sure so most apps don't bother.

    The vmsplice() proposal is fairly simple. In this case, the app explicitly requests special treatment of the write. The pages are NOT marked as read only at all. Instead, the app is on its honor to leave them alone until the kernel notifies it that they are again available. This saves the copy and the costs of TLB flush AND the (potential) cost of page faults. If the app breaks its promise, it is the only one to suffer as the data it sent is corrupted (no kernel housekeeping is ever stored in such pages so there are no security implications). Any damage the app might do by sending screwy data could also be done using the old copy method.

    What it all comes down to is that playing tricks with page mapping LOOKS nice at first glance since it SEEMS reasonable that not copying bytes around will save CPU cycles and memory bandwidth. The re-mapping (or just permission changes) on pages SEEMS lightweight. Unfortunately, in fact, re-mapping or changing permission forces cache invalidations and page faults are just plain expensive. With the direction CPU design is going, these things will likely get more expensive rather than less (as they have for most of the history of microprocessor design).

    It's really not that complex for an application to use. At least in comparison to the complexities and level of knowledge required to write an app that performs well enough to need this in the first place.

    1. Re:An explanation by Pseudonym · · Score: 1

      I read your explanation, and I fail to see how it isn't a serious potential security hole.

      Here's the scenario:

      1. Client crafts some data, initiates IPC using "I promise not to write here" semantics.
      2. Server accepts connection, does sanity checking, finds the data is okay.
      3. Client modifies the data, violating its promise.
      4. Server now has data which it thinks has been sanity checked, but is in fact invalid. Server crashes, gets compromised etc.

      I'm probably misunderstanding here, but this is precisely the kind of problem that copying and/or COW is designed to avoid. There is a third approach, of course: if the client tries to write, block it until the memory area is released. (This is how "vfork" works, for example.)

      --
      sub f{($f)=@_;print"$f(q{$f});";}f(q{sub f{($f)=@_;print"$f(q{$f});";}f});
    2. Re:An explanation by sjames · · Score: 1

      The key here is that it is used for I/O to devices. In the example, an ethernet card for TCP/IP networking, but an HD would be fine as well. The client app can change the data that has been sent right up until the network card DMAs it to the wire. By the time the server sees it the client can no longer change it.

      For local IPC such as through a pipe, there are a few options as well. First, the IPC can happen with one copy instead of two. That is, rather than client memory -> kernel buffer -> server memory, you can just do client memory -> server memory where the copy happens when the server reads the pipe. In this case, "zero copy" applies just to the write operation. The read side still does a copy. Some research kernels have done true zero-copy for IPC, but use scheduler tricks (the sender yields on write and doesn't get scheduled again until the server replies, thus it can't modify the buffer). In cases where the processes truly can trust each other (for example, a single application that forks a number of handler processes), a block of shared memory can be used for IPC.

      While zero-copy writes can be implemented a number of ways, zero-copy reads are more limited. It's easy enough to do a zero-copy read from disk for example, but zero-copy reads from a network device just don't work unless the hardware is designed for it. For example, in Myrinet, the app is expected to pre-allocate receive buffers and register them with the hardware. The hardware looks at the incoming packet and DMAs it directly into the target's registered buffer, then notifies the receiver that the buffer is ready. If no pre-registered buffer is available, the data is thrown away and the sender receives an error notification.

      A few fast ethernet cards are set up to perform a partial DMA of the header and then have the driver assign a buffer for the payload. However, they snatched defeat from the jaws of victory by having a fixed size header that doesn't include IP or TCP headers, so it only works for ethernet datagrams on non-standard protocols. So-called TOE where the TCP stack is offloaded onto the card also snatch defeat from the jaws of victory by insisting on doing too much. Some of them could probably be convinced to do the right thing with a firmware change but the manufacturers are much too concerned about their precioussssss IP to ever allow the cards to be made useful for more than a marketing bulletpoint.

    3. Re:An explanation by setagllib · · Score: 1

      You *definitely* can't have zero-copy into a 'read', simply because the page is already defined for the data to be written to - you can't simply 'swap' it with the page with the right data (any such hack should be made illegal). Even so, a memcpy isn't going to kill anyone.

      A server should never assume the *client* did sanity checking on the data it's sending anyway. The kernel can't do this either, since it doesn't know what 'sane' is - bits are bits. Grandparent has forgotten to take his medication.

      If you want a good measure of how efficient your kernel is with pure socket IO, netperf localhost via UDP. Network card/driver overhead is eliminated, so is application logic. The closer it is to pure memory bandwidth, the more awesome your kernel is. SMP really helps here, so it's a bit unfair for UP systems and giant-locked network stacks, though in practice the effect on real network throughput (we're talking multi-client combined throughput) is even better.

      --
      Sam ty sig.
    4. Re:An explanation by sjames · · Score: 1

      You *definitely* can't have zero-copy into a 'read', simply because the page is already defined for the data to be written to - you can't simply 'swap' it with the page with the right data (any such hack should be made illegal). Even so, a memcpy isn't going to kill anyone.

      Myrinet DOES do zero-copy reads. You also CAN do it with disk I/O provided the reads are 'well formed'. A number of systems support opening a file with O_WELLFORMED. When that option is used, buffer page alignment, file offset block alignment, and even multiples of blocksize are enforced by the kernel. Linux has the related O_DIRECT option as well as the raw block interface.

      In any of those cases, no page flipping occurs, the driver simply computes the needed disk block as usual, but sets the DMA to go directly into the physical page mapped at the virtual address of the buffer. In many OSes, the read is synchronous (that is, it blocks until DMA completes). Async is possible as well as long as the user app waits for IO completion notification.

      In the case of Myrinet (as mentioned in another reply), the app pre-registers receive buffers. The card matches incoming packets to apps and writes it into one of the registered read buffers. On completion, the app is notified. If no buffer is available, the data is not buffered, the sender gets an error notification (possibly asynchronously). Whether or not that is a net win is debatable, but its existence shows clearly that it can be and has been done.

      I agree that swapping in a page that already has the data is a big loss. It's not only complex (and so likely bug ridden), but requires cache and TLB flushing, so is more costly than a copy. In the end, the copy isn't so bad since most of that cost is cache loading which will happen if/when the app actually touches the data anyway.

      zero-copy read is a win if the objective is to move the data from one DMA capable device such as a hard drive to another such as a network card without changing it. In Linux, sendfile can do that. This wins for FTP and HTTP servers. zero-copy write is nearly always a win iff you (the kernel) don't do stupid VM tricks.
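
      For the sendfile case, a minimal sketch might look like the following. It assumes Linux's sendfile(2); the helper name send_file_to_fd is invented for the example, and out_fd would normally be a connected socket (on reasonably recent kernels it can also be a regular file, which makes the sketch easy to try out).

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/sendfile.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: push the whole file at `path` to out_fd using
 * sendfile(2), so the payload never passes through a user-space buffer.
 * Returns the number of bytes sent, or -1 on error.
 */
ssize_t send_file_to_fd(int out_fd, const char *path)
{
    assert(path != NULL);

    int in_fd = open(path, O_RDONLY);
    if (in_fd < 0)
        return -1;

    struct stat st;
    if (fstat(in_fd, &st) < 0) {
        close(in_fd);
        return -1;
    }

    off_t offset = 0;              /* sendfile() advances this for us */
    while (offset < st.st_size) {
        ssize_t n = sendfile(out_fd, in_fd, &offset,
                             (size_t)(st.st_size - offset));
        if (n <= 0)                /* error or unexpected end of input */
            break;
    }

    close(in_fd);
    return offset == st.st_size ? (ssize_t)offset : -1;
}
```

      The application never touches the file contents, which is exactly the situation described above: a static HTTP or FTP server only cares about moving the bytes along.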

    5. Re:An explanation by setagllib · · Score: 1

      You're right, I should have considered the possibility of card->user page, I only considered kernel->user page (which may as well be a copy, heck they're very fast with MMX anyway). My issue with this is that the card would have to wait until it's asked to write to the user page, otherwise it will have to write to a kernel page anyway, because you can't quite synchronize the same as with writes.

      I don't favor zero-copy in either COW or splice implementations. The extra sophistication required (either in kernel or userland) and ALWAYS requiring over-allocating is pretty insignificant when you can just do a memcpy, which becomes an almost 'free' operation when you bring in the processor's cache. The tiny potential kernel memory savings of zero-copy are completely nerfed by having to allocate multiple buffers in the user program anyway, so that can't be used as an argument either. Not to mention working with more memory in a single schedule of the processor time offers less chance to utilize the cache properly.

      A better optimization would be to reduce the syscall overhead by buffering the operations themselves in userland until a certain amount of time has passed (something reasonable in the milliseconds), and balancing this out to reduce the overall latency of the IO. With reads this is even better since there's no *extra* latency, but with reads the program should be smart enough to buffer ahead anyway.

      --
      Sam ty sig.
    6. Re:An explanation by sjames · · Score: 1

      I don't favor zero-copy in either COW or splice implementations. The extra sophistication required (either in kernel or userland) and ALWAYS requiring over-allocating is pretty insignificant when you can just do a memcpy, which becomes an almost 'free' operation when you bring in the processor's cache.

      The case where zero-copy can be a win is when the app is serving disk files over the network (HTTP w/ static pages, FTP, NFS, or Samba). In that case, the app has no actual interest in the data it's moving, so bringing it into the cache is a net loss (as something else gets flushed to do it). In that case, there is a benefit to using well-formed I/O (zero-copy reads) to read the disk into memory and zero-copy writes to the network.

      A potential use would be a specialized firewall or router that might or might not have any interest in the payload of a packet. It's a shame to dirty the cache with a 9K jumbo packet when only the first 64 bytes are of interest (other than simply passing on to a destination). Right now, such things are done in the kernel (which DOES make every effort to do it with zero copy), but there are security, stability, and testability advantages to doing it in userspace (consider the case of an embedded firewall device based on the Linux kernel). That CAN be implemented with special purpose modifications to drivers (consider, mmaping a char device to gain access to DMA buffers, perhaps a special case of ethertap) but a more general purpose interface would avoid reinventing the wheel and in general would be better tested and maintained than many independent implementations of essentially the same thing. Even better would be a multi-interface router on a card that lets userspace issue routing decisions, but you probably won't find that off the shelf and probably don't want to invest in creating one without a good proof of concept.

      Of the many advantages to Free software, one of the greatest is probably the way it can support true innovation by small (and underfunded) inventors.

    7. Re:An explanation by runderwo · · Score: 1
      I read your explanation, and I fail to see how it isn't a serious potential security hole.
      I fail to see why you would use the client-mapped version of the data instead of memcpy'ing it into a private buffer on the server. memcpy is FAR less expensive than a CoW fault.
    8. Re:An explanation by Pseudonym · · Score: 1
      A server should never assume the *client* did sanity checking on the data it's sending anyway.

      That's not what I said. What I said was that the server should be able to assume that its input has not been modified after sanity checking. vmsplice() appeared to break that assumption, but in retrospect, it was probably that I didn't understand what the previous poster was trying to get at.

      --
      sub f{($f)=@_;print"$f(q{$f});";}f(q{sub f{($f)=@_;print"$f(q{$f});";}f});
    9. Re:An explanation by setagllib · · Score: 1

      Apologies, I misunderstood. Shouldn't the kernel somehow deny access to that page from another process when it's 'owned' by the server? Actually I didn't think the server would get the same page anyway (unless it requested a new page instead of writing to an existing one). What you describe sounds like shared memory, where both processes agree that the memory involved is not exclusively theirs, and if vmsplice() is meant to do something similar between processes (not just user-kernel) then it'll be quite a problem.

      --
      Sam ty sig.
  111. It's not the only problem of TLB. by Anonymous Coward · · Score: 2, Insightful
    The COW problems:
    1. The #1 problem: every write to a write-protected page costs context switches to service the fault. Solution: fewer context switches, copying more than needed if necessary.
    2. TLB misses.
    3. L1 & L2 Cache misses.
    4. Double TLB misses, one in userspace and other in kernel space.
    5. L1 & L2 Cache misses, one in userspace and other in kernel space.
    6. Multithreading locks (mutexes, semaphores, blocking calls, non-blocking calls, ...) VERSUS NO-locks in monothreading using select/poll, non-blocking calls, ...
    7. Hardware bubbles: pipeline's misses & bubbles, big page-translation bubbles, ...

    -=- ThE DaRK MaN oF tHe ObScURiTY -=-

  112. wrong by Anonymous Coward · · Score: 0

    They don't have Mt. Dew.

  113. tool by Anonymous Coward · · Score: 0

    This guy is becoming more and more of a tool.
    He acts like he has the answer to every damn thing when in reality there are more ways to skin a chicken than 1.
    Frankly Linus you're an arrogant ass and it's showing in your work as well..

  114. Re:RTFA, please. Or at least my summary here. by joe_bruin · · Score: 3, Insightful

    Thank you. I've read more than 30 high-modded posts in this article, and yours is the best explanation of the issue by far.

  115. Not this COW, bovine by Nicolas+MONNET · · Score: 1

    Linux's had copy-on-write on fork pretty much since v0.1.

    I'm not familiar with the particular issue involved in TFA, but it's definitely not this.

  116. He forgot... by kronocide · · Score: 1

    I also claim that Slashdot people usually are smelly and eat their boogers, and have an IQ slightly lower than my daughters pet hamster (that's "hamster" without a "p", btw, for any slashdot posters out there. Try to follow me, ok?).

    Furthermore, I claim that anybody that hasn't noticed by now that I'm an opinionated bastard, and that "impolite" is my middle name, is lacking a few clues.

    Finally, it's clear that I'm not only the smartest person around, I'm also incredibly good-looking, and that my infallible charm is also second only to my becoming modesty.

    So there. Just to clarify.


    He forgot most mature! Most mature person around! *sigh* Well, if he can't trust the /. crowd to put up with his attitude, what reaction does he expect to get from people who don't give a microchannel's @ss about kernel code?

    1. Re:He forgot... by groug · · Score: 1

      I naively thought that trolls came from Norway... :-\

      --
      Anarchy is about taking complete responsibility for yourself. - Alan Moore
  117. Welcome to FUD-land by TPS+Report · · Score: 1
    mrsbrisby (60242) stated:
    Correction: when _you_ start using up a lot of memory Linux totally sucks. When I start using up a lot of memory, Linux acts exactly as I expect, and better than FreeBSD. (PDF reference) Hrm. Looks like FreeBSD panics under load in its default configuration. So sad.
    Interesting that the PDF you linked specifically states:
    From a stability point of view, Linux and NetBSD worked stable all the time, FreeBSD 5.1-RELEASE panicked under load (that went away with 5.1- CURRENT) and OpenBSD crashed and panicked even in 3.4-CURRENT. OpenBSD also surprised me with "interesting" syslog messages like "/bsd: full".
    FreeBSD 5.1 was released on Mon, 9 Jun 2003, or approaching 3 years ago. Note that he did his comparison in October of 2003, 4 months after 5.1R was published (but he did not use FreeBSD 5.1 for his tests). As an aside, the initial FreeBSD 5.x offerings were pretty well known to be of less quality than previous releases, partly because of some major structural changes. I'm not making excuses, just stating observations. By the way, FreeBSD 6.1 is about to be released. Your referenced PDF is quite outdated.

    Hey, if you want to cherry-pick quotes, I'll take some quotes out of context from the same PDF you referenced above:

    The most important OS offering async I/O is FreeBSD.
    Linux 2.4 scales badly for mmap and many processes.
    OK, a normal quote, from the same PDF you referenced:

    (FreeBSD) kqueue is older than epoll. I think Linux should simply have implemented the kqueue API instead of inventing epoll, but the Linux people insist on doing all the mistakes of the other people again. For example, the epoll guy initially thought he could get away without level triggering. The performance of epoll and kqueue is very similar.

    I like FreeBSD, but I have nothing against linux. It's fine. You can't take a single man's opinion (or even his experiences from 3 years ago!) and spread it around as current "fact". You are simply spreading FUD, with no real point.
    --
    I was told that I could listen to the radio at a reasonable volume from nine to eleven...
    1. Re:Welcome to FUD-land by segedunum · · Score: 0

      Hey, if you want to cherry-pick quotes, I'll take some quotes out of context from the same PDF you referenced above:

      Hmmm. Unfortunately, nowhere does it say that any released Linux kernel panics under heavy load. I believe that that was the point, and is slightly different from "Linux 2.4 scales badly" and a statement about epoll and kqueue.

  118. Somebody by inode_buddha · · Score: 1

    Somebody mod this guy + "insightful" re: drive-by shouting match. Hey, it happens every so often.

    --
    C|N>K
  119. Re:They've been saying it because it's a valid poi by Anonymous Coward · · Score: 0

    So where are the new tests?

  120. Re:RTFA, please. Or at least my summary here. by menace3society · · Score: 2, Insightful

    So the big question is, what happens if user mode breaks the promise, either intentionally or through lousy programming? If the program fucks up, well, then, I'd rather have FreeBSD's model (actually, I'd rather have someone come up with a thread-safe wrapper function, and keep I/O the way it's supposed to be, i.e., atomic).

  121. linus is being silly by tlord · · Score: 1

    I even have to go so far as to question his honesty, here.

    There are two interesting cases: (1) programs that generate so many writes, so quickly, that no matter what they will have to periodically block waiting for I/O subsystems to catch up; (2) programs that generate bursts of writes mixed with bursts of computing or idling, during which time I/O subsystems have a chance to catch up.

    Case (2) is quite extremely well solved by COW simply by using a ring of user-space buffers large enough to hold all the writes in a single burst. The resulting code is portable. On a COW system it makes optimal use of memory and time. You could add lots of linux-specific "vmsplice" (or whatever) code to get the same effect on Linux.

    Case (1) is also handled by a ring of buffers if the COW implementation has a particular feature. The feature needed from the COW system is to keep a queue of recent writes. If a page fault occurs on the COW buffer for write N, then don't just copy buffer N, copy buffers N....N+K for some appropriate K. Then all the overhead that Linus is talking about goes away. With that version of COW, applications never lose, sometimes win, and always have portable code. Meanwhile, if you just grow your buffers indefinitely waiting for the kernel to tell you it's ok to re-use some, you risk thrashing.

    Where I think Linus is being less than frank is that he must know these basic queuing theory observations. I don't want to accuse him of incompetence so it must be an issue of frankness, no?

    -t

  122. Double standards! by drfuchs · · Score: 1

    Look, all he's saying is "Don't have a cow, man." Nobody complains when Bart says it.

  123. Re:RTFA, please. Or at least my summary here. by SEAL · · Score: 1

    So the big question is, what happens if user mode breaks the promise, either intentionally or through lousy programming? If the program fucks up [...]

    If the programmer didn't know any better, he wouldn't be calling vmsplice() in the first place. Rather, he'd just be using a plain write(), which would work fine, minus the performance boost.

  124. Re:RTFA, please. Or at least my summary here. by Antony+T+Curtis · · Score: 1

    OS/2 actually had a neat solution for this scenario.

    A process can allocate pages of memory from the operating system, fill it with data before giving those pages to another process (who becomes responsible for freeing the memory back to the operating system). OS/2 had a number of API calls to make this easy... Allocate memory marked as "giveable"... fill it with data... post it in a message queue... operating system posts message to consumer and has given the memory to the consumer... consumer does something with it... consumer frees the memory back to the operating system for reuse. OS/2-style zero-copy inter-process communication.

    That way, it is clear who owns the memory... In a similar way, such handovers can be done to implement zero copy for all kinds of things...

    --
    No sig. Move along - nothing to see here.
  125. Re:Wrong Side of Bed, wrong fight to fight by Anonymous Coward · · Score: 0

    The more I read the sample code in these threads, the more I think that this is an argument between Java/Ruby style code with GC, and C code. Of course, I base this on nothing, but it sounds like BSD style kernel would run code written in higher level languages faster, with a finely tuned GC memory manager, but the Linux way enables C programmers to be the control freaks they are. Not because C programmers would be better or worse than Java/Ruby programmers at memory management, but because the user level code would wind up allocating new memory each iteration through a loop, rather than re-using a static block.

    Tough to say what's best without a test or 4, even if I am right...

    -Chris

  126. did anybody catch this paragraph? by mseidl · · Score: 0

    From: Linus Torvalds [email blocked]
    Subject: Re: Linux 2.6.17-rc2
    Date: Fri, 21 Apr 2006 10:58:46 -0700 (PDT)

    I got slashdotted! Yay!

    On Thu, 20 Apr 2006, Linus Torvalds wrote:
    >
    > I claim that Mach people (and apparently FreeBSD) are incompetent idiots.

    I also claim that Slashdot people usually are smelly and eat their boogers, and have an IQ slightly lower than my daughters pet hamster (that's "hamster" without a "p", btw, for any slashdot posters out there. Try to follow me, ok?).

    Furthermore, I claim that anybody that hasn't noticed by now that I'm an opinionated bastard, and that "impolite" is my middle name, is lacking a few clues.

    Finally, it's clear that I'm not only the smartest person around, I'm also incredibly good-looking, and that my infallible charm is also second only to my becoming modesty.

    So there. Just to clarify.

    Linus "bow down before me, you scum" Torvalds

  127. "entry level" by Bill,+Shooter+of+Bul · · Score: 1

    Apparently, all of the major hardware vendors disagree with your definition of "entry level". I don't know who you are or what you do, but you might not want to program for embedded hardware. You're probably one of those jerks that writes a custom media player for a third party vendor (creative labs?) that requires 512 mb of memory to run. If so, I loathe you, sir.

    --
    Well.. maybe. Or Maybe not. But Definitely not sort of.
  128. question on the math by sacrilicious · · Score: 1
    [yes the exceptions are more expensive] it's basic math: cost of page copy + exception + 2 * (page table update) is greater than cost of page copy + page table update.

    Wouldn't there be a constant (an unknowable one) representing the percentage of the instances in which memory is reused? If a piece of memory used in a copyOnWrite scheme is never written to by another process, none of the page table updates occur, right? If K is the (often unknowable) constant indicating this percentage of the time when the copyOnWrite exception is generated, I'm thinking the equation you describe would need to be modified to something like:

    (K*(cost of page copy + exception + 2 * (page table update))) + ((1-K)(cost of page copy + page table update))
    must be greater than
    (cost of page copy + page table update)
    for Linus' argument to stand up...
    --
    - First they ignore you, then they laugh at you, then ???, then profit.
    1. Re:question on the math by mrsbrisby · · Score: 1

      Wouldn't there be a constant (an unknowable one) representing the percentage of the instances in which memory is reused? If a piece of memory used in a copyOnWrite scheme is never written to by another process, none of the page table updates occur, right? If K is the (often unknowable) constant indicating this percentage of the time when the copyOnWrite exception is generated,

      What are you babbling about? The COW always occurs. The "other process" is the kernel and the page fault occurs when the original process attempts to re-fill its memory buffer. If the write is part of a very tight loop, then the page fault occurs at the top of every loop. Since the page fault happens at the top of every loop, that means that you have the copy, and the page fault, in addition to the pagetable updates. It simply would've been faster to copy it once.

      Linux `solves this' by using explicit notification. That means that userspace can reuse or free the buffer after the kernel is done using it, and as a result, copy-on-write simply isn't necessary.

      Please read the article, and if you still don't understand, read my other posts in this thread on the subject.
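
      For readers who have not seen the call itself, handing a user buffer to a pipe with vmsplice() looks roughly like this. It is a bare sketch assuming Linux with glibc; the helper name splice_buffer_to_pipe is invented, and the sketch does not show how the application learns that the kernel is done with the buffer, so the caller is simply assumed to leave it alone until the reader has consumed it.

```c
#define _GNU_SOURCE                /* vmsplice() is Linux-specific */
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Hypothetical helper: queue `len` bytes of user memory on the write end
 * of a pipe with vmsplice(2) instead of copying them in with write(2).
 * Returns the number of bytes queued, or -1 on error.
 */
ssize_t splice_buffer_to_pipe(int pipe_wr, void *buf, size_t len)
{
    assert(buf != NULL);

    struct iovec iov = { .iov_base = buf, .iov_len = len };

    /*
     * flags == 0: the pages are not gifted to the kernel, so the caller
     * still owns them and must not modify them while they sit in the pipe.
     */
    return vmsplice(pipe_wr, &iov, 1, 0);
}
```

      No page protection is changed and no fault is set up; keeping the buffer stable is the program's promise, which is the trade-off being argued about in this thread.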

  129. AHA! by ElboRuum · · Score: 1

    You must be one of those smelly, booger-eating, hamster-IQ'd (w/o the 'p', naturally) Slashdotters he mentioned. He seems to like bowing. Are you bowing?

  130. As Bart says... by marcushnk · · Score: 1
    --
    "Consider how lucky you are that life has been good to you so far. Alternatively, if life hasn't been good to you so far
  131. The last time I checked.... by aphor · · Score: 1

    As I remember, the FreeBSD code in question was only compiled when explicitly configured, and the option and code was clearly marked EXPERIMENTAL. The way I see it is that Linus is really uncomfortable with people pushing Linux in that direction because their code is messier and more difficult to refactor (and change directions) in general. In FreeBSD people who implement this code are required to keep the dangerous stuff in a sandbox that can be applied optionally to encourage lots of high quality real-world testing. You can talk about the theoretical cost to caching performance, but the proof is in the pudding. If Linus is right, the best argument is a working example. If he is wrong, the best argument is a working example. Who is he to make disparaging remarks about other peoples' toys anyway?

    --
    --- Nothing clever here: move along now...
  132. Oh come on! by It's+a+thing · · Score: 1

    How can you resist something abbreviated COW?

    MOO!

    --
    Staring at a white background [on a computer screen] while you read is like staring at a light bulb — Maddox
  133. Every time VM gets discussed.... by SETIGuy · · Score: 1
    Every time VM gets discussed I become less impressed with kernel hackers and less impressed with coders in general. This goes for both Linux and the BSDs. Why would anyone read into a buffer smaller than the VM page size? Why should data need to be copied in the first place?
    size_t bufsiz=getpagesize();
    char *buffer=vmalloc(bufsiz);
    int nread = 0;
    int length;

    while(nread < totalSize)
    {
        /* if fread() isn't directly remapping pages from a merged
         * freelist/diskcache into the address pointed to by buffer
         * then it's a stupid implementation. If you don't know
         * whether it's a stupid implementation call mmap() directly.
         */
    #ifdef SMART_FREAD
        length = fread(buffer, 1, bufsiz, file);
    #else
        /* emulate fread with the VM subsystem */
        if (buffer) munmap(buffer,bufsiz);
        buffer = mmap(buffer,bufsiz,PROT_READ|PROT_WRITE,MAP_PRIVATE,fileno(file),nread);
        /* With proper read-ahead by the cache subsystem this should not */
        /* cause a page fault on every mapping unless you write to this page. */
        /* If it does, copy-on-write is the least of your worries. */
        /* mmap() reports failure with MAP_FAILED, not NULL */
        if (buffer != MAP_FAILED) {
            length=bufsiz;
        } else {
            buffer=NULL;
            length=0;
        }
    #endif
        if (length) {
            /* do stuff */
            nread += length;
        } else {
            break; /* EOF or error: stop instead of spinning forever */
        }
    }
    If there's a significant difference between the performance depending upon whether SMART_FREAD is defined, then there are problems with either the VM or I/O implementation. From the sound of it, unless Linus is defining a flag to the mmap call, he's making a copy of every multiply mapped PROT_WRITE, MAP_PRIVATE page whether it gets written to or not, which means large PROT_WRITE, MAP_PRIVATE mappings use more memory but page fault less. In BSD, you get more page faults but use less memory. Each has advantages depending upon the application. Neither requires calling the user of the other an imbecile.

    In a large memory machine with moderate size "write often" memory allocations (i.e. the typical desktop) the Linux scheme would have benefits. In a machine that uses datasets much larger than memory that are written rarely, the COW scheme has benefits. An example would be a large page-locked database where the page fault and page lock mechanisms are intertwined.

    The proper thing would be to split MAP_PRIVATE into two options MAP_PRIVATE_COW and MAP_PRIVATE_COM (COM=copy on map) with MAP_PRIVATE being defined to one or the other. Whether Linux will be giving user mode applications the option of one or the other, I don't know. I certainly hope it's not going to be a separate userland API from mmap().

    1. Re:Every time VM gets discussed.... by mrsbrisby · · Score: 1

      From the sound of it, unless Linus is defining a flag to the mmap call, he's making a copy of every multiply mapped PROT_WRITE, MAP_PRIVATE page whether it gets written to or not

      There isn't any such thing as a multiply mapped PROT_WRITE MAP_PRIVATE. Where are these terms coming from? If multiple processes have access to the same page allocated using mmap(), it was allocated using MAP_SHARED.

      which means large PROT_WRITE, MAP_PRIVATE mappings use more memory but page fault less.

      MAP_PRIVATE has nothing to do with faults.

      In BSD, you get more page faults but use less memory.

      No, you get page faults on every read, and use the same amount of memory that doing read() and write() would do. That's why it's stupid.

      Each has advantages depending upon the application. Neither requires calling the user of the other an imbecile.

      No. There is no advantage to BSD zero_copy because it's slower than read() and write() in the real world. It may be possible for userspace to make it faster than read() and write(), but this is extremely complicated, and nobody does it.

      the COW scheme has benefits.

      No it doesn't. Go read the article. We're not talking about reading and writing data sets. We're talking about kernel buffer sharing.

      The proper thing would be to split MAP_PRIVATE into two options MAP_PRIVATE_COW and MAP_PRIVATE_COM (COM=copy on map) with MAP_PRIVATE being defined to one or the other.

      MAP_PRIVATE isn't COW. It's not even related. MAP_PRIVATE is a flag for mmap() which has nothing at all to do with COW.

      Your example code isn't an example of zero_copy because the pages aren't touched. If you touch the pages in buffer, a page fault occurs. If you had a [fictional] zero_write() on buffer, then the kernel wouldn't have to page fault, but then, you've just invented a more complicated sendfile() - something that has nothing to do with the article.

    2. Re:Every time VM gets discussed.... by darksith69 · · Score: 1
      Let me ask you two offtopic questions about the code you posted (I'm one of those stupid coders): if you allocate getpagesize(), aren't you actually allocating two pages? I mean, malloc needs to keep somewhere the size of how much memory you allocated, so you are actually using getpagesize() + bytes used by malloc implementation. Is this correct?

      Second, never saw kmalloc or vmalloc before, but from a quick google it seems that they should be equal for this example, right? kmalloc seems to malloc physically contiguous pages and vmalloc not, but since you are requesting just one getpagesize(), it doesn't matter which one you use?

    3. Re:Every time VM gets discussed.... by SETIGuy · · Score: 1
      It's amazing what people don't understand about virtual memory implementations.

      There isn't any such thing as a multiply mapped PROT_WRITE MAP_PRIVATE.

      There certainly is. That's what a copy-on-write page usually is. MAP_PRIVATE means shared until written.

      If multiple processes have access to the same page allocated using mmap(), it was allocated using MAP_SHARED.

      No. You've got that wrong. The difference between MAP_PRIVATE and MAP_SHARED is not whether the pages are shared or not, but whether changes to the page are shared. If you MAP_PRIVATE a page without PROT_WRITE, only one copy of that page will ever exist in memory regardless of how many processes map it. In other words without PROT_WRITE, there is no difference between a MAP_PRIVATE and a MAP_SHARED page. This is often how executables are mapped. The difference between MAP_PRIVATE and MAP_SHARED is whether changes to the page BY THE MAPPING PROCESS are shared. It's very important that an application developer understand this, or they are opening themselves up for problems.

      For example, one process maps a page PROT_READ|PROT_WRITE and MAP_PRIVATE while another maps it PROT_READ|PROT_WRITE and MAP_SHARED. If the MAP_SHARED process writes to the page, under many (if not most) operating systems, the MAP_PRIVATE process WILL see the changes to the page. This is, so long as the MAP_PRIVATE process has not written to the page. Once the MAP_PRIVATE process writes to the page, a private copy of the page is made and the mapping between the original and the copy is lost. Following that point writes by the MAP_SHARED process will not be reflected in the MAP_PRIVATE copy. The copy is not necessarily (read usually not) made before the first write.
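
      The part of this that is guaranteed everywhere (a private write is never seen through the shared mapping, and after that first write the private copy no longer follows the file) can be checked with a short sketch like the one below. The helper name cow_demo is invented, both mappings live in one process just to keep it short, and the file at `path` is assumed to exist and hold at least one byte.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical demo: map the first page of `path` twice, once MAP_SHARED
 * and once MAP_PRIVATE, then write through each mapping in turn.
 *   *shared_sees  = byte 0 seen via the shared mapping after a private write
 *   *private_sees = byte 0 seen via the private mapping after a shared write
 * Returns 0 on success, -1 on error.
 */
int cow_demo(const char *path, char *shared_sees, char *private_sees)
{
    assert(path != NULL && shared_sees != NULL && private_sees != NULL);

    size_t pagesz = (size_t)sysconf(_SC_PAGESIZE);
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;

    char *shared = mmap(NULL, pagesz, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    char *priv = mmap(NULL, pagesz, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    close(fd);                     /* the mappings outlive the descriptor */

    if (shared == MAP_FAILED || priv == MAP_FAILED) {
        if (shared != MAP_FAILED) munmap(shared, pagesz);
        if (priv != MAP_FAILED) munmap(priv, pagesz);
        return -1;
    }

    priv[0] = 'P';                 /* first write: the page gets copied */
    *shared_sees = shared[0];      /* still the original file byte */

    shared[0] = 'X';               /* lands in the shared page and the file */
    *private_sees = priv[0];       /* private copy is detached: still 'P' */

    munmap(shared, pagesz);
    munmap(priv, pagesz);
    return 0;
}
```

      With a file whose first byte is 'S', shared_sees comes back as 'S' and private_sees as 'P'. Whether the private mapping sees shared writes made before its own first write is left unspecified by POSIX, so the sketch deliberately does not test that.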

      MAP_PRIVATE has nothing to do with faults.

      It has everything to do with faults because the first write attempt to a writable MAP_PRIVATE page will almost certainly cause a copy-on-write fault.

      MAP_PRIVATE isn't COW. It's not even related. MAP_PRIVATE is a flag for mmap() which has nothing at all to do with COW.

      Is this your first experience with virtual memory? The combination of MAP_PRIVATE and PROT_WRITE is (generally) copy-on-write. (I say generally because I haven't used every single UNIX system that implements mmap(), just most of them.). The combination of MAP_PRIVATE or MAP_SHARED and not PROT_WRITE is segfault-on-write. The combination of MAP_SHARED and PROT_WRITE is no action on write.

      Your example code isn't an example of zero_copy because the pages aren't touched.

      I assume by touched you mean written? In such code fragments the comment "/* do stuff here */" usually implies that the page is written or read. But you are right that it shouldn't be an example of zero copy because there is no opportunity to read zeros from the page. If any operating system clears a page in my example, (unless the device we are reading is /dev/zero), it's a waste of time. I was discussing the generalities of VM rather than Linus's tirade.

      Yes, sometimes copy on write of a zero page is a waste, but the time to solve that isn't necessarily when the page is mapped. The means of zeroing (and whether to zero) the page should be allowed to depend upon which device is being used. If it's an RS-232 port, who the hell cares whether the buffer page is copy-on-write. You're going to context switch a few hundred times between characters anyway. In some cases the best time to zero unused portions of a page might be AFTER the completion of the first write.

    4. Re:Every time VM gets discussed.... by SETIGuy · · Score: 1
      Let me ask you two offtopic questions about the code you posted (I'm one of those stupid coders): if you allocate getpagesize(), aren't you actually allocating two pages? I mean, malloc needs to keep somewhere the size of how much memory you allocated, so you are actually using getpagesize() + bytes used by malloc implementation. Is this correct?

      Actually I meant to write valloc() there, rather than vmalloc() since I was writing userland code rather than kernel code. (Apparently, I'm pretty stupid, too.)

      Yes, the minimum actual allocation by valloc() is usually two pages, even if you only try to allocate 1 byte. It's usually the equivalent of calling memalign(getpagesize(),size), which is usually implemented over malloc(). It would be easy to conceive of an implementation that doesn't waste address space on mostly empty pages by storing the allocation information elsewhere.
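
      For reference, a page-aligned allocation in userland might look like the sketch below. It uses posix_memalign() rather than valloc(), which is a substitution on my part (valloc() is obsolete on current systems); the wrapper name alloc_page_aligned is hypothetical.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <stdlib.h>
#include <unistd.h>

/* Allocate exactly one page, aligned on a page boundary.
 * Returns NULL on failure; release the memory with free(). */
void *alloc_page_aligned(size_t *page_size_out)
{
    long ps = sysconf(_SC_PAGESIZE);
    if (ps <= 0)
        return NULL;

    void *p = NULL;
    if (posix_memalign(&p, (size_t)ps, (size_t)ps) != 0)
        return NULL;

    if (page_size_out != NULL)
        *page_size_out = (size_t)ps;
    return p;
}
```

      How much address space the allocator actually consumes behind that pointer is up to the implementation, which is the point made above.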

      I rarely use kmalloc() since I don't do much direct kernel hacking, but I'm fairly sure it allows sub-page allocations on a standard heap, as malloc() does, and isn't guaranteed to be page aligned. kmalloc() is really only necessary when you want to get physically adjacent pages for DMA, although __get_dma_pages() is a better bet. (I haven't done this in years, so those interfaces in the Linux kernel may have changed. Remember back when your DMA buffers needed to be in the lower 16MB of memory? Remember when they needed to be in the lowest 1MB?)

    5. Re:Every time VM gets discussed.... by mrsbrisby · · Score: 1

      There certainly is. That's what a copy-on-write page usually is. MAP_PRIVATE means shared until written.

      No it isn't. MAP_PRIVATE is a flag to mmap(). It says nothing about copy on write, it says not to share this page with any other processes that map the same file/region.

      Most unixes will implement CoW on fork()- and that usually includes pages allocated using MAP_PRIVATE- but it's not MAP_PRIVATE that has anything to do with this- the pages allocated with malloc() were also copy-on-write.
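
      A minimal sketch of the fork() case, assuming a POSIX system; the function name is hypothetical. It can only observe the semantics (the parent's malloc()ed data is unaffected by the child's write). Whether the kernel gets there by copy-on-write or by copying eagerly at fork() is invisible from userspace.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 0 if parent and child each end up with their own view of a
 * malloc()ed buffer after fork(), -1 otherwise. */
int fork_isolation_demo(void)
{
    char *buf = malloc(16);
    if (buf == NULL)
        return -1;
    strcpy(buf, "parent");

    pid_t pid = fork();
    if (pid < 0) {
        free(buf);
        return -1;
    }
    if (pid == 0) {
        /* On a copy-on-write kernel this first store is what would
         * trigger the page fault and the actual copy. */
        strcpy(buf, "child");
        _exit(strcmp(buf, "child") == 0 ? 0 : 1);
    }

    int status = 0;
    int ok = (waitpid(pid, &status, 0) == pid)
          && WIFEXITED(status) && WEXITSTATUS(status) == 0
          && strcmp(buf, "parent") == 0;   /* parent's data is untouched */

    free(buf);
    return ok ? 0 : -1;
}
```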

      No. You've got that wrong. The difference between MAP_PRIVATE and MAP_SHARED is not whether the pages are shared or not, but whether changes to the page are shared.

      I said access. The operating system is free to use whatever definition it likes for access - except when we're talking about writes. Since we ARE talking about writes, I apologize for assuming you understood that I could only possibly mean write.

      Is this your first experience with virtual memory?

      Was your first experience with virtual memory with mmap()?

      mmap() can be implemented (entirely to POSIX.1b) without virtual memory and without shared pages. If you want a hint, read the manual page for msync().

      I was discussing the generalities of VM rather than Linus's tirade.

      Correction: You were discussing a common implementation of mmap() and not mmap and not virtual memory. Tirade is subjective. Read the article. Learn how virtual memory works, and how mmap() has nothing to do with virtual memory. Then pay attention to the fact that we're talking about zero-copy buffers and not fork() and not reading zeros.

      You are right that it shouldn't be an example of zero copy because there is no opportunity to read zeros from the page.

      zero-copy has nothing to do with reading zeros. It has to do with the operating system mapping the same buffer that write() used to the network interface itself. If this is the case, then the network hardware reads the buffer directly from userspace. Substitute network hardware for disk or any other peripheral and you've got the kind of zero-copy that Linus is talking about.

      The trick is we want write() to be able to return immediately. The FreeBSD people CoW that page that write() used so that if the caller uses it again too soon, the page faults and gets copied.

      The caller could've used malloc() or the stack to get that buffer- it doesn't matter.

      Linux gives explicit notification- that is, userspace gets told when write() completes (later), and it can reuse that buffer.

      Once this is in-place, there is no need to CoW the pages because userspace isn't going to accidentally use it before the pages are available for reuse.
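
      For the curious, a minimal Linux-only sketch of handing a user buffer to a pipe with vmsplice(). The wrapper name vmsplice_roundtrip is hypothetical, and the sketch does not show any completion notification: it simply avoids touching the buffer until the data has been read back out of the pipe.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Push msg into a pipe with vmsplice() and read it back into out.
 * Returns the number of bytes read back, or -1 on error. */
ssize_t vmsplice_roundtrip(const char *msg, char *out, size_t out_len)
{
    size_t len = strlen(msg);
    long ps = sysconf(_SC_PAGESIZE);
    if (ps <= 0 || len == 0 || len > (size_t)ps || out_len <= len)
        return -1;

    int pfd[2];
    if (pipe(pfd) != 0)
        return -1;

    void *buf = NULL;
    ssize_t got = -1;
    if (posix_memalign(&buf, (size_t)ps, (size_t)ps) == 0) {
        memcpy(buf, msg, len);

        struct iovec iov = { .iov_base = buf, .iov_len = len };
        /* The pipe now refers to our pages; we must not modify buf
         * until the reader has consumed the data. */
        ssize_t sent = vmsplice(pfd[1], &iov, 1, 0);
        if (sent == (ssize_t)len) {
            got = read(pfd[0], out, len);
            if (got >= 0)
                out[got] = '\0';
        }
        free(buf);   /* safe here: the pipe has been drained */
    }

    close(pfd[0]);
    close(pfd[1]);
    return got;
}
```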

      If it's an RS-232 port, who the hell cares whether the buffer page is copy-on-write.

      This must be the model the FreeBSD people were using when implementing CoW on I/O buffers. The problem is when those buffers move VERY VERY QUICKLY, and the page fault just isn't worth it.

    6. Re:Every time VM gets discussed.... by SETIGuy · · Score: 1
      No it isn't. MAP_PRIVATE is a flag to mmap(). It says nothing about copy on write, it says not to share this page with any other processes that map the same file/region.

      It's clear you don't know what you are talking about. That statement is just plain wrong.

      % uname -sr
      Linux 2.6.16-1.2096_FC4smp
      % man mmap
      . . .
      MAP_PRIVATE Create a private copy-on-write mapping. Stores to the region do not affect the original file. It is unspecified whether changes made to the file after the mmap call are visible in the mapped region.
      . . .

      Or you could look here... If MAP_PRIVATE is specified, modifications to the mapped data by the calling process shall be visible only to the calling process and shall not change the underlying object. It is unspecified whether modifications to the underlying object done after the MAP_PRIVATE mapping is established are visible through the MAP_PRIVATE mapping.

    7. Re:Every time VM gets discussed.... by mrsbrisby · · Score: 1

      It's clear you don't know what you are talking about. That statement is just plain wrong.

      No. What's clear is you don't know what I'm talking about. That's not the same thing at all.

      % man mmap
      . . .
      MAP_PRIVATE Create a private copy-on-write mapping. Stores to the region do not affect the original file. It is unspecified whether changes made to the file after the mmap call are visible in the mapped region.


      So what? You've found yet another Linux manual page that's wrong.

      Consider for a moment- if you were right- mmap(MAP_PRIVATE) immediately followed by writing to those pages would cause a page-fault.

      mmap(MAP_PRIVATE) does exactly what SUS and POSIX 1003 say it does- it marks the pages as private. fork() actually makes those pages copy on write- and thus, a write to the page after fork() is what actually produces the page-fault.

      It's entirely possible to (correctly and completely!) implement mmap(MAP_PRIVATE) on a machine without virtual memory- that is, a machine without an mmu- provided you don't also have to implement fork().

      In any event, it's moot: Nobody on this thread is talking about using CoW to protect pages between two processes except people who didn't read the article and/or didn't understand it.

  134. People like bold by hayriye · · Score: 1

    To be popular, you must say bold things. And Linus Torvalds does it very well...

  135. Re:RTFA, please. Or at least my summary here. by menace3society · · Score: 2, Insightful

    I'm sure someone said the same thing about the total size of segmented ICMP packets.

  136. Message Passing by qeorqe · · Score: 1
    Mach was designed around message passing. COW was supposed to make that affordable. Write() and fork() were just instances of this. Mach was also a research project that was studying this technique. The code in user space did not need to be aware of COW. As I recall, most of the unix kernel source code did not need to be aware of COW.

    Linux does not have the same design. COW may or may not be appropriate. User code does not normally need to be aware of read ahead and write behind. Does it really need to be aware of COW?

    Recently, I saw the creator of mach. I asked him if he ever uses mach any more. He said no. :)

  137. UVM ? by mikiN · · Score: 1
    (Quoted from "Zero-Copy Data Movement Mechanisms for UVM" (Postscript, severely Googlarbled HTML rendition))
    4.2.2 Disposing of Transferred Data

    Processes receiving data via page transfer need to release these pages when they are no longer needed in order to avoid accumulating too much memory. This can be done in three ways. First, a process may unmap transferred data using the standard munmap system call. A problem with this is that munmap will free both the transferred anons and the amap containing them. This forces UVM to allocate a new amap the next time pages are transferred to the same virtual space. A second way to dispose of transferred data is to use the new anflush system call. This system call removes anons from a specified virtual address range without freeing the amap allocated to it. This allows the amap to be reused for future transfers. The final way to dispose of transferred data is to push the anon pages down into the object mapping layer. This can be done either by donating the ownership of the anons' pages to an object (establishing a loanout relationship between the anon and the object), or by freeing the anons and inserting the pages into an object.

    (end quote)

    Isn't this what BSD folks are looking for to counter the threat of the Menacing Penguin? All your COWs are belong to UVM!

    Seriously, I could find very little information on the elusive anflush() system call, and I've got no access to a NetBSD source tree to grep through. Does it even actually exist?

    --
    The Hacker's Guide To The Kernel: Don't panic()!
  138. Mod up by Anonymous Coward · · Score: 0

    The grandparent poster, like most of slashdot, has taken Linus' comment out of context.

  139. Re: Learn Finnish today! by Anonymous Coward · · Score: 0

    Vodka is a perfectly acceptable moistener for cereal. Would you like to come and roll around naked on a glacier with me?

  140. This coming from the guy who defended BitKeeper by Tetard · · Score: 1

    Need we say more ?

  141. Minor nit: is your linux system misconfigured? by lpq · · Score: 1

    I'm running 2.6.16, I was curious about your claim that:
            find /usr -exec cat {} >/dev/null \;

    hung "the system". Doesn't hang mine. Are you sure your system is configured correctly? Perhaps you are configuring swap @1-2x a physical memory size of 1-2G? Most people don't realize how bad a performance impact that will have on systems. Disk speeds haven't increased at the same pace as system memory. The swap formulae of the 1980's don't scale to large memory systems.

    ?
    -l

    1. Re:Minor nit: is your linux system misconfigured? by arivanov · · Score: 1

      No.

      It is a proper jolly good hard hang.

      100% guaranteed reproducible on any Via C3V1 with a PREEMPT kernel. I have swapped hardware around multiple times to the same effect. 686 clones including C3V2 do not seem to suffer from it so I suspect something gets badly optimised somewhere for non-SSE capable systems. Dunno what. Got enough other stuff on my plate to spend time on debugging this as well.

      --
      Baker's Law: Misery no longer loves company. Nowadays it insists on it
      http://www.sigsegv.cx/
    2. Re:Minor nit: is your linux system misconfigured? by lpq · · Score: 1

      Hmm....if it is bad code generated for your machine, that would be more a problem in gcc than in the kernel code. How much memory & swap? I run sched-yield, not preempt, 1000Hz clock interrupt, cfq-block i/o schedule. It's a Pentium3-based system, w/1Gmem + a 256M swapfile.

      Good luck...just wanted to let you know it does work for some -- my machine is about 6 years old though -- maybe machines that old have had more testing...

    3. Re:Minor nit: is your linux system misconfigured? by arivanov · · Score: 1

      The same motherboard worked fine with earlier kernels (2.6.8, 2.6.9, 2.6.10 to the extent 2.6.10 VM can be called working, 2.6.11, 2.6.14). No hangs on any one of them. The config is OK as well. Hardware is not at fault either because the hang is 100% reproducible on 2-3 different Via M-I series motherboards with an Eden core. Frankly, I had enough of it so I'd rather port back the patches which I need from 2.6.16 (NFSv4 and ACM), but the underlying tty and locking layers have changed so much that I have to spend more than a week on rewriting the patches and doing the regression testing. IANKD and I do not have that week.

      --
      Baker's Law: Misery no longer loves company. Nowadays it insists on it
      http://www.sigsegv.cx/
    4. Re:Minor nit: is your linux system misconfigured? by lpq · · Score: 1

      So basically the problem wasn't there in 2.6.10, but is in 2.6.16. If it was a processor specific problem (am not saying it is) in compiler generated code (rare, but it happens), almost anything changed might trigger the bad sequence. Almost any other change would likely steer around it -- like changing optimization options. Might try something less drastic though -- just try a different disk-i/o block scheduler. The default is the anticipatory scheduler which is optimized for throughput, but that may not be ideal for a desktop. The cfq scheduler allows setting priorities by process.

      The default ties disk block priorities to the cpu-nice value when the block is created (i.e. -- it's not visible via the ionice program mentioned in the Documentation). Anyway -- since your problem is I/O related, just another scheduler might change things enough to steer around the problem.

      Good luck...it's always fun living on the latest edge kernel...:^}
      -l

  142. Re:Wrong Side of Bed of the Wrong Room? by Anonymous Coward · · Score: 0

    *BSD's people use the K.I.S.S. principle ("Keep It Simple Stupid").
    Linux's people use the P.I.S.I. principle ("Program It Strange Idiots").

  143. They will use Gigantic-grain! 4KiB vs 4B! 1000x!!! by Anonymous Coward · · Score: 0
    There is a big problem with zero copy:
    zero copy does not exist for buffers that are not 4 KiB aligned
    [e.g. a big memcpy(void *dst,void *src,int size) a la COW].
    If I want to send copies of a big "dynamic HTML page with many PNGs and JPEGs" from a web server to 1000 remote clients, then zero copy or copy-on-write could be worse than not using them at all.

    Linux will say that modern buffers must be 4 KiB aligned. Awesome, if there are many "100 byte buffers"!

    -=- ThE DaRK MaN oF tHe ObScURiTY -=-

  144. Is contention actually a big issue? by redelm · · Score: 1
    I'm a bit mystified. Are there actually that many CoW pagefaults from userspace write buffers? Even if the write doesn't block, they're usually followed by a read() that surely will.

    I also don't think the pagefault takes nearly as many cycles with APIC.

    1. Re:Is contention actually a big issue? by sjames · · Score: 1

      I'm a bit mystified. Are there actually that many CoW pagefaults from userspace write buffers? Even if the write doesn't block, they're usually followed by a read() that surely will.

      The worst COW schemes depend on the pagefault every time, they just skip the copy if the faulted page's IO was complete. Less bad ones will fixup the page when I/O completes. However, even in that case, any client can trivially force a great many pagefaults on the server by requesting several large streams, then just sitting on them (since the server app doesn't get notified of this, it will just keep going around the ring. The TCP window will fill quickly and the buffers will be waiting on the TCP ACK that will never come). Any legitimate client with a packet loss problem will cause the same thing.

      I also don't think the pagefault takes nearly as many cycles with APIC.

      Even with the page fault problem reduced, it's still big, and there's still the high cost of TLB flushing that happens just to toggle between RO and RW on the page. SMP systems running a threaded app will suffer even more with the TLB flush as it requires all CPUs to get involved.

    2. Re:Is contention actually a big issue? by redelm · · Score: 1
      any client can trivially force a great many pagefaults on the server by requesting several large streams, then just sitting on them (since the server app doesn't get notified of this, it will just keep going around the ring. The TCP window will fill quickly and the buffers will be waiting on the TCP ACK that will never come)
      Ooo .. I can see this'd be nasty! Memory pressure! Wouldn't this be better handled by tuning the TCP stack, specifically smaller buffers/windows until the connection proves-out?

    3. Re:Is contention actually a big issue? by sjames · · Score: 1

      Ooo .. I can see this'd be nasty! Memory pressure! Wouldn't this be better handled by tuning the TCP stack, specifically smaller buffers/windows until the connection proves-out?

      With conventional copy writes, the write call can block to prevent the problem. In the case of COW, the server app could potentially just keep pushing buffers. A smaller window would just mean even less of them got to the wire and avoided contention. I suppose in the COW case, the write could block based on total pages in flight, but that could still allow a fair number of faults first unless the app 'just knows' the magic number and allocates the ring buffer to that size. Under vmsplice, the server will naturally go around the ring once, then wait for notification (and so never fault, even due to evil clients) and doesn't need any magic numbers.
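
      The "go around the ring once, then wait for notification" pattern can be sketched as plain bookkeeping. This is a toy model, not kernel or vmsplice code: the struct, the slot count and both function names are hypothetical, and the completion notification is simulated by a direct call.

```c
#include <stdbool.h>
#include <stddef.h>

#define RING_SLOTS 4   /* hypothetical ring size; no magic number needed */

struct ring {
    bool in_flight[RING_SLOTS];   /* true while the kernel still owns the slot */
    size_t next;                  /* next slot the app wants to fill */
};

/* Claim the next slot for sending. Returns its index, or -1 if that slot
 * is still in flight, in which case the app waits instead of writing. */
int ring_claim(struct ring *r)
{
    size_t i = r->next;
    if (r->in_flight[i])
        return -1;
    r->in_flight[i] = true;
    r->next = (i + 1) % RING_SLOTS;
    return (int)i;
}

/* Called when the completion notification for slot i arrives. */
void ring_complete(struct ring *r, size_t i)
{
    if (i < RING_SLOTS)
        r->in_flight[i] = false;
}
```

      Because the app never writes to a slot it does not own, a stalled client just leaves ring_claim() returning -1; nothing faults.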

    4. Re:Is contention actually a big issue? by redelm · · Score: 1
      I thought a write would block (or erreturn) if the TCP send buffer got full (waiting for ACKs). It'd better, 'cuz otherwise RAM will fill with waiting buffers.

    5. Re:Is contention actually a big issue? by sjames · · Score: 1

      I thought a write would block (or erreturn) if the TCP send buffer got full (waiting for ACKs). It'd better, 'cuz otherwise RAM will fill with waiting buffers.

      It will block once the send queue fills, it's just that if the ring buffer in the app is smaller than the send queue, you'll start taking pagefaults if COW is used.

    6. Re:Is contention actually a big issue? by redelm · · Score: 1
      Natch! That's why you tune the send queue smaller than the ring buffer! Double buffering is always hazardous and needs considerable thought.

    7. Re:Is contention actually a big issue? by sjames · · Score: 1

      Even so, the TLB flushes are expensive when compared to simply waiting on notification. Page faults simply don't happen even if the ring size is wrong.

  145. Windows NT does this too by Anonymous Coward · · Score: 0

    It's worth a note that what Linus is talking about, is basically what Windows NT does with its "Overlapped I/O" mechanism: the pages containing data are locked in memory and used directly by the kernel, and the application receives a notification when it's done. It's on the honor system for not modifying data that's in use, instead of relying on something like COW. There are a few different notification mechanisms to choose from.

    There's a good article that describes how high-performance network I/O under Windows NT works in general.

  146. Re:They will use Gigantic-grain! 4KiB vs 4B! 1000x by sjames · · Score: 1

    Well aligned buffers are a win, but are not strictly necessary to do zero copy write. The kernel just has to make sure to build a map of all involved physical pages and apply the correct page offsets. For example, you can have 32 128 byte buffers in a single page. If you don't bother trying to align the buffers, they might be in two pages (and the kernel can map them both).
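
    The page arithmetic involved is small. A sketch, with a hypothetical struct and function name, of how many pages a buffer touches and where it starts within the first one:

```c
#include <stddef.h>
#include <stdint.h>

struct page_span {
    uintptr_t first_page;   /* address of the first page the buffer touches */
    size_t offset;          /* offset of the buffer within that page */
    size_t page_count;      /* number of pages that must be mapped */
};

/* Work out which pages a buffer at addr with length len occupies. */
struct page_span span_of(uintptr_t addr, size_t len, size_t page_size)
{
    struct page_span s;
    s.offset = (size_t)(addr % page_size);
    s.first_page = addr - s.offset;
    /* Round the end of the buffer up to the next page boundary. */
    s.page_count = (len == 0)
        ? 0
        : (s.offset + len + page_size - 1) / page_size;
    return s;
}
```

    With 4096-byte pages, a 128-byte buffer at offset 3968 fits in one page, while the same buffer at offset 4032 straddles two.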

  147. Want to crash FreeBSD? No problem by Viol8 · · Score: 1

    Just mount a floppy disk as a filesystem then eject it and try to read
    and write to the filesystem a few times with maybe a sync or 2. It won't
    take long. And if that's not up to date enough for you, you can play the
    same game with mounted USB sticks. Yes, 6.0 crashed nicely when I pulled
    out a stick before I unmounted it. When I complained about the floppy disk
    bug around version 4.3 I was told it wasn't a priority. Typical arrogant
    BSD team.

  148. Re:Given the respective quality of the Linux and * by Kirth · · Score: 1

    Yes, pretty incomparable:

    Box1:
    reboot ~ Mon Apr 24 22:15
    reboot ~ Mon Apr 24 00:34
    reboot ~ Thu Apr 20 17:41
    reboot ~ Tue Apr 18 00:33
    reboot ~ Sun Apr 16 19:52
    reboot ~ Wed Apr 12 16:26
    reboot ~ Mon Apr 10 05:55
    reboot ~ Sun Apr 9 19:40
    reboot ~ Mon Apr 3 13:05

    Box2: (won't turn up anything in the last two months so I give the uptime):

        4:07pm up 106 days, 22 min, 2 users, load average: 1.76, 1.82, 2.09

    Both are home to 1000 webservers, both running at a normal load of 2. Guess which is which?

    Yes, you most probably guessed wrong; the constantly crashing upper one is the FreeBSD; and it's not the hardware which has a problem...

    --
    "The more prohibitions there are, The poorer the people will be" -- Lao Tse