Slashdot Mirror


More Effective Use of Shared Memory on Linux

An anonymous reader writes "Making effective use of shared memory in high-level languages such as C++ is not straightforward, but it is possible to overcome the inherent difficulties. This article describes, and includes sample code for, two C++ design patterns that use shared memory on Linux in interesting ways and open the door for more efficient interprocess communication."

46 of 280 comments

  1. SysV IPC is obsolete by bogolisk · · Score: 4, Informative

    some1 should tell the authors to rtfm.

    $ man shm_open
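
    For anyone without the man page handy, here is a minimal sketch of the POSIX route: create a named object with shm_open, size it with ftruncate, map it with mmap. The function name shm_roundtrip and whatever object name you pass in are made up for this illustration, and older glibc versions may need -lrt at link time.

```cpp
#include <cassert>
#include <cstring>
#include <fcntl.h>      // O_CREAT, O_RDWR
#include <sys/mman.h>   // shm_open, shm_unlink, mmap, munmap
#include <unistd.h>     // ftruncate, close

// Creates a named POSIX shared memory object, writes a string through one
// mapping and reads it back through a second mapping, the way a second
// process would see it. Returns true if the second mapping sees the data.
// (shm_roundtrip is a hypothetical helper, not code from the article.)
bool shm_roundtrip(const char* name)
{
    const size_t size = 4096;

    int fd = shm_open(name, O_CREAT | O_RDWR, 0600);
    if (fd == -1) return false;
    if (ftruncate(fd, size) == -1) {          // a new object has length 0
        close(fd);
        shm_unlink(name);
        return false;
    }

    void* w = mmap(nullptr, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    void* r = mmap(nullptr, size, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);                                // the mappings stay valid

    bool ok = false;
    if (w != MAP_FAILED && r != MAP_FAILED) {
        std::strcpy(static_cast<char*>(w), "hello");
        ok = std::strcmp(static_cast<const char*>(r), "hello") == 0;
    }

    if (w != MAP_FAILED) munmap(w, size);
    if (r != MAP_FAILED) munmap(r, size);
    shm_unlink(name);                         // remove the name when done
    return ok;
}
```

    The name plays the role the SysV key played, but it is a string starting with a slash, so two unrelated programs are far less likely to pick the same one by accident.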

    --
    Bogus
    1. Re:SysV IPC is obsolete by KiloByte · · Score: 3, Funny

      The university, ~8 years ago, Concurrent Programming lab:

      (talking about ftok)
      Me: But, what is done to prevent clashes if different programs use the same key?
      Prof: Nothing.
      Me: Eh? That's fucking sabotage. (I used "cholerous", but that was in Polish)
      Prof: And that's why we won't use SysV IPC in subsequent lessons.

      The authors here use a static key of 0x1234...
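
      One way around the clash, sketched here on the assumption that the cooperating processes are related by fork (or can hand each other the segment id some other way), is to skip the key and ask for IPC_PRIVATE. The function name is invented for the example.

```cpp
#include <cassert>
#include <sys/ipc.h>
#include <sys/shm.h>

// Creates a SysV segment without any key, so no other program can collide
// with it. Returns true if the segment could be created, attached, written
// and cleaned up. (private_segment_roundtrip is a hypothetical helper.)
bool private_segment_roundtrip()
{
    int id = shmget(IPC_PRIVATE, 4096, IPC_CREAT | 0600);
    if (id == -1) return false;

    void* p = shmat(id, nullptr, 0);
    if (p == reinterpret_cast<void*>(-1)) {
        shmctl(id, IPC_RMID, nullptr);
        return false;
    }

    // Mark the segment for removal right away. On Linux it stays usable
    // until the last process detaches, so a crash cannot leak it.
    shmctl(id, IPC_RMID, nullptr);

    int* value = static_cast<int*>(p);
    *value = 42;
    bool ok = (*value == 42);

    shmdt(p);
    return ok;
}
```

      A child created by fork after the shmat call inherits the attachment, so it never needs to know a key at all.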

      --
      The creatures outside looked from Alt-Right to Antifa; but already it was impossible to say which was which.
    2. Re:SysV IPC is obsolete by maxwell+demon · · Score: 5, Funny
      The authors here use a static key of 0x1234...

      Well, that should be a safe choice, because no sane person would use 0x1234, therefore this key is still unused. :-)
      --
      The Tao of math: The numbers you can count are not the real numbers.
    3. Re:SysV IPC is obsolete by Anonymous Coward · · Score: 5, Funny

      0x1234? Amazing! That's the combination on my luggage!

  2. shmem (soon in Boost!) by Cyberax · · Score: 4, Informative

    There is a great C++ library for shared memory support: SHMEM. It can place complex objects and STL-like containers in shared memory. And it is cross-platform (POSIX and Windows are supported).

    And it will soon (hopefully) be a part of Boost!

    1. Re:shmem (soon in Boost!) by maxwell+demon · · Score: 5, Interesting
      It can place complex objects and STL-like containers in shared memory.

      Depends on your definition of "complex objects".

      From the documentation:

      Virtuality forbidden

      This is not an specific problem of Shmem, it is a problem for all shared memory object placing mechanisms. The virtual table pointer and the virtual table are in the address space of the process that constructs the object, so if we place a class with virtual function or inheritance, the virtual function pointer placed in shared memory will be invalid for other processes.


      Basically, I would have been surprised if they had found a solution for that. But I guess it cannot be portably solved. Instead, the system would have to be prepared for it. I could imagine that objects in a shared library (so the same code is guaranteed to be shared to both processes) could be placed in shared memory, if the compiler/runtime system provided the means for it (say, instead of the pointer to a VMT, it would contain an offset into the constant data section of the shared library, and something to identify the library with, say a system-wide unique active library index which is generated by the dynamic linker).
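
      Until something like that exists, a compile-time check can at least keep polymorphic types out of a shared segment. This is only a sketch: shm_placeable, Sample and Shape are names I made up, and the type traits cannot spot raw pointer members, which are just as process-specific as a vtable pointer.

```cpp
#include <cassert>
#include <type_traits>

// A record that is reasonable to place in shared memory: fixed-size fields,
// no virtual functions, no pointers into one process's address space.
struct Sample {
    int   id;
    float value;
    char  label[16];
};

// A polymorphic type: every object carries a hidden vtable pointer that is
// only meaningful inside the process that constructed it.
struct Shape {
    virtual ~Shape() = default;
    virtual double area() const { return 0.0; }
};

// Hypothetical gate for anything that is about to be put in a shared segment.
template <typename T>
constexpr bool shm_placeable()
{
    return std::is_trivially_copyable<T>::value &&
           std::is_standard_layout<T>::value &&
           !std::is_polymorphic<T>::value;
}

static_assert(shm_placeable<Sample>(), "Sample may live in shared memory");
static_assert(!shm_placeable<Shape>(), "Shape carries a vtable pointer");
```

      Putting the static_assert next to the placement code turns the documentation's "virtuality forbidden" rule into a build error instead of a crash in the other process.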
      --
      The Tao of math: The numbers you can count are not the real numbers.
    2. Re:shmem (soon in Boost!) by Cyberax · · Score: 2, Interesting

      Well, it's possible to use shmem as a very fast method for marshalling arguments across process boundaries and then use BIL (Boost Interfaces Library) to marshal the actual function calls. It will look like the Local Procedure Call subsystem in Windows NT.

      You can get virtual functions this way and it will be fast enough but not very "nice", of course.

  3. Re:C++ has bigger memory issues by Cyberax · · Score: 2, Informative

    No, but I think reading about ownership policies and why they almost always make GC unnecessary must be compulsory.

  4. Re:C++ has bigger memory issues by ObsessiveMathsFreak · · Score: 5, Insightful

    In fact, forget it; just use an actual OO language instead.

    C++ is an actual Object Oriented language, which is of course half the problem.

    If you mean a pure OO language like Java, in which everything is an object except for primitives and it takes ten classes and wrappers just to read a file, well then C++ isn't exactly an Object Oriented language as such. Perhaps you mean Smalltalk or the like.

    I tell you what though, C++ is still around after all this time. With all the hype surrounding Java, Perl, C#, Python, etc., C++ programmers are still there beavering away with the god-awful syntax Stroustrup left them with. Even after all the improvements, all the innovation and all the additional research into computer languages, for a hell of a lot of tasks there is no real alternative to C++.

    I don't say this as a C++ fanboy, even though I am "somewhat" fond of the language when it is used properly, and not in garbled and unreadable line noise. I say this simply as a statement of fact. There is still no successor to C++.

    I don't want garbage collection so much as I want a cleanup and rationalisation of the syntax. GC would be nice, but forcing more readable code would be even better.

    --
    May the Maths Be with you!
  5. const by hey · · Score: 3, Funny

    I suppose everything marked const could be shared.

  6. This is nothing new by Anonymous Coward · · Score: 4, Interesting
    You've been able to do this for a while using process-shared mutexes and condition variables, which allow you to do the same things you could do with pthreads and shared memory. The tradeoff is that you get better performance by avoiding syscalls to do IPC, but it's less robust. If you get a segfault, you have to assume that the shared memory is in an unknown state and either shut down or restart everything. The other processes can (or will be able to) detect this once robust futex support is in Linux. Idiot programmers will of course ignore this and continue to use the corrupted memory anyway, just like they do now with SysV semaphores used as mutexes with the SEM_UNDO option to allow the semaphore to auto-reset if a process exits without resetting it.
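
    As a rough illustration of the detection part, here is what it looks like with the robust mutex calls that were later standardized in POSIX (pthread_mutexattr_setrobust, pthread_mutex_consistent). The function name and the fork-based setup are assumptions made for the example. The child takes the lock and dies; the parent's lock call reports EOWNERDEAD instead of hanging.

```cpp
#include <cassert>
#include <cerrno>
#include <pthread.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

// Returns what pthread_mutex_lock reports to the parent after the child
// exited while holding the lock, or -1 if the setup failed.
// (lock_after_owner_died is a hypothetical helper.)
int lock_after_owner_died()
{
    void* mem = mmap(nullptr, sizeof(pthread_mutex_t), PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) return -1;
    pthread_mutex_t* m = static_cast<pthread_mutex_t*>(mem);

    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
    pthread_mutexattr_setrobust(&attr, PTHREAD_MUTEX_ROBUST);
    pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);

    pid_t pid = fork();
    if (pid == -1) {
        munmap(mem, sizeof(pthread_mutex_t));
        return -1;
    }
    if (pid == 0) {
        pthread_mutex_lock(m);
        _exit(0);                      // simulate a crash with the lock held
    }
    waitpid(pid, nullptr, 0);

    int rc = pthread_mutex_lock(m);
    if (rc == EOWNERDEAD) {
        // The shared data may be half-updated: repair or reset it here,
        // then tell the mutex that the state is usable again.
        pthread_mutex_consistent(m);
        pthread_mutex_unlock(m);
    } else if (rc == 0) {
        pthread_mutex_unlock(m);
    }

    pthread_mutex_destroy(m);
    munmap(mem, sizeof(pthread_mutex_t));
    return rc;
}
```

    The EOWNERDEAD branch is exactly the place where the careless version just carries on. The mutex only tells you that the owner died, not whether the data it protected is sane.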

    Anyway, old stuff. Wake me up when you start talking about the newer tricks with shared memory.

  7. CML by putko · · Score: 3, Informative

    For concurrent applications, it is hard to beat Reppy's CML.
    http://portal.acm.org/citation.cfm?id=113470

    In particular, the things you synchronize on are first-class. Also you can speculatively send/receive things. Normal "select" is only for reading. You don't have to manage your memory either.

    There are other concurrent languages, but CML is nice in that it has a formal semantics, so unlike typical languages like "C", "C++", Erlang or Java, a program has a meaning other than "whatever the program does when I run it."

    You can implement the primitives of CML in your favorite higher-order language, so you don't have to be limited by ML. That's what's in Reppy's book.

    A proper implementation can achieve speeds that are about 30x faster than pthreads for typical tests like "ping/pong".

    --
    http://www.thebricktestament.com/the_law/when_to_stone_your_children/dt21_18a.html
  8. Hardware-enforced sharing: OLD HAT by VernonNemitz · · Score: 4, Interesting

    Quite a few years ago, there was a brief popularity of something called VRAM (video ram) that had memory cells specifically designed with one input line and TWO output lines. The idea was that the part of the hardware needing to construct an image for the screen ONLY needed to read memory, while the system responsible for creating the image needed both read and write access. Ever since then, I've wondered why they don't use this kind of memory in multi-processor systems, for communication between processors, such that Processor A has read/write access to a block of VRAM, to give info to Processor B (it has read-access only), while Processor B has read/write access to a different block of VRAM, to give info to Processor A (it has read-access only).

    1. Re:Hardware-enforced sharing: OLD HAT by TheRaven64 · · Score: 3, Informative

      Because it's a really ugly hack that only works in a specific domain. If the frame buffer is being read and updated at the same time, you might get some screen corruption. This is irritating, but not really a problem. If a general piece of memory is being read from and written to at the same time, then the reading process gets garbage data. This means that you need to put locks around your memory (which you do anyway) to prevent this, which eliminates the advantage of the second read line.

      --
      I am TheRaven on Soylent News
  9. Re:Microsoft code? by maxwell+demon · · Score: 2, Informative

    Doesn't MS code use CWhatever? I guess the author is coming from a Java background (I for Interface).

    BTW, the very first file is not valid C++: All identifiers which contain double underscores are everywhere and under all circumstances reserved for the implementation. This also includes __COMMON_H__. Change it to e.g. COMMON_H to get valid C++.
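
    For illustration, a guard that stays out of the reserved namespace could look like this. The content between the guard lines is a placeholder I made up, not what the article's header contains.

```cpp
#include <cassert>

// common.h, rewritten with a guard name that has no double underscore and
// no leading underscore followed by a capital letter.
#ifndef COMMON_H
#define COMMON_H

// Placeholder declaration standing in for the header's real content.
inline int common_answer() { return 42; }

#endif // COMMON_H
```

    Names such as COMMON_H_INCLUDED or PROJECT_COMMON_H work just as well and are less likely to collide with another library's guard.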

    Well, at least his main function returns int.

    --
    The Tao of math: The numbers you can count are not the real numbers.
  10. Re:Microsoft code? by Erik+Hensema · · Score: 2, Insightful

    I consider compiling with -Wall good style; it warns you when an assignment is used as an expression. if ((a = 1)) suppresses the warning, on the assumption that the coder knows what he's doing.
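
    A small sketch of the idiom in the usual read loop; count_bytes is just a name picked for the example. With GCC and -Wall, the same condition without the inner parentheses and comparison draws a "suggest parentheses around assignment used as truth value" warning.

```cpp
#include <cassert>
#include <cstdio>

// Counts the bytes remaining in a stream.
long count_bytes(std::FILE* in)
{
    long n = 0;
    int c;   // int, not char, so that EOF can be told apart from real data

    // The assignment is intentional: the extra parentheses plus the explicit
    // comparison tell both the compiler and the reader so.
    while ((c = std::getc(in)) != EOF)
        ++n;

    return n;
}
```

    Writing while (c = std::getc(in)) by mistake would stop at the first zero byte rather than at end of file, which is the kind of bug the warning exists to catch.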

    --

    This is your sig. There are thousands more, but this one is yours.

  11. Doors by Anonymous Coward · · Score: 5, Interesting

    I'm surprised no-one has mentioned Solaris Doors. Doors is an IPC mechanism whereby the first process (client) can hand off any residual time in its timeslice to the second process (server) resulting in short IPC calls running much less time as there is no discarded timeslice time and no wait for the server process to be scheduled (since it uses the client's timeslice).

    1. Re:Doors by Foolhardy · · Score: 2, Interesting

      That sounds like the same as NT's event pairs, used to implement Quick LPC. An event pair consists of a high and a low event. The server thread waits on the high event and the client thread waits on the low event. Only one event can be signalled at one time, and two software interrupts are provided to toggle the event pair's state: interrupt 0x2C calls the function KiSetLowWaitHighThread() and interrupt 0x2B calls the function KiSetHighWaitLowThread(). When one of these is called and another thread (in another process) is waiting on the other event, the kernel schedules the other thread immediately, continuing the same timeslice of the calling thread. Transfer of data is done through a shared memory section mapped to both processes. Quick LPC was introduced in NT 3.51 to make the out-of-process GUI server faster. Apparently Microsoft didn't think it was fast enough, so they moved much of the GUI server into kernel mode in NT4.

      For more information, scroll down to Quick LPC in Local Procedure Call from Undocumented Windows NT.

  12. Comments on comments by Tzinger · · Score: 2, Insightful

    Too many people here are willing to make inane useless comments about honest work efforts. If you have a better way, offer it. If you merely want to say something nasty about someone else's work, save it for the coffee house.

    --
    "If all the American people want is security, let them live in prisons." Eisenhower
  13. Re:C++ has bigger memory issues by theCoder · · Score: 4, Interesting
    C++ already has a garbage collector. Just allocate your objects on the stack instead of the heap:
    void foo()
    {
      SomeObject obj;

      // other code

      // poof -- obj is deallocated automatically, even if an exception is thrown
    }
    I work on a project that has tens of thousands of C++ classes, and very few "new" and "delete" operations (more "new" than "delete" because we have a class that manages reference counting like a heap garbage collector would do).

    People who think they always need to "new" objects in C++ have spent way too much time using Java.

    Here's another hint -- pass objects to functions as const references:
    void foo(const SomeObject& obj)
    {
      // code
    }
    This way, a copied object isn't allocated for the passing (no memory at all is in fact allocated). The biggest drawback is you can only call "const" methods on the object, but this is outweighed by not using pointers. Not that I don't like pointers, they just increase the complexity and should be used prudently. And as my .sig says, be sure to free those mallocs!

    --
    "Save the whales, feed the hungry, free the mallocs" -- author unknown
  14. Re:10 fold speed improvement - Dekkers mutex ! fas by Anonymous Coward · · Score: 5, Informative

    Yes, some algorithms are worth remembering...

    This one is worth remembering as one to avoid -- it's based on the idea of a busy-wait. Look at the while(test) { /* do nothing */ } loop and outer while loop. This should not be done. Semaphores might be slower in the specific case, but overall system performance will benefit from using best-practices.

    There's a reason this algorithm lies in rest in academic journals: it's only useful as a teaching tool.

  15. Fanboy mods... by mangu · · Score: 2, Informative
    why parent is +1 insightful and not -1 troll? (or flamebait)?


    There are some subjects that draw fanboy clubs here in /.


    Some examples: Java, AMD, Apple, Ruby.


    Try criticizing any of them here, you'll be down-moderated to (-1) pretty quickly. OTOH, praise any of those and you'll get moderated up, no matter how stupid or inconsistent the comment is.

    1. Re:Fanboy mods... by bogolisk · · Score: 2, Informative

      you forgot xxxBSD and Gentoo.

      --
      Bogus
  16. Re:10 fold speed improvement - Dekkers mutex ! fas by Anonymous Coward · · Score: 3, Insightful

    IMHO this algorithm is not a panacea because :

    - It does busy waiting. If one thread holds the 'mutex' for a long time, the other thread will take a lot of CPU for nothing.
    If you really need to take the resource as soon as it is available without giving up the proc, then have a look at "spin locks".

    - It is not very scalable.
    First, you need one version of the algorithm for mono proc, one for bi proc, etc. Of course you could put them all in a shared lib and select one at runtime.
    Second, the algo seems to be O(N), N being the number of processors. Therefore the algo slows down when the number of procs increases.

    - And last: it is unclear to me how you pass the turn when there are more than 2 procs involved. Does this algorithm work when there are more than 2 processors?

  17. Re:C++ has bigger memory issues by Curien · · Score: 2, Informative

    Two reasons:
    a) The called function _cannot_ modify the argument. This becomes important to the code surrounding the function call.
        T x(...);
        foo(x); // did x get changed?

    If foo is declared "void foo(T const&)", then you *know* that x has not changed. If instead it's declared as taking a plain reference, you can't know.

    b) You can pass const objects or objects with limited lifetimes.

        foo(T()); // legal if foo takes a T const&, but not a T&

    --
    It's always a long day... 86400 doesn't fit into a short.
  18. Re:10 fold speed improvement - Dekkers mutex ! fas by at_18 · · Score: 3, Informative

    I tried unsuccessfully, verbally, to get a Phd in comp Sci with embedded management experience to believe me it is 100% sound.... argued for 40 minutes. The guy never had a clue.

    The guy had a clue. Your algorithm is a busy-wait loop, so your CPUs will be maxed at 100% while waiting, and the thread will be pushed by the scheduler to lower priority, and so on...

  19. And? by ratboy666 · · Score: 3, Interesting

    Ok, I get it... it's an attempt to exploit shared memory in C++.

    And why is this news? Is it so difficult that nobody has done it? No, that can't be -- the shm stuff can be wrapped. This is so important that it rates a "design pattern"? Not it either -- the one illustrated isn't the best solution.

    So, just what is this article? Methinks fluff. Sort of in line with "How to implement co-routines with setjmp/longjmp" thing. Or, "Restructuring data to assist processor cache residency". And "How to remove locks from performance critical MP code".

    Except not as interesting or useful.

    Ratboy.

    --
    Just another "Cubible(sic) Joe" 2 17 3061
  20. Re:C++ has bigger memory issues by Viol8 · · Score: 2, Insightful

    "Easy programming languages"

    Excuse me? C may be many things, but easy isn't one of them.
    Ask any beginner. And as for the "security problems" in C, as
    you well know, C was designed as a replacement for assembler.
    It wouldn't have been much use if it didn't give you
    complete flexibility wrt memory access, even if that means
    breaking some of the rules that hand-held high level programmers
    use as a crutch because they're unable to code to the metal.

    "are uninterested in quality and very interested in meeting deadlines"

    If you have a mortgage and a family and your boss threatens you
    with the sack if you keep missing deadlines very few if any
    programmers will take the "moral" stand and get fired. And frankly,
    anyone who puts coding principles above their family is scum IMO.

  21. Re:C++ has bigger memory issues by Viol8 · · Score: 2, Insightful

    Actually I tend to find the opposite. With C everything has to be explicit, i.e. if you want a function called, you have to call it at
    that point in the code. With C++ you can use operator overloading,
    polymorphism, hidden conversions, template specialisation and lots of other stuff which makes the actual code more implicit than C and hides what's actually going on. Sure, if you spend some time looking at the code you'll figure it out, but C++ doesn't IMO lend itself to skim reading as much as C does.

  22. There are better ways by photon317 · · Score: 4, Informative


    A lot of shared memory synchronization and/or caching problems can be solved on Linux through the effective use of a few simple things:

    1) shm_open (for separately started processes which need to coordinate in shared memory), or mmap(MAP_SHARED|MAP_ANONYMOUS) for a process which will fork children which need to communicate/share between themselves and the parent.

    2) Use <asm/atomic.h>'s "atomic_t" integer type within that shared memory array (atomic_t* my_shm_array = mmap(....)). The atomic_t type has several functions defined in that header for atomic read, write, increment, etc. for the Linux hardware platform at hand. On most sane (cache-coherent) SMP architectures, reading and writing are already atomic operations, so this basically devolves to just setting and getting integers like normal, with a little bit of syntactic sugar (struct { volatile int val }) to make sure the C compiler doesn't optimize away things that it shouldn't. And you can implement a whole lot of sane algorithms using nothing but shared memory integer reads and writes with no locking or special atomic increment ops.

    3) If you need more advanced or complex locking on the shared memory for synchronization, use Linux's "futex"'s. They're in the man pages, and they're really fast.
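
    To make points 1 and 2 concrete, here is a small sketch of the fork case. Note the substitution: std::atomic<int> from standard C++ stands in for the kernel's atomic_t, and the function name is invented. The counter lives in an anonymous shared mapping, and each child bumps it with an atomic increment because several writers are involved.

```cpp
#include <atomic>
#include <cassert>
#include <new>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

// A lock-free atomic int has no hidden per-process lock, so it can be
// shared between processes through a common mapping.
static_assert(ATOMIC_INT_LOCK_FREE == 2, "need a lock-free atomic int");

// Forks `children` processes that each add `per_child` to a counter in
// shared anonymous memory. Returns the final count, or -1 on setup failure.
// (shared_counter_total is a hypothetical helper.)
int shared_counter_total(int children, int per_child)
{
    void* mem = mmap(nullptr, sizeof(std::atomic<int>), PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) return -1;
    std::atomic<int>* counter = new (mem) std::atomic<int>(0);

    for (int i = 0; i < children; ++i) {
        pid_t pid = fork();
        if (pid == -1) break;
        if (pid == 0) {
            for (int k = 0; k < per_child; ++k)
                counter->fetch_add(1, std::memory_order_relaxed);
            _exit(0);
        }
    }

    while (wait(nullptr) > 0) {}      // reap every child before reading

    int total = counter->load();
    munmap(mem, sizeof(std::atomic<int>));
    return total;
}
```

    With a single writer and plain loads and stores this gets even simpler, which is the case described in point 2. The increment is only needed here because four processes write the same integer.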

    --
    11*43+456^2
    1. Re:There are better ways by bogolisk · · Score: 2, Interesting

      1) shm_open(2) is already mentioned in the 2nd post.

      2) dont u know that NPTL is already doing this for u? On fast-path, NPTL's posix mutex just do atomic operations and avoid doing syscall. Stick to the standard API and let the platform guys (libc, kernel, ...) do the optimization. They're smarter than u.

      3) u dont want to do this, seriously! if futex is that consumable by the public, then why did the glibc guy write a looooooong paper describing how to use futex?

      --
      Bogus
  23. Java-trolls are clueless, as usual... by bogolisk · · Score: 2, Insightful

    The article is about shared memory and synchronization across process boundaries! In Java that would mean objects that are shared between VMs and methods that are serialized across the VM boundary.

    --
    Bogus
  24. yeah, fast, and 10-fold chance of odd failures by Krischi · · Score: 5, Informative

    Yeah, this algorithm is fast. Too bad that it does not work. This kind of design is a common mistake by people who do not understand the intricacies of multithreaded programming. In short, it fails miserably when the CPUs are allowed to reorder loads and stores, a.k.a. pretty much any modern CPU. You need a memory barrier between setting and testing of a shared variable.

    Google for Dekker's algorithm and memory barrier - you will find better explanations of the problem there than I could type up in my limited time here right now.

  25. Re:High-level language? by psykocrime · · Score: 2, Informative

    I've seen this before, but why is C/C++ marked as a high-level language (as in the summary)? C/C++ are LOW-level languages

    I've never heard that before... everyone I know, and all the literature I've read that described programming languages, considers assembler as "low level" and anything at a higher level of abstraction as "high level." With the exception of a few folks who try to describe C as a "mid level language" or as "high level assembler."

    Calling C++ a "low level language" is absolutely a mistake anyway. It's really a mixed-paradigm language, which includes "low-level" abstractions and some very "high-level" ones. Just because you have the option of dropping down to a low-level close to the hardware doesn't mean you have to.

    --
    // TODO: Insert Cool Sig
  26. Re:C++ has bigger memory issues by smcdow · · Score: 2, Interesting

    Like Java, right?

    Getting back to the original premise of the story, can you even do OS-level shared memory (SysV or POSIX) with Java? OS-level semaphores? Any meaningful kind of IPC? OS-level anything? I mean without godawful JNI nonsense.

    --
    In the course of every project, it will become necessary to shoot the scientists and begin production.
  27. Re:Microsoft code? by Rakshasa+Taisab · · Score: 2, Insightful

    It has something to do with the direction in which you read the code, from the left to the right. In the former example, you'll have to push "EOF !=" on the mental stack before parsing the meaning of "(c = getc(stdin))".

    I like my manga right to left, my code left to right.

    --
    - These characters were randomly selected.
  28. Not much experience on this but... by hdante · · Score: 2, Informative

    The mutex doesn't seem to be shared between processes. This would make the code incorrect. Can anyone confirm this?

  29. not really useful by vtoroman · · Score: 2, Informative

    The code shown is using pthread mutex for sync-ing. The mutex works only for synchronization of threads, not processes so the code is useless (even dangerous) for inter process communication (IPC). In the case of threads another question is just screaming for an answer:
          Why would someone use a shared memory block for threads which are all running in the same memory space anyway?

    We come to the conclusion that the code is quite useless for inter-thread communication too. All in all - useless.

    1. Re:not really useful by vtoroman · · Score: 2, Informative

      The only way to make a pthread mutex work between processes is to set its process-shared attribute with pthread_mutexattr_setpshared. This hasn't been done in the article's code, so the mutex used there has inter-thread scope, not inter-process scope.
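
      Roughly, the missing setup would look like the sketch below. SharedBlock and locked_increments are names I made up, and the mutex itself has to sit inside the shared mapping, which this example gets from an anonymous mmap rather than from the article's SysV segment.

```cpp
#include <cassert>
#include <pthread.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical layout of the shared region: the lock lives next to the data.
struct SharedBlock {
    pthread_mutex_t lock;
    long counter;
};

// Parent and child each increment the shared counter `per_process` times
// under a process-shared mutex. Returns the final count, or -1 on failure.
long locked_increments(int per_process)
{
    void* mem = mmap(nullptr, sizeof(SharedBlock), PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) return -1;
    SharedBlock* blk = static_cast<SharedBlock*>(mem);
    blk->counter = 0;

    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    // The step the article's code leaves out:
    pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
    pthread_mutex_init(&blk->lock, &attr);
    pthread_mutexattr_destroy(&attr);

    pid_t pid = fork();
    if (pid == -1) {
        munmap(mem, sizeof(SharedBlock));
        return -1;
    }

    // Both processes run this loop against the same mutex and counter.
    for (int i = 0; i < per_process; ++i) {
        pthread_mutex_lock(&blk->lock);
        ++blk->counter;
        pthread_mutex_unlock(&blk->lock);
    }

    if (pid == 0) _exit(0);
    waitpid(pid, nullptr, 0);

    long total = blk->counter;
    pthread_mutex_destroy(&blk->lock);
    munmap(mem, sizeof(SharedBlock));
    return total;
}
```

      A default-initialized mutex placed in the same spot may appear to work on some systems, but nothing guarantees it, which is why I call the original code dangerous.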

  30. Re:10 fold speed improvement - The Phd was idiot by LO0G · · Score: 2, Informative

    The PhD is STILL right.

    That code makes a huge fundamental assumption: that write order is preserved. In other words, if you do:

    Write to location 3 on processor 1 (take the lock)
    Read from location 30 on processor 1 (do stuff with the lock held)
    Read from location 3 on processor 2 (check the lock)

    that the reads and writes will appear in order. On ALL modern processors, this assumption is not true, it's possible for the write to location 3 to occur AFTER the read from location 3 on processor 2. It works great on single processor machines, but fails on MP machines.

    In order to make the code work, you need to put a memory write barrier after the write to location 3; this will force the write to be flushed from the cache.

  31. Cache coherency is NOT sufficient by Krischi · · Score: 2, Interesting

    You don't get it about out-of-order writes, do you? Simple scenario, according to your algorithm:

    CPU AA:

    resource = produce_something();
    turn = BB;
    flags[AA] = FREE;

    CPU BB:

    flags[BB] = BUSY; /* CPU AA clears its BUSY flag at this point in time, so, the while (flags[AA] == BUSY) terminates immediately */
    consume(resource);

    The problem is that AA is free to reorder its writes. So, the actual order could be:

    flags[AA] = FREE; /* from AA */
    flags[BB] = BUSY; /* from BB */
    consume(resource); /* BB uses the resource */
    resource = result of produce_something() call /* writeback from AA is too late */

    Oops. BB accesses the resource before AA writes back the current state. Cache coherency does not solve this problem - the problem is that the write to the resource is still pending. That is what the memory barrier is there for.
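
    For the curious, here is a sketch of the same hand-off with the barrier in place. It is written with standard C++ atomics and two threads instead of the original flags array, so the names (ready, publish_and_consume) are mine. The release store keeps the write to the resource from being reordered after the flag write, and the acquire load on the other side pairs with it.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// One thread produces a value and raises a flag; the other waits for the
// flag and then reads the value. Returns what the consumer saw.
int publish_and_consume()
{
    int resource = 0;                  // plain, non-atomic payload
    std::atomic<bool> ready{false};    // plays the role of the flag
    int seen = -1;

    std::thread producer([&] {
        resource = 42;                                  // produce_something()
        // Release: everything written above becomes visible to whoever
        // observes this store with an acquire load.
        ready.store(true, std::memory_order_release);
    });

    std::thread consumer([&] {
        while (!ready.load(std::memory_order_acquire))  // pairs with release
            std::this_thread::yield();
        seen = resource;                                // consume(resource)
    });

    producer.join();
    consumer.join();
    return seen;
}
```

    Replace the two memory orders with memory_order_relaxed and the compiler and CPU are again free to produce the bad interleaving shown above, cache coherency or not.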

    Argue with facts, don't hide behind oh-so-impressive credentials.

    1. Re:Cache coherency is NOT sufficient by Furry+Ice · · Score: 2, Insightful

      This is why closed-source drivers are a bad idea. There are a lot of coders out there who are so impressed with their own credentials that they just don't take the time to read and understand why their clever little hacks don't work. Then, when they try their driver on an architecture where their assumptions don't hold, they give up. The product is probably old and they make a business decision that it's not worth updating the driver for the new architecture.

  32. Preying on the non-comp SCI mods, I see. by Inoshiro · · Score: 4, Insightful

    "How many people know about this? Nobody! I never read about it anywhere. I invented it myself years ago, .."

    Turn to page 55 of your OS design and implementation by Tanenbaum. See where he says, "For a discussion of Dekker's algorithm, see Dijkstra (1965)."? How do you get through a proper comp sci honours degree to the point where you can take a masters and then a PhD without reading Dijkstra?

    How about you crack open that copy of Operating Systems (4th ed) by William Stallings, which has a discussion of concurrency and Dekker's on pages 208-213? How can you get past a 2nd/3rd-year introductory operating systems class without having gone over this topic?

    You are a troll. A troll preying on the fact that most of the moderators here have no idea about computer science, and have not taken a whiff of a real operating systems class.

    For the record, Peterson's algorithm (published in 1981) is a much simpler solution to your problem. It's on page 56 of the Tanenbaum book, and also discussed in Stallings on page 213. There's a new 5th edition of the Stallings book, but the index will take you to the correct chapter/page in short order.
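
    A sketch of Peterson's algorithm for two threads, assuming C++11 atomics with their default sequentially consistent ordering in place of the plain variables the textbooks use. The class and function names are mine. It still busy-waits, so treat it as a teaching aid, not as a replacement for a real mutex.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Peterson's mutual exclusion for exactly two threads, numbered 0 and 1.
class PetersonLock {
    std::atomic<bool> interested_[2] = {{false}, {false}};
    std::atomic<int>  turn_{0};
public:
    void lock(int self)
    {
        int other = 1 - self;
        interested_[self].store(true);   // I want in
        turn_.store(other);              // but you may go first
        // Default seq_cst ordering keeps the two stores above from being
        // reordered with the loads below, which plain ints would allow.
        while (interested_[other].load() && turn_.load() == other)
            std::this_thread::yield();
    }
    void unlock(int self) { interested_[self].store(false); }
};

// Two threads increment a plain counter under the lock; returns the total.
long peterson_count(int per_thread)
{
    PetersonLock lock;
    long counter = 0;

    auto work = [&](int self) {
        for (int i = 0; i < per_thread; ++i) {
            lock.lock(self);
            ++counter;                   // critical section
            lock.unlock(self);
        }
    };

    std::thread a(work, 0);
    std::thread b(work, 1);
    a.join();
    b.join();
    return counter;
}
```

    The book versions use ordinary variables, which was fine for the machines they describe. On current hardware the atomics are what make the algorithm hold.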

    --
    Internet Explorer (n): Another bug -- that is, a feature that can't be turned off -- in Windows.
  33. Too bad all programmers are sloppy. by Estanislao+Martínez · · Score: 2, Insightful
    C/C++ doesn't prevent you from coding secure, leak-free programs. All it does is shift the responsibility for security and memory management from the language to the programmer. If you're a sloppy programmer then, yes, you need a better language than C/C++.

    Experience demonstrates that, by and large, even very good programmers commit a sizeable number of these errors. Not to mention that ensuring proper security and memory management takes time; and time is money.

  34. Re:C++ has bigger memory issues by kbw · · Score: 3, Interesting

    C++ is more than just an OO language. It provides direct support for the procedural paradigm too.

    STL, for example, is not an OO library. Yet it has proved to be immensely useful.

    One place where the garbage collected languages fall down is in the management of resources. Limited resources such as files or sockets must be explicitly released by the programmer. This demonstrates that you simply cannot ignore the lifetime of objects, even with a garbage collector. And I also assert here that memory is a limited resource too.

    That silly singleton thing in the example is a demonstration of the disregard for the lifetime of that particular object. Does it really need to live for the lifetime of the application? Does it need to be cleanly released?

    I think C++'s memory management model is sufficient. One can hardly say that about garbage collected languages.

  35. This is a mediocre way to get IPC. by Animats · · Score: 3, Informative
    For historical reasons, most of the UNIX-like operating systems have terrible interprocess communication mechanisms. Early UNIX only had pipes. This started a tradition that interprocess communication works like I/O, leading to named pipes, sockets, and domain sockets. The result is a set of rather slow interprocess communication mechanisms. (One can do worse. In the old MacOS, interprocess communication could only pass one message per vertical refresh time, and this wasn't documented.)

    On top of those mechanisms, even slower interprocess communication systems are typically implemented, such as OpenRPC and CORBA. (For even more inefficiency, there's XPC. In Perl. But I digress.)

    Because of this history, there's a perception that interprocess communication has to be slow. It doesn't.

    What you really want looks more like what QNX has - fast interprocess messaging that interacts properly with the scheduler. QNX has to have interprocess communication done right, because it does everything through it, including all I/O. This works out quite well. You take a performance hit (maybe 20% for this), but you get much of that back because the higher levels become more efficient when built on good IPC.

    The QNX messaging primitives are available for Linux, although the implementation isn't good enough for inclusion in the standard kernel. That work should be redone for the current kernel.

    IPC/scheduler interaction really matters. If you get it wrong, each interprocess transaction results in an extra pass through the scheduler, or worse, both the sending process and the receiving process lose their turn at the CPU. This is easy to test. Start up two processes that communicate using your IPC mechanism. Measure the performance. Then start up a compute-bound process and measure again. If the IPC rate drops by much more than a factor of 2, something is wrong. Don't be surprised if it drops by two orders of magnitude. That's an indication that IPC/scheduler interaction was botched.
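
    A bare-bones version of that test, using a pair of pipes as the mechanism under measurement. The function name is made up and pipes are only a stand-in; swap in whatever IPC you actually care about. Run it once on an idle machine, then again with a compute-bound process running, and compare the two rates.

```cpp
#include <cassert>
#include <chrono>
#include <sys/wait.h>
#include <unistd.h>

// Bounces one byte between parent and child `rounds` times over two pipes.
// Returns round trips per second, or -1.0 if something failed.
// (pipe_round_trips_per_second is a hypothetical helper.)
double pipe_round_trips_per_second(int rounds)
{
    int to_child[2], to_parent[2];
    if (pipe(to_child) == -1) return -1.0;
    if (pipe(to_parent) == -1) {
        close(to_child[0]); close(to_child[1]);
        return -1.0;
    }

    pid_t pid = fork();
    if (pid == -1) {
        close(to_child[0]);  close(to_child[1]);
        close(to_parent[0]); close(to_parent[1]);
        return -1.0;
    }
    if (pid == 0) {                       // child: echo every byte back
        close(to_child[1]);
        close(to_parent[0]);
        char b;
        while (read(to_child[0], &b, 1) == 1)
            if (write(to_parent[1], &b, 1) != 1) break;
        _exit(0);
    }
    close(to_child[0]);
    close(to_parent[1]);

    auto start = std::chrono::steady_clock::now();
    char b = 'x';
    int done = 0;
    for (; done < rounds; ++done) {
        if (write(to_child[1], &b, 1) != 1) break;
        if (read(to_parent[0], &b, 1) != 1) break;
    }
    std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;

    close(to_child[1]);                   // child sees EOF and exits
    close(to_parent[0]);
    waitpid(pid, nullptr, 0);

    if (done != rounds || elapsed.count() <= 0.0) return -1.0;
    return rounds / elapsed.count();
}
```

    Something as simple as a shell loop that never sleeps will do as the competing load. What matters is the ratio between the two measurements, not the absolute number.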

    Sun addressed this in the mid-1990s with their "Doors" interface in Solaris, which had roughly the right primitives. But that idea never caught on.

    The article here implements a message-passing system via shared memory, which is not exactly a new idea, even for UNIX. I think it first appeared in MERT, in the 1970s. It's an attempt to solve at the user level something that the OS should be doing for you.

    Shared memory is a hack. It's hard to make it work right. With it, one process can crash other processes in hard-to-debug ways. Sometimes you need it because you're moving vast amounts of data, (by which I mean more than just a video stream) but that's rarely the case.