Slashdot Mirror


Protothreads and Other Wicked C Tricks

lwb writes "For those of you interested in interesting hard-core C programming tricks: Adam Dunkels' protothreads library implements an unusually lightweight type of threads. Protothreads are not real threads, but rather something in between an event-driven state machine and regular threads. But they are implemented in 100% portable ANSI C and with an interesting but quite unintuitive use of the switch/case construct. The same trick has previously been used by Simon Tatham to implement coroutines in C. The trick was originally invented by Tom Duff and dubbed Duff's device. You either love it or you hate it!"

22 of 229 comments

  1. Looks pretty cool by bobalu · · Score: 4, Interesting

    I used a Lifeboat lib back in the late 80's that this reminds me of. Cooperative multitasking. Eventually ported the whole thing to OS/2 and used that threading instead. All the code pretty much worked as-is.

    --
    The revolution will NOT be televised.
  2. Job security? by elgee · · Score: 4, Funny

    So this is so "counterintuitive" that no one else will ever understand your code?

    Sounds ideal!

  3. From the source: by MythMoth · · Score: 5, Informative
    --
    --- These are not words: wierd, genious, rediculous
    1. Re:From the source: by Bastian · · Score: 4, Funny

      Wow. And I used to think C was frightening when I discovered the fun you can have with a program that takes command-line arguments when you start making recursive calls to main().

      When I saw that code snippet, I found myself switching back and forth between thinking "this is the most beautiful thing I have ever seen" and "dear god, who ordered that monster" so rapidly my brain almost a sploded.

  4. Seen this already by Anonymous Coward · · Score: 5, Funny

    I first came across this while I was working on the e-voting machines. There was a dept especially allocated to investigating how to hide certain features in C code to make them look like something else.

  5. Rob Pike invented this in 1985 by dmoen · · Score: 4, Informative

    This looks very similar to the implementation technique used for the Squeak programming language (not the Smalltalk Squeak). Squeak is a preprocessor for C that makes it very easy to use this technique.

    http://citeseer.ist.psu.edu/cardelli85squeak.html

    Doug Moen

    --
    I have written a truly remarkable program which this sig is too small to contain.
  6. Re:It isn't Duff's device. by LLuthor · · Score: 4, Informative

    Duff's device was the first convoluted form of a switch() statement which became well known.
    All these C "tricks" employ the same technique (though more elegantly) for different goals. Nonetheless, Duff's device can be said to have inspired such code.

    --
    LL
  7. Not new by Anonymous Coward · · Score: 4, Informative

    SGI has had a State Threads library for a long time: http://oss.sgi.com/state-threads

  8. Re:Wait just a minute ... by LLuthor · · Score: 4, Funny

    And the JVM is written in C :)

    --
    LL
  9. I guess the idea is it's extremely portable. by skids · · Score: 5, Informative

    ...not bound to any particular OS.

    If that's what folks are looking for, another option is the tasks added to LibGG a while back. Tradeoffs either way -- LibGG's requires at least C signals (but will use pthreads or Windows threads if detected during compile time), whereas this can be used in OS-less firmware. But on the positive side you can use switch() in LibGG tasks -- what you can't use are a lot of non-MT-safe system calls. It's an OK abstraction but of course there are so very many ways to accidentally ruin portability that it is far from foolproof.

    http://www.ggi-project.org/documentation/libgg/1.0.x/ggAddTask.3.html

    1. Re:I guess the idea is it's extremely portable. by twiddlingbits · · Score: 5, Insightful

      It is bound to a particular KIND of OS. This code would not work right in a pre-emptive multi-tasking OS unless it was the highest priority task. It works best without an OS as it makes its own blocking.

      I read his paper where he said "writing an event-driven system is hard". I guess he has never heard of using Finite State Automata for the design? State machines are very simple to program. An event-driven system is not at all hard to write, although you often need some deep hardware and/or processor knowledge to do it well. I wrote many of them in the 1980s when I did embedded C code for DOD work, although I have not done so in quite a few years. Once Ada came along everyone abandoned C as too obtuse for embedded work for the DOD. I once did benchmarks that showed decent C code without strong optimization outperformed Ada code, but C was dead already in their minds. I'm glad to see some folks are still interested in it on the commercial side of programming. After all we can't write everything in Java ;)
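
      For readers who have not written one, an event-driven state machine of the kind described here might look like the sketch below. The framing format (a 0x7E start byte, one length byte, then the payload) and the names rx, rx_init and rx_feed are made up for illustration; they do not come from the paper or from any library.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical framing: 0x7E start byte, one length byte, then payload. */
enum rx_state { RX_WAIT_START, RX_READ_LEN, RX_READ_PAYLOAD };

struct rx {
    enum rx_state state;
    unsigned char buf[255];
    size_t len;   /* expected payload length */
    size_t got;   /* payload bytes stored so far */
};

void rx_init(struct rx *r)
{
    assert(r != NULL);
    r->state = RX_WAIT_START;
    r->len = 0;
    r->got = 0;
}

/* Feed one byte (one event). Returns 1 when a complete frame is in r->buf. */
int rx_feed(struct rx *r, unsigned char byte)
{
    assert(r != NULL);
    switch (r->state) {
    case RX_WAIT_START:
        if (byte == 0x7E)
            r->state = RX_READ_LEN;
        return 0;
    case RX_READ_LEN:
        r->len = byte;
        r->got = 0;
        if (r->len == 0) {            /* an empty frame completes at once */
            r->state = RX_WAIT_START;
            return 1;
        }
        r->state = RX_READ_PAYLOAD;
        return 0;
    case RX_READ_PAYLOAD:
        r->buf[r->got++] = byte;      /* got < len <= 255, so this fits */
        if (r->got == r->len) {
            r->state = RX_WAIT_START;
            return 1;
        }
        return 0;
    }
    return 0;
}
```

      Each call handles exactly one event and returns, so nothing blocks. The cost is that the control flow is spread over the state variable, which is the maintainability concern raised in the reply below.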

    2. Re:I guess the idea is it's extremely portable. by plalonde2 · · Score: 5, Informative
      The challenge is making the design maintainable. There isn't a program that can't be written as a state machine; but most programs expressed this way are difficult to understand and maintain.

      The argument that Rob Pike makes in A Concurrent Window System and with Luca Cardelli in Squeak: a Language for Communicating with Mice is that many of the event systems and associated state machines that we write can be much simplified by treating input multiplexing, and thus coroutine-like structures, as language primitives.

      This work follows directly from Hoare's Communicating Sequential Processes - a good summary can be found here. Working with CSP only a little has convinced me of how much easier so many systems tasks are in this framework than in the world of massive state machines and event loops.

  10. Loop Abuse by wildsurf · · Score: 4, Interesting
    Reminded me of a function I once wrote...

    The PPC architecture has a special-purpose count register with specialized branch instructions relating to it; e.g., the assembly mnemonic 'bdnz' means "decrement the count register by one, and branch if it has not reached zero." I've used this in some pretty weird loops, including this one that broke the Codewarrior 9.3 compiler (fixed in 9.4.) This computes the location of the n'th trailing one in a 32-bit integer. Pardon my weak attempt at formatting this in HTML:

    static uint32 nth_trailing_one(register uint32 p, register uint32 n) {
        register uint32 pd;
        asm {
                 mtctr n; bdz end
            top: subi pd, p, 1; and p, p, pd; bdz end
                 subi pd, p, 1; and p, p, pd; bdz end
                 subi pd, p, 1; and p, p, pd; bdz end
                 subi pd, p, 1; and p, p, pd; bdz end
                 subi pd, p, 1; and p, p, pd; bdz end
                 subi pd, p, 1; and p, p, pd; bdz end
                 subi pd, p, 1; and p, p, pd; bdz end
                 subi pd, p, 1; and p, p, pd; bdnz top
            end: }

        return __cntlzw(p ^ (p - 1));
    }

    The idea was that the instruction stream should stay as linear as possible; most of the time the branches are not taken, and execution falls through to the next line of code. Ironically (siliconically?), the entire function could probably be implemented in a single cycle in silicon; shoehorning bitwise functions like this into standard instructions tends to be extremely wasteful. Perhaps FPGA's will make an end run around this at some point. I've also tried this function with a dynamically-calculated jump at the beginning, similar to the case statement logic in the article.
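
    For anyone who does not read PPC assembly, the same computation can be sketched in portable C. This is an illustrative rewrite, not the poster's code: it returns the bit index counted from the least significant bit (the assembly version returns a leading-zero count, which is 31 minus that index), and the -1 return for "fewer than n set bits" is an assumption added here.

```c
#include <assert.h>
#include <stdint.h>

/* Bit index (0 = least significant) of the n-th lowest set bit of p,
   or -1 if p has fewer than n set bits. n counts from 1. */
int nth_trailing_one(uint32_t p, unsigned n)
{
    int idx = 0;

    assert(n >= 1);
    while (--n > 0 && p != 0)
        p &= p - 1;               /* clear the lowest set bit */
    if (p == 0)
        return -1;
    while ((p & 1u) == 0) {       /* count trailing zeros */
        p >>= 1;
        idx++;
    }
    return idx;
}
```

    The p &= p - 1 step is the C equivalent of each subi/and pair in the unrolled assembly.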

    Hmm, I had a point I was trying to make with this post, but now it's escaped my mind... :-)
    --
    Weeks of coding saves hours of planning.
  11. It was looking interesting until by achurch · · Score: 4, Interesting

    I got to this little gem:

    The advantage of this approach is that blocking is explicit: the programmer knows exactly which functions that block that which functions the never blocks.

    My English parser thread shut down at that point . . .

    Seriously, this looks like a handy little thing for low-memory systems, though I'd be a bit hesitant about pushing at the C standard like that--the last thing you need is a little compiler bug eating your program because the compiler writers never thought you'd do crazy things to switch blocks like that.

  12. Python by meowsqueak · · Score: 4, Interesting

    Weightless threads in Python:

    http://www-128.ibm.com/developerworks/library/l-pythrd.html

    They are cooperative but far more efficient than Python's own threading model. You can easily create hundreds of thousands of concurrent threads.

  13. extremely limited applicability by nothings · · Score: 5, Informative
    Please note that this isn't interesting unless you work in, as the FA says, a severely memory constrained system. No normal embedded system needs to do this, much less the systems most programmers on Slashdot probably work with.

    This is bad, lame, faux cooperative threads.

    Local variables are not preserved.

    A protothread runs within a single C function and cannot span over other functions. A protothread may call normal C functions, but cannot block inside a called function.

    It's also not even particularly new [1998].

    Unless memory is at an absolute premium, just use cooperative threading instead. If you try to use protothreads, you'll quickly discover how unlike "real" programming it is. Even just a 4K stack in your cooperative threads will get you way more than protothreads do.
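
    The local-variable restriction is easier to see in a hand-written version of the trick. The sketch below is an assumption-laden illustration (the names counter_task and counter_run are invented, and this is not the library's code): because each "yield" is a plain return, the stack frame is gone by the next call, so the loop counter has to live in a struct the caller keeps.

```c
#include <assert.h>
#include <stddef.h>

struct counter_task {
    int resume_at;   /* which case label to continue from; 0 = start */
    int i;           /* loop counter: stored here because an automatic
                        variable would not survive the return below */
};

/* Writes 0, 10, 20 to *out on successive calls (returning 1 each time),
   then returns 0 and rewinds to the start. */
int counter_run(struct counter_task *t, int *out)
{
    assert(t != NULL && out != NULL);
    switch (t->resume_at) {
    case 0:
        for (t->i = 0; t->i < 3; t->i++) {
            *out = t->i * 10;
            t->resume_at = 1;
            return 1;            /* "yield": give control back to the caller */
    case 1:;                     /* the next call lands here, inside the loop */
        }
    }
    t->resume_at = 0;
    return 0;
}
```

    Declaring i as an ordinary local inside counter_run would compile, but its value after re-entry through case 1 would be indeterminate.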

  14. You want cool C stuff... by Dr.+Manhattan · · Score: 4, Interesting

    Get the book Obfuscated C and Other Mysteries by Don Libes. Explanations of various Obfuscated C contest entries, and alternate chapters illustrate neat corners of C, including a few things similar to this little library. Occupies a place of honor on my shelf.

    --
    PHEM - party like it's 1997-2003!
  15. Re: It isn't Duff's device. by shalunov · · Score: 4, Informative

    It most certainly is Duff's device, or at least is very close to it. Duff's device is, indeed, a way to unroll loops; specifically, a way to unroll loops that uses a peculiarity in switch statement syntax that allows case to point inside a loop body. Now, take a look at lc-switch.h in the Protothreads tarball. It contains macros that use the same peculiarity to jump inside functions instead of loops.
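
    A rough sketch of how macros can wrap that peculiarity follows. The macro names CO_BEGIN, CO_YIELD and CO_END are hypothetical and chosen for this example; they are not the names used in lc-switch.h, and the real header may differ in detail.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical macro names, invented for this sketch. */
#define CO_BEGIN(s)  switch (*(s)) { case 0:
#define CO_YIELD(s, value) \
    do { *(s) = __LINE__; return (value); case __LINE__:; } while (0)
#define CO_END(s)    } *(s) = 0

/* Returns 1, 2, 3 on successive calls, then -1, then starts over. */
int one_two_three(int *state)
{
    assert(state != NULL);
    CO_BEGIN(state);
    CO_YIELD(state, 1);
    CO_YIELD(state, 2);
    CO_YIELD(state, 3);
    CO_END(state);
    return -1;
}
```

    Each CO_YIELD records its own source line in *state and plants a case label with that same number, so the switch in CO_BEGIN jumps straight back to it on the next call. Two yields on one source line would produce duplicate case labels, and a switch statement written inside the body would capture the case labels, so both are off limits in this sketch.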

  16. Re:a fun trick only useful in very specialized cas by dmadole · · Score: 4, Informative

    It's too clever to be really useful unfortunately. The big issue is of course the no "local variables". Trouble is, if you are writing in C, the compiler may well be creating local variables for you behind your back. In C++ for example there are many cases where this will certainly happen, like

    void DoSomething(const string&);
    DoSomething("hollow, whirled");

    where a local variable of type string will be temporarily created to pass to routine DoSomething.

    You need to read the article.

    It only says you can't use local variables across functions that block. Actually, it doesn't even say that you can't use them, it only says don't expect their value to be preserved.

    In your example, even if the compiler does create a local variable to call DoSomething, and even if DoSomething does block, who cares if the value of that local variable is preserved, since it's impossible to reference it again after that statement?

    But that was an awfully long time ago. Now it's hard to find memory chips below 1Mbit.

    I can help you with this problem! Is 16 bytes small enough?

    And since you can't use local variables, you can't use things like the C libraries or pretty well any library ever written, which is teh sux0r.

    But you can use the C libraries. Just don't use local variables across functions that block. Only a very few C library functions block.

  17. Use of this technique in Felix by skaller · · Score: 4, Interesting

    FYI this technique is heavily exploited in the programming language Felix:

    http://felix.sf.net/

    to provide user space threading. The main difference is that all the 'C tricks' are generated automatically by the language translator. If you're using gcc then the switch is replaced by a computed jump (a gcc language extension). On my AMD64/2800 time for creating 500,000 threads and sending each a message is 2 seconds, most of the time probably being consumed by calls to malloc, so the real thread creation and context switch rate is probably on the order of a million per second or better .. just a tad faster than Linux. Both MLton and Haskell also support this style of threading with high thread counts and switch rates (although the underlying technology is different).
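
    As a rough idea of what a computed jump looks like, here is a hand-written sketch using the GNU C "labels as values" extension (&&label and goto *ptr), which gcc and clang accept but standard C does not. The names gen and gen_next are invented for this example, and this is not code generated by Felix. The noinline attribute is a precaution, since the GCC manual warns that label addresses may differ between inlined or cloned copies of a function.

```c
#include <assert.h>
#include <stddef.h>

struct gen {
    void *resume;   /* address of the label to continue from; NULL = start */
    int i;          /* loop counter kept outside the stack frame */
};

/* Writes 0, 1, 2 to *out on successive calls (returning 1), then returns 0. */
__attribute__((noinline))
int gen_next(struct gen *g, int *out)
{
    assert(g != NULL && out != NULL);
    if (g->resume != NULL)
        goto *g->resume;          /* GNU C: jump to a stored label address */

    for (g->i = 0; g->i < 3; g->i++) {
        *out = g->i;
        g->resume = &&resumed;    /* GNU C: take the address of a label */
        return 1;
resumed:;
    }
    g->resume = NULL;
    return 0;
}
```

    Compared with the switch version, the resume point is a pointer rather than an integer, so the body is free to contain its own switch statements.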

    --
    John Skaller mailto:skaller@users.sf.net
  18. Re:wtf? by ggvaidya · · Score: 5, Informative
    Okay, I'll try and see if I can figure this thing out (you have to admit, it screws with your mind just looking at it ...):

    You can implement a simple memcpy function like this:
    void copy(char *from, char *to, int count) {
      /* count > 0 assumed */
      do {
          *to++ = *from++;  /* copy one byte from the source to the destination */
          count--;
      } while(count > 0);
    }
    So far, so good. Now Duff's problem was that this was too slow for his needs. He wanted to do loop unrolling, where each iteration in the loop does more operations, so that the entire loop has to iterate less. This means the 'is count > 0? if so, go back, otherwise go on' part of the loop has to execute fewer times.

    Now, the obvious problem with this is that you don't know how much you can unwind this particular loop. If it has 2 elements, you can't unwind it to three elements, for instance.

    This is where Duff's Device turns up:
    int n = (count + 7) / 8; /* count > 0 assumed */

    switch (count % 8)
    {
    case 0: do { *to++ = *from++;
    case 7:      *to++ = *from++;
    case 6:      *to++ = *from++;
    case 5:      *to++ = *from++;
    case 4:      *to++ = *from++;
    case 3:      *to++ = *from++;
    case 2:      *to++ = *from++;
    case 1:      *to++ = *from++;
            } while (--n > 0);
    }
    First, we check to see how much we can unroll the loop - for instance, if count is perfectly divisible by 5, but not 6, 7, or 8, in which case we can safely have 5 copies inside our loop without worry that the copy is going to move past the end of the array. Then - and here's the magic trick - we use switch to jump into a do loop. It's a perfectly ordinary do loop; the trick is entirely in the fact that if count==6, for instance, then C considers the do-loop to begin at 'case 6:', causing 6 copies of '*to++ = *from++' to be executed before the 'while' returns the loop position to the 'case 6:' point which is where, as far as C is concerned, the do-loop began.

    Thus, the loop is unwound to a level that it can handle.

    I think.

    Feel free to correct/amplify/mock. :)

    cheers,
    Gaurav
  19. Re:wtf? by ChadN · · Score: 4, Informative

    I disagree with your assessment, although you were on the right track; the loop doesn't return back to the case label where the loop was entered, it always jumps back to the 'do' statement (synonymous with the case 0:).

    The way you describe it is that the loop is unrolled to a size that is safely divisible into the 'count' value, which is an interesting idea, but would not be as efficient (large prime number counts would not get unrolled, for example, and a more complex computed goto would be required at the loop end).

    My take is this: with loop unrolling, one always has to take care of the 'remainder'. In the above example, the loop is unrolled to be a fixed size (8 repeated copy instructions, instead of one), and any count not divisible by 8 has to handle the remainder of the count after dividing by 8. Conceptually, you could imagine handling this remainder with a separate case section after the unrolled loop. In Duff's device, the remainder is actually dealt with first, by initially jumping into the loop somewhere other than the beginning, then letting the fully unrolled loop finish up.
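
    The two approaches can be put side by side as a sketch. Both functions below are written for this comparison (the names copy_unrolled and copy_duff are made up), and both copy to an ordinary incrementing destination pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Conventional 8x unrolling: full blocks first, leftover bytes in a tail loop. */
void copy_unrolled(char *to, const char *from, size_t count)
{
    size_t blocks = count / 8;
    size_t rest = count % 8;

    while (blocks-- > 0) {
        *to++ = *from++; *to++ = *from++; *to++ = *from++; *to++ = *from++;
        *to++ = *from++; *to++ = *from++; *to++ = *from++; *to++ = *from++;
    }
    while (rest-- > 0)            /* the remainder, handled afterwards */
        *to++ = *from++;
}

/* Duff's device: the switch jumps into the middle of the first pass, so the
   remainder is copied first; every later pass runs all eight copies. */
void copy_duff(char *to, const char *from, size_t count)
{
    size_t n = (count + 7) / 8;   /* number of passes, counting the partial one */

    assert(count > 0);            /* count == 0 would copy 8 bytes by mistake */
    switch (count % 8) {
    case 0: do { *to++ = *from++;
    case 7:      *to++ = *from++;
    case 6:      *to++ = *from++;
    case 5:      *to++ = *from++;
    case 4:      *to++ = *from++;
    case 3:      *to++ = *from++;
    case 2:      *to++ = *from++;
    case 1:      *to++ = *from++;
            } while (--n > 0);
    }
}
```

    For count = 20, copy_duff enters at case 4, copies 4 bytes, and then makes two full passes of 8, always looping back to the 'do'.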

    In answer to the previous poster's question, the 'do' could (probably) be put on its own line, before case 0:, but that wouldn't look nearly as bizarre. :)

    Of course, maybe I'm wrong too. I hope not.

    --
    "It's overkill, of course. But you can never have too much overkill." - Anonymous Slashdot Coward