Slashdot Mirror


Protothreads and Other Wicked C Tricks

lwb writes "For those of you interested in interesting hard-core C programming tricks: Adam Dunkels' protothreads library implements an unusually lightweight type of threads. Protothreads are not real threads, but rather something in between an event-driven state machine and regular threads. But they are implemented in 100% portable ANSI C and with an interesting but quite unintuitive use of the switch/case construct. The same trick has previously been used by Simon Tatham to implement coroutines in C. The trick was originally invented by Tom Duff and dubbed Duff's device. You either love it or you hate it!"

229 comments

  1. Looks pretty cool by bobalu · · Score: 4, Interesting

    I used a Lifeboat lib back in the late 80's that this reminds me of. Cooperative multitasking. Eventually ported the whole thing to OS/2 and used that threading instead. All the code pretyy much worked as-is.

    --
    The revolution will NOT be televised.
  2. Job security? by elgee · · Score: 4, Funny

    So this is so "counterintuitive" that no one else will ever understand your code?

    Sounds ideal!

  3. From the source: by MythMoth · · Score: 5, Informative
    --
    --- These are not words: wierd, genious, rediculous
    1. Re:From the source: by HappyCleanerDude · · Score: 1

      Amazing code. Loved the comment: "Disgusting, no? But it compiles and runs just fine. I feel a combination of pride and revulsion at this discovery." that Duff made -- back in '83!

      This is both the strength and weakness of C -- you can do anything you want, and, you can do anything you want!

      It is refreshing to see this in a sea of "development environments."

      --
      --- >
    2. Re:From the source: by Anonymous Coward · · Score: 0

      Hey ive been tricked wheres the hillary duff pron?

    3. Re:From the source: by Bastian · · Score: 4, Funny

      Wow. And I used to think C was frightening when I discovered the fun you can have with a program that takes command-line arguments when you start making recursive calls to main().

      When I saw that code snippet, I found myself switching back and forth between thinking "this is the most beautiful thing I have ever seen" and "dear god, who ordered that monster" so rapidly my brain almost a sploded.

    4. Re:From the source: by astrosmash · · Score: 1

      And for context, here's the original thread in which Duff's post appears:

      Google Groups

      --
      ENDUT! HOCH HECH!
    5. Re:From the source: by stonecypher · · Score: 1

      The standard forbids calling the entrypoint. Calling main() is undefined. Besides, it's silly.

      --
      StoneCypher is Full of BS
    6. Re:From the source: by Anonymous Coward · · Score: 0

      Precisely, back in 1983, since then some more evolution has been going on. What about this:

      template
      class unroll
      {
          enum { index = size - 1 };
          static void copy(xtype* dest, const xtype* src)
          {
                unroll::copy(dest[index],src[index]);
                dest[index] = src[index];
          }
      };

      template
      class unroll
      {
          static void copy(xtype* dest, const xtype* src)
          {
              dest[0] = src[0];
          }
      };

      The difference here is, that we can pass the expression we want to evaluate as unrolled, as a type. Here I hardcoded the operation into static method, but just as easily it could be additional template typename argument, which has a static method which is called, instead of the straight = operator that is called here.

      Just for example's sake, here's how this template is used to unroll the "copy",

      char* dest = ...;
      const char* src = ...;
      unroll::copy(dest,src);

      Ofcourse, this uses compile time constant as count. If want runtime value as count, can write interface which takes modulo and uses this mechanism as the innerloop, if it is template, the type can be determined by the compiler so usage would be like this:

      unroll_blahblah(dest,src,count);

      This way the compile time constants will get the "ultimate" unrolling experience (as good as it gets) and runtime values will also "enjoy" the unrolling nirvana. The only technical difficulty here is to know how much unrolling is desired! Both grounds covered and this gives the extra boost when the constants are known at compilation time, what's up Doc?

      Welcome to the *old* and exciting world of template meta programming. W00t.

      p.s. as a extra note, this strategy allows to make decisions such as:

      *dest++ = *src++;
      *dest++ = *src++; ...

      vs.

      dest[0] = src[0];
      dest[1] = src[1]; ...
      dest += N;
      src += N; .. the later is often much quicker, especially if your target platform doens't have instruction which read-increments and write-increments (example: move.l +(a0), +(a1) vs. mov eax,[esi+0] .. mov [edi+0],eax .. add edi,16 .. add esi,16 ) It quite a lot depends on the hardware architechture and also surrounding code and plenty of other 'factors' what is 'fast' and what isn't.

      At the end of the day, you know, it doesn't really matter most of the time. Most of the time the readability and simplicity of the implementation is prefered attribute, right?

    7. Re:From the source: by bani · · Score: 1

      calling main in C is perfectly legal in both c89 and c99. in C++ it's forbidden.

    8. Re:From the source: by Jeremi · · Score: 1
      in C++ [calling main() is] forbidden


      At the risk of sounding like someone who can't be bothered to RTFCS... why is calling main() forbidden in C++?

      --


      I don't care if it's 90,000 hectares. That lake was not my doing.
    9. Re:From the source: by gronofer · · Score: 1

      Yeah, for some reason I had always thought of "switch" as a conditional, not a type of goto.

      int main ()
      {
          int test = 11;

          switch (test)
              {
                  if (test == 10)
                      {
                      case 11:
                          test = 12;
                      }
              }

          return test;
      }

    10. Re:From the source: by bani · · Score: 1

      Because C++ compilers can ("implementation-defined") modify your main() to call static constructors. If this happens, your globals get stomped. You are also disallowed in C++ to compute the address of main(), lest you try to call main via a pointer.

      See Section 3.6.1 paragraph 3 of IEC 14882.

  4. Seen this already by Anonymous Coward · · Score: 5, Funny

    I first came across this while I was working on the e-voting machines. There was a dept especially allocated to investigating how to hide certain features in c code to make them look like soemthing else.

  5. Implementation in languages? by Poromenos1 · · Score: 2, Insightful

    I don't program in C that much any more, but it would be nice if this was implemented in higher level languages. I don't know if they would gain anything, but it's at least interesting.

    --
    Send email from the afterlife! Write your e-will at Dead Man's Switch.
    1. Re:Implementation in languages? by Poromenos1 · · Score: 1

      And by that, I mean that the interpreters of languages (Python, Lua, etc) would use this to implement their threads in.

      --
      Send email from the afterlife! Write your e-will at Dead Man's Switch.
    2. Re:Implementation in languages? by Anonymous Coward · · Score: 2, Interesting

      Protothreads are quite limited in what they can do - they are just a way of expressing a state machine using co-routines.

      Real threads are much better for things like interpreters. JIT compilation with native threads is even better.

    3. Re:Implementation in languages? by lux55 · · Score: 1

      I agree. What about PHP for example? Is there some clever trick that could be done like this that would enable a thread-like behaviour, since PHP currently lacks the ability to spawn threads? Processes it does, but not threads, and the two are just different enough to limit PHP's general applicability outside of web apps...

    4. Re:Implementation in languages? by Anonymous Coward · · Score: 0

      Yeah, it's kind of sad that in PHP the best way to spawn threads (conceptually, not per computer-science) in PHP is to open a socket to a page on your own system and close it immediately thus running another page in the background. Actually, it's not even conceptually the same as a thread... damn PHP.

    5. Re:Implementation in languages? by meowsqueak · · Score: 1

      Just curious - in what ways do you think they are more limited? A OS threading mechanism is just a glorified state machine anyway. I agree that these sorts of 'threads' (or 'fibers' as they are known in Windows) run into troubles with blocking I/O, but with async. I/O they are quite useful once you've got some thread management mechanisms incorporated.

    6. Re:Implementation in languages? by BillyBlaze · · Score: 1

      One big practical limitation is that you can't actually take advantage of multiple CPUs. This used to not be very important, but with dual-core CPUs available, it's likely there will be a huge increase in the number of desktop machines with SMP. Granted you could modify these systems to run in two threads, but then you lose the cross-platform abilities which are the main benefit. If you can create 2 threads on all systems you support, why not just create as many as you need? It ends up making more sense to abstract the threading and locking primitives provided by the OSs you run on.

    7. Re:Implementation in languages? by nerdwarrior · · Score: 1

      Try call-with-current-continuation (a.k.a. call/cc) in Scheme or any of the other functional languages. call/cc lets you write co-routines (and many other powerful control constructs) quite easily, and co-routines are much more powerful than this protothread trick. A favorite for Schemers showing off the power of call/cc is the amb macro (which uses call/cc internally). The following code can actually be written in Scheme (after adding the amb and assert macros): (let ((x (amb 1 2 3)) (y (amb 4 5 6))) (begin (assert (= (+ x y) 7) ; once execution reaches here, x and y have values such that x + y = 7. ...))) amb "magically" chooses one of its arguments which causes the computation to eventually succeed. Under the hood, amb uses call/cc to save away the current context and backtrack when a failure occurs.

    8. Re:Implementation in languages? by betterthancats · · Score: 2, Informative

      Well, there is always erlang.

      I'm kind of surprised it hasn't been mentioned yet.

    9. Re:Implementation in languages? by The+boojum · · Score: 1

      Lua at least already has something like this. They're called asymmetric coroutines. Also check out Scheme, Scheme's had continuations forever and some folks have done interesting things with them.

    10. Re:Implementation in languages? by srl100 · · Score: 1

      Ah! There you are - I knew there was at least 1 other person who had heard of it :)

  6. It isn't Duff's device. by mc6809e · · Score: 3, Interesting


    Duff's device is a way of forcing C to do a form of loop unrolling. It has nothing to do with coroutines.

    1. Re:It isn't Duff's device. by Anonymous Coward · · Score: 0

      But it uses the switch-statement in a non-standard way. Nobody claimed that this was Duff's device.

    2. Re:It isn't Duff's device. by LLuthor · · Score: 4, Informative

      Duff's device was the first convoluted form of a switch() statement which became well known.
      All these C "tricks" employ the same technique (though more elegantly) for different goals. Nonetheless, Duff's device can be said to have inspired such code.

      --
      LL
    3. Re:It isn't Duff's device. by Anonymous Coward · · Score: 3, Insightful

      From Duff's 1983 usenet post:

      "Actually, I have another revolting way to use switches to implement interrupt driven state machines but it's too horrid to go into."

    4. Re:It isn't Duff's device. by plalonde2 · · Score: 1

      Perhaps "non-standard" but it is standards compliant.

    5. Re:It isn't Duff's device. by eraserewind · · Score: 1

      It can be used to implement coroutines in C. Google for "coroutines in C". any of the first bunch of links talk about it.

    6. Re: It isn't Duff's device. by shalunov · · Score: 4, Informative

      It most certainly is the Duff's device, or at least is very close to it. Duff's device is, indeed, a way to unroll loops; specifically, a way to unroll loops that uses a peculiarity in switch statement syntax that allows case to point inside a loop body. Now, take a look at lc-switch.h in the Protothreads tarball. It contains macros that use the same peculiarity to jump inside functions instead of loops.

    7. Re:It isn't Duff's device. by Anonymous Coward · · Score: 0

      ya and I have a proof for fermat's last theorm but I don't have enough room in the margin to write it, so I won't. k thx gg

    8. Re:It isn't Duff's device. by raytracer · · Score: 2, Interesting
      Tom clarified this on my blog.
      Yeah. I knew this. In my piece about The Device, I mention something about a similar trick for interrupt-driven state machines that is too horrible to go into. This is what I was referring to. I never thought it was an adequate general-purpose coroutine implementation because it's not easy to have multiple simultaneous activations of a coroutine and it's not possible using this method to have coroutines give up control anywhere but in their top-level routine. A simple assembly-language stack-switching library lets you do both of those.
      Still, a pretty cute thing.
    9. Re: It isn't Duff's device. by Anonymous Coward · · Score: 0
      It's duff's device; if you precompile the example-buffer.c file, you'll find this function:
      static
      char driver_thread(struct pt *pt)
      {
        static struct pt pt_producer, pt_consumer;
       
        { char pt_yielded = 1; switch((pt)->lc) { case 0:;
       
        (&empty)->count = 0;
        (&full)->count = 8;
       
        (&pt_producer)->lc = 0;;
        (&pt_consumer)->lc = 0;;
       
        do { (((pt)))->lc = 137; case 137:; if(!(!(((producer(&pt_producer) & consumer
      (&pt_consumer)) == 0)))) { return 0; } } while(0);
       
        }; pt_yielded = 0; do { (pt)->lc = 0;; return 1; } while(0); };
      }
    10. Re:It isn't Duff's device. by m50d · · Score: 1

      Duff mentions in the description of his device "I have another revolting way to use switches to implement interrupt driven state machines but it's too horrid to go into." Could this be Duff's second device?

      --
      I am trolling
    11. Re:It isn't Duff's device. by Punto · · Score: 2, Informative
      The idea of the Duff's device is to use switch to jump to a label depending on the value of a variable (in a portable way). You could use 'goto' with a bunch of conditions, but using switch will make cleaner code (as far as jumping around to arbitrary places in your code goes), and the compiler will probably make a jump table in most cases, which is faster.

      That's what Duff 'discovered', and it's the trick they're using here.

      --

      --
      Stay tuned for some shock and awe coming right up after this messages!

    12. Re:It isn't Duff's device. by Threni · · Score: 1

      > Nonetheless, Duff's device can be said to have inspired such code.

      Duff's device can only be said to have inspired such code if the people who he allegedly inspired say so, or it can be demonstrated that this is true despite their denials. You can't assume it from the code.

    13. Re:It isn't Duff's device. by jrumney · · Score: 1
      The idea of the Duff's device is to use switch to jump to a label depending on the value of a variable (in a portable way).

      That's the idea of a switch statement. Duff's device introduces a loop inside the switch statement but covering multiple cases, to efficiently unroll a loop into larger chunks.

    14. Re:It isn't Duff's device. by jstott · · Score: 1

      Youngsters these days, always reinventing the wheel. This whole thing about using case statements to set up labels for an iterator, it's really just a computed goto--Fortran's had 'em for at least 40 years.

      C ... HTML doesn't like my indentations, but you get the idea.

      C ... Set the initial value
      i = 1

      100 goto (200,300,400,999), i
      STOP 'state variable out of range'

      200 do_stuff1()
      goto 100

      300 do_stuff2()
      goto 100

      400 do_stuff3()
      goto 100

      C ... exit from the loop
      999 continue

      -JS

      --
      Vanity of vanities, all is vanity...
  7. Rob Pike invented this in 1985 by dmoen · · Score: 4, Informative

    This looks very similar to the implementation technique used for the Squeak programming language (not the Smalltalk Squeak). Squeak is a preprocessor for C that makes it very easy to use this technique.

    http://citeseer.ist.psu.edu/cardelli85squeak.html

    Doug Moen

    --
    I have written a truly remarkable program which this sig is too small to contain.
    1. Re:Rob Pike invented this in 1985 by Anonymous Coward · · Score: 2, Informative

      He couldn't have invented it in 1985 since Duff sent his original mail in 1983:

      From research!ucbvax!dagobah!td Sun Nov 13 07:35:46 1983
      Received: by ucbvax.ARPA (4.16/4.13) id AA18997; Sun, 13 Nov 83 07:35:46 pst
      Received: by dagobah.LFL (4.6/4.6b) id AA01034; Thu, 10 Nov 83 17:57:56 PST
      Date: Thu, 10 Nov 83 17:57:56 PST
      From: ucbvax!dagobah!td (Tom Duff)
      Message-Id:
      To: ucbvax!decvax!hcr!rrg, ucbvax!ihnp4!hcr!rrg, ucbvax!research!dmr, ucbvax!research!rob

  8. neat way in C to express an old trick by bani · · Score: 2, Informative

    While it is probably the first time the copy technique had been expressed in C, it's certainly not the first time the actual technique had been expressed in code.

    I recall seeing the same trick implemented in assembler somewhat earlier, I think the technique was called towers?

  9. Wait just a minute ... by Anonymous Coward · · Score: 0, Offtopic

    1) Your operating system in written in Java.
    2) Your email client in written in Java.
    3) Your browser is written in Java.
    4) Your music player is written in Java.
    5) Perl in written in Java.
    6) Your word processer is written in Java.

    So there !!! Java rules !!!

    1. Re:Wait just a minute ... by LLuthor · · Score: 4, Funny

      And the JVM is written in C :)

      --
      LL
    2. Re:Wait just a minute ... by Anonymous Coward · · Score: 0

      my grammar checker is written in c

    3. Re:Wait just a minute ... by InvalidError · · Score: 1

      Actually, since Java has some machine binary compilers, nothing stops people from rewriting most of the JRE in Java.

      Before people could use text editors, someone had to input them as direct machine language... same thing goes even for ASM compilers. The first compiler and runtime environment generation has to be created from existing technologies but nothing (other than motivation or justification) is preventing it from coming full-circle.

    4. Re:Wait just a minute ... by Anonymous Coward · · Score: 0

      If you want your JVM to be fast you'd best keep it in C, not Java. :)

    5. Re:Wait just a minute ... by m50d · · Score: 1
      Actually, since Java has some machine binary compilers, nothing stops people from rewriting most of the JRE in Java.

      Other than Java being a horrible language for low-level programming. There are times when you need the close-to-the-hardware aspect of C.

      --
      I am trolling
    6. Re:Wait just a minute ... by InvalidError · · Score: 1

      How many 'close to the hardware' aspects of C could not be implemented in machine-binary Java?

      Give machine-binary Java inline-ASM and that takes care of the 1% of situations where absolute control is necessary or justifiable. Or just link with external libraries like we already do in C/Pascal/etc.

    7. Re:Wait just a minute ... by m50d · · Score: 1
      How many 'close to the hardware' aspects of C could not be implemented in machine-binary Java?

      It's been a while since I did any Java so I can't remember everything that's wrong with it, but let's see, gotos (they may be an evil but at times they are a necessary one), running without classes where you need the performance, IIRC the arithmetic doesn't reflect the hardware accurately, pointer arithmetic, storing code as data.

      Give machine-binary Java inline-ASM and that takes care of the 1% of situations where absolute control is necessary or justifiable.

      How many good assembly programmers do you know? What about on a more obscure platform? There is too much code that needs c-level control in a modern OS to do it all in asm, believe it or not there are multiple levels of abstraction needed, that's why e.g. linux has c++ parts, c parts, and inline asm. What are you going to do for the c parts in java?

      Or just link with external libraries like we already do in C/Pascal/etc.

      Not good for OS fundamentals, and if it's easy to do in Java why can't you do it already?

      --
      I am trolling
    8. Re:Wait just a minute ... by InvalidError · · Score: 1

      There is such a thing as 'dialects': variations of a language to suit specific needs, like machine-binary, embedded and real-time Javas. Vanilla Java is all bytecode within a contained runtime environment, it cannot have any direct binary interface with the outside world. The other Java dialects can do things that may be technically impossible or simply forbidden by plain Java.

      As for Linux having C++ in it, with the number of times I have seen Linus quoted saying he would not put any C++ in the kernel, it seems unlikely that the kernel would contain actual C++. Since mixing plain C with C++ often causes maintenance nightmares, any C++ code in Linux should be in some isolated sub-module, well outside everything else's way.

    9. Re:Wait just a minute ... by m50d · · Score: 1
      There is such a thing as 'dialects': variations of a language to suit specific needs, like machine-binary, embedded and real-time Javas. Vanilla Java is all bytecode within a contained runtime environment, it cannot have any direct binary interface with the outside world. The other Java dialects can do things that may be technically impossible or simply forbidden by plain Java.

      True, but to make a Java dialect that could do what you need to to program an OS in it you'd basically just be turning it back into C++.

      As for Linux having C++ in it, with the number of times I have seen Linus quoted saying he would not put any C++ in the kernel, it seems unlikely that the kernel would contain actual C++.

      That's not the quotes I've seen at all, they seem more to be on the lines of "don't tell me not to put c++ in the kernel, what is there is there for a good reason". I may be remembering wrong.

      --
      I am trolling
    10. Re:Wait just a minute ... by InvalidError · · Score: 1
      I did a quick check about Linux and C++... it seems like even switching from gcc to g++ front-end caused headaches in the past by triggering incorrect code generation compiler bugs - but those were from the 1992-1996 period, GCC must have improved a little since then.

      There is also a section of the Linux Kernel Mailing List about C++ in the kernel there: http://www.tux.org/lkml/#s15-3

      Finally, while Linus maintains the development kernel, he is the one who makes the final call. In case there are any doubts on what his opinion is, here is what he said in 2004:

      In fact, in Linux we did try C++ once already, back in 1992.

      It sucks. Trust me - writing kernel code in C++ is a BLOODY STUPID IDEA.

      The fact is, C++ compilers are not trustworthy.
  10. Re:This is cool. by idiotdevel · · Score: 0

    not a problem in c# though :)

    just put a nice "unsafe" modifier and start using pointers... on linux via mono... oh yeah

  11. Not new by Anonymous Coward · · Score: 4, Informative

    SGI had state threads library since long http://oss.sgi.com/state-threads

    1. Re:Not new by Anonymous Coward · · Score: 0

      Yep, state threads library is 1000% more useful than this hack. Can be found here: http://state-threads.sourceforge.net/

  12. I guess the idea is it's extremely portable. by skids · · Score: 5, Informative

    ...not bound to any particular OS.

    If that's what folks are looking for, another option is the tasks added to LibGG a while back. Tradeoffs either way -- LibGG's requires at least C signals (but will use pthreads or windows threads if detected during compile time), whereas this can be used in OS-less firmware. But on the positive side you can use switch() in LibGG tasks -- what
    you can't use are a lot of non-MT-safe system calls. It's an OK abstraction but of course there are so very many ways to accidentally ruin portability that it is far from foolproof.

    http://www.ggi-project.org/documentation/libgg/1.0 .x/ggAddTask.3.html

    1. Re:I guess the idea is it's extremely portable. by twiddlingbits · · Score: 5, Insightful

      It is bound to a paticular KIND of OS. This code would not work right in a pre-emptive multi-tasking OS unless it was the highest priority task. It works best without an OS as it makes it's own blocking.

      I read his paper where he said "writing an event-driven system is hard". I guess he has never heard of a using Finite State Automata for the design? State machines are very simple to program. An event driven system is not at all hard to write, although you often times do have to have some deep hardware and/or procesor knowledge to do it well. I wrote many of them in the 1980's when I did embedded C code for DOD work, although I have not done so in quite a few years. Once Ada came along everyone abandoned C as too obtuse for embedded work for the DOD. I once did benchmarks that showed decent C code without strong optimization outperformed Ada code, but C was dead already in their minds. I'm glad to see some folks are still interested in it on the commercial side of programming. After all we can't write everything in Java ;)

    2. Re:I guess the idea is it's extremely portable. by plalonde2 · · Score: 5, Informative
      The challenge is making the design maintainable. There isn't a program that can't be written as a state machine; but most programs expressed this way are difficult to understand and maintain.

      The argument that Rob Pike makes in A Concurrent Window System and with Luca Cardelli in Squeak: a Language for Communicating with Mice is that many of the event systems and associated state machines that we write can be much simplified by treating input multiplexing, and thus coroutine-like structures, as language primitives.

      This work follows directly from Hoare's Communicating Sequential Processes - a good summary can be found here. Working with CSP only a little has convinced me of how much easier so many systems tasks are in this framework than in the world of the massive state-system/event loop world.

    3. Re:I guess the idea is it's extremely portable. by Curate · · Score: 1

      Once Ada came along everyone abandoned C as too obtuse for embedded work for the DOD. Point of clarification: Ada predates C. Your group may have switched to Ada at one point in the 80s, but Ada was very old by then. DoD used Ada for some things over C because they perceived (correctly) that Ada was a safer language. For instance, it is impossible to have a dangling pointer in Ada. It is also impossible for variables to take on values outside of bounds that you specify.

    4. Re:I guess the idea is it's extremely portable. by Doppler00 · · Score: 1

      The challenge is making the design maintainable. There isn't a program that can't be written as a state machine; but most programs expressed this way are difficult to understand and maintain.

      Wow, I know this one from first hand experience. We use a "graphical" programming language called LabVIEW where I work and I was tasked with the maintanance of a software program that was one giant state machine. Let me tell you, it was almost impossible to tell where the program was and where it was going half the time because it passed around a state list like a stack from one state to another and sometimes just deleted the entire list and replaced it with a new order!

      There is rarely a need for a statemachine unless you are trying to optimize something really low level in hardware. I can't really think of a good paradigm to justify it's use in say Java.

    5. Re:I guess the idea is it's extremely portable. by GlassHeart · · Score: 3, Interesting
      There isn't a program that can't be written as a state machine; but most programs expressed this way are difficult to understand and maintain.

      My experiences contradict your statement. State machines are both easy to implement, and easy to debug, if you do it the right way. I have seen many entirely wrong implementations, including one where you can go from any of about two dozen "states" to any other. I have seen some that just switch states when they feel like it, or switch states based on complex decisions, which makes debugging difficult. Put another way, you can make a "state machine" degenerate into something else, and nullify its benefits, if you refuse to follow the rules.

      A well-implemented state machine has an important characteristic: it is clear to see why you are where you are. This means that state transitions are checked (against unexpected events) and traced, so debugging the machine is literally a matter of reading a log that looks like this:

      in state 0, received event A so went to state 2
      in state 2, received event B so went to state 1
      in state 1, received event C so went to state 5
      in state 5, ignoring event A
      in state 5, received unexpected event C

      and so on. In this particular example, the question to answer is why we're not handling event C properly in state 5, or why we went to state 5 in the first place. Either should be pretty obvious when you consult the original design. The fix is likewise obvious. Figuring out state machines, in my experience, has always been easier than figuring out multi-threaded code.

      This isn't to say that all programs should be implemented as a state machine. Simple Unix-style pipe programs, for instance, are generally unsuitable. If you don't know how to design a state machine properly, it's also going to be unsuitable.

    6. Re:I guess the idea is it's extremely portable. by warrax_666 · · Score: 1
      One thing which helps immensely is also having a strong type system with "sum" types, where you can declare types like
      type state =
          State1 with (all_data_pertaining_to_state1)
        | State2 with (all_data_pertaining_to_state2)
        | State3 with (all_data_pertaining_to_state3)
      ...
      and just have one state variable of type state. This will:
      • ensure that you don't accidentally use state information that is not valid in a given state.
      • ensure that all the state transitions in your code explicitly state which information is carried from state to state

      --
      HAND.
    7. Re:I guess the idea is it's extremely portable. by ultranova · · Score: 1

      Let me tell you, it was almost impossible to tell where the program was and where it was going half the time because it passed around a state list like a stack from one state to another and sometimes just deleted the entire list and replaced it with a new order!

      Never ascribe to programming paradigm that which can be adequately explained by programmer stupidity ;).

      There is rarely a need for a statemachine unless you are trying to optimize something really low level in hardware. I can't really think of a good paradigm to justify it's use in say Java.

      Object Oriented Programming ?-)

      I might be wrong here, but it seems to me that objects are nothing but state machines, with methods for changing the state, getting information of the current state, and performing actions whose exact results depend on the current state (for example, when you write to an outputstream, the destination of the data depends on where that outputstream happens to be connectred to).

      In fact, it is impossible to program without messing with state machines, since all Turing machines are state machines.

      --

      Forget magic. Any technology distinguishable from divine power is insufficiently advanced.

    8. Re:I guess the idea is it's extremely portable. by thogard · · Score: 2, Informative

      Ada didn't exist before 1977 and C was taking shape in 1972 and in use in 73. Ada's specs were 1st published in June of 79 and the C Programming Language book was published in 78.

    9. Re:I guess the idea is it's extremely portable. by Cwaig · · Score: 3, Informative

      I suspect Adam knows all about FSM's - he was also the original author of the LIWP TCP/IP stack.

      Your point about only working on a particular kind of OS isn't a valid one. Why would it need to be the highest priority native thread?

      I've actually used the Protothread library in implementing the playback code of a PVR - and what it actually provides is explicit scheduling between a set of tasks. For example - playing back an MPEG2 Transport stream requires you to do perform several distinct tasks:
      1) Demultiplex the Transport stream
      2) Feed the MPEG video decoder hardware
      3) Feed the MPEG audio decoder
      ie. 1 producer, 2 consumers.

      You can implement this using normal threads. Or you can cut down on overheads and use protothreads, given that you only have a single instance of the MPEG hardware blocks, and can only play a single TS anyway.

      The system level thread for playback can be thought of as a container for the conceptual Protothreads that schedule cooperatively within the system thread in a producer/consumer type relationship. Kind of like a process/thread separation on a larger OS (the code was running on Nucleus).

      Using protothreads provides a deterministic task swap behaviour that removes the need for any locking primitives on the shared data structures between the producer (in this case the Demux thread) and the consumers (hardware feed threads). You can have a task swap occur based on your own complex conditions (for instance, threshold levels in stream buffers vs time until next frame decode is required), rather than the much more simplistic time slice scheduling or message blocking you'd see in a typical "real" threaded system.

      The priority give to the thread which contains the Protothread scheduled tasks doesn't have to be the highest priority on the system at all. All that priority signifies is how important the actual process of playing the MPEG stream is relative to the other functions going on in the system in parallel - eg. it'd be lower priority than a flash update that was going on in parallel, or any interrupt service threads, or threads that respond to user input. But it'd be more important than the thread that's just doing the nightly scan for new DTT channels in the background.

      I know - I do go on a bit.......

      --
      +++ BASELINE REALITY FAILURE+++ +++ PLEASE REBOOT UNIVERSE +++
    10. Re:I guess the idea is it's extremely portable. by Anonymous Coward · · Score: 0

      The trouble with this idea is that its reliance on static variables within the functions means they are not re-entrant. That is, each function can only be instantiated into one thread/coroutine at a time.

    11. Re:I guess the idea is it's extremely portable. by TwistedSquare · · Score: 2, Informative

      It's worth pointing out that CSP can be implemented by using co-operative multi-tasking that the Protothreads idea is similar to. However at first glance it seems Protothreads are too light-weight to be useful so you must use things like GNU pth, or Microsoft's Fibers. All this can be found in C++CSP, as well as other CSP implementations.

    12. Re:I guess the idea is it's extremely portable. by Surt · · Score: 1

      There isn't a program that can't be written as a state machine;

      This reminded me of the time I had a manager who discovered this in some magazine he was reading, and decided we should rebuild all of our software as state machines. That was the day I finally started looking for my next job.

      --
      "Who is the Journal of Quantum Physics going to believe?" --Stephen Hawking
    13. Re:I guess the idea is it's extremely portable. by Junks+Jerzey · · Score: 1

      One thing which helps immensely is also having a strong type system with "sum" types

      You can do the same thing with C++ classes, where State1 is derived from "state" and so on. Having tried things this way, I much prefer having having super-lightweight, cooperative processes. Manually handling state transitions is just bulky.

    14. Re:I guess the idea is it's extremely portable. by twiddlingbits · · Score: 2, Insightful

      Read the comments in the code. It says if protothreads call other C routines and when in that context the called routine blocks then things don't work right.

      Explicit scheduling is NOT pre-emptive, it's static. It can be priority driven though as is pre-emptive. Pre-emptive is dynamic based on operating conditions such as a user input, interrupt, etc. I've never seen a lot of overhead on task/thread changes in an OS in many years.

      Producer- Consumer tasks have to be very tightly coupled and managed unless you use some type of semaphore (i.e. primitves as you call them) to manage who has control of the resource. Personally, I prefer semaphores versus co-routines, it's a heck of a lot easier to debug. Tightly coupled routines are not considered good Software practices. If you are running this under the priority of a system thread then are you really gaining anything? I'd prefer things be simple and just make it a small thread of it's own instead of a proto-thread.

      Memory these days is cheap, small and fast as are CPUs. Why waste a lot of labor on this "really cool" implementation which IMO will be dam hard to maintain and debug? Now back when I had 2K of RAM for all data AND Stack I had to be concerned, but then hardware was expensive, slow and tricky and my time was cheap :)

    15. Re:I guess the idea is it's extremely portable. by Anonymous Coward · · Score: 0

      "Memory these days is cheap, small and fast as are CPUs. Why waste a lot of labor on this "really cool" implementation which IMO will be dam hard to maintain and debug? Now back when I had 2K of RAM for all data AND Stack I had to be concerned, but then hardware was expensive, slow and tricky and my time was cheap :)"

      In deeply embedded systems, 2k of RAM for data and stack is still very common. This is exactly the kind of environments for which protothreads are intended. It seems that code with protothreads is easier to maintain and debug than complex state machines (which would be the closest alternative - threads would simply be too expensive in terms of memory).

    16. Re:I guess the idea is it's extremely portable. by Anonymous Coward · · Score: 0

      FSM's?? There is only one FSM.
      /RAmen

    17. Re:I guess the idea is it's extremely portable. by ArbitraryConstant · · Score: 1

      "Read the comments in the code. It says if protothreads call other C routines and when in that context the called routine blocks then things don't work right."

      Depends on the goal.

      One of my favorite features of Python is generators. These can be used to implement generators in C. I don't care if it blocks because my goal is not concurrency.

      --
      I rarely criticize things I don't care about.
  13. Re: candidate for -2 moderation by Anonymous Coward · · Score: 0

    Since there's really no place to discuss this type of thing "on topic", I'll suggest it here.

    Can we please have a bayesian filter that automatically starts these things out at say -2 (only possible if you upset the filter)? There are often funny posts that get "bad moderation" and wind up modded -1, so I like to read at -1. However, that unfortunately means I have to see all the GNAA trolls, etc.

  14. Loop Abuse by wildsurf · · Score: 4, Interesting
    Reminded me of a function I once wrote...

    The PPC architecture has a special-purpose count register with specialized branch instructions relating to it; e.g., the assembly mnemonic 'bdnz' means "decrement the count register by one, and branch if it has not reached zero." I've used this in some pretty weird loops, including this one that broke the Codewarrior 9.3 compiler (fixed in 9.4.) This computes the location of the n'th trailing one in a 32-bit integer. Pardon my weak attempt at formatting this in HTML:

    static uint32 nth_trailing_one(register uint32 p, register uint32 n) {
    register uint32 pd;
    asm {
    mtctr n; bdz end
    top: subi pd, p, 1; and p, p, pd; bdz end
    subi pd, p, 1; and p, p, pd; bdz end
    subi pd, p, 1; and p, p, pd; bdz end
    subi pd, p, 1; and p, p, pd; bdz end
    subi pd, p, 1; and p, p, pd; bdz end
    subi pd, p, 1; and p, p, pd; bdz end
    subi pd, p, 1; and p, p, pd; bdz end
    subi pd, p, 1; and p, p, pd; bdnz top
    end: }

    return __cntlzw(p ^ (p - 1));
    }

    The idea was that the instruction stream should stay as linear as possible; most of the time the branches are not taken, and execution falls through to the next line of code. Ironically (siliconically?), the entire function could probably be implemented in a single cycle in silicon; shoehorning bitwise functions like this into standard instructions tends to be extremely wasteful. Perhaps FPGA's will make an end run around this at some point. I've also tried this function with a dynamically-calculated jump at the beginning, similar to the case statement logic in the article.

    Hmm, I had a point I was trying to make with this post, but now it's escaped my mind... :-)
    --
    Weeks of coding saves hours of planning.
    1. Re:Loop Abuse by proverbialcow · · Score: 1

      Pardon my weak attempt at formatting this in HTML:

      It came out okay, but for future reference, Slashdot does allow a specialized <CODE> tag similar to <PRE>.

      --
      The only surefire protection against Microsoft infections is abstinence. - The Onion
    2. Re:Loop Abuse by slyborg · · Score: 0, Troll
      Hmm, I had a point I was trying to make with this post, but now it's escaped my mind... :-)
      That point being that you will never successfully mate and have offspring :)
    3. Re:Loop Abuse by ameoba · · Score: 1

      ...and a "preview" button on the comment page.

      --
      my sig's at the bottom of the page.
    4. Re:Loop Abuse by addaon · · Score: 2, Interesting

      Unless you know that n is usually large, wouldn't this be more efficiently implemented with cntlzw?

      Adam

      --

      I've had this sig for three days.
    5. Re:Loop Abuse by wildsurf · · Score: 3, Interesting

      Unless you know that n is usually large, wouldn't this be more efficiently implemented with cntlzw?

      It would be if I were looking for the n'th leading one, but this code is looking for the n'th trailing one. (e.g. for 0b0010011001011100, the 3rd trailing one is in the fifth-lowest bit.) The equivalent code sequence for leading ones is in fact more complicated, requiring three arithmetic instructions and a branch per iteration. (cntlzw, shift, xor, branch).

      I actually use this code as part of an algorithm where I have a very large (e.g. 65k-element) packed single-bit histogram array, and need to find the position of (say) the 1000th set bit. Vector instructions can do a coarse population-count from one end fairly efficiently, but once it's narrowed down to a 32-bit region, it comes down to slicing and dicing. My code operates by clearing the rightmost set bit in each iteration (x & (x - 1)), then at the end, isolating the needed bit (x ^ (x - 1)) and using cntlzw to find its position. To clear the leftmost set bit, you need three instructions: first get its position with cntlzw, then shift 0x80000000 right by that number of bits, and finally XOR to clear the bit. (If there's a shorter sequence, I haven't found it.)

      (oh, and for the troll responder-- you are quite spectacularly wrong. But thanks for the giggle.)

      --
      Weeks of coding saves hours of planning.
    6. Re:Loop Abuse by ChrisMaple · · Score: 1

      When doing stuff like this, I've had success with taking 8 bits at a time to index a table which returns the number of bits set to 1. Repeat until the limit is equalled or exceeded; work backwards to get the exact position. Bigger but faster.

      --
      Contribute to civilization: ari.aynrand.org/donate
    7. Re:Loop Abuse by wildsurf · · Score: 1

      When doing stuff like this, I've had success with taking 8 bits at a time to index a table which returns the number of bits set to 1. Repeat until the limit is equalled or exceeded; work backwards to get the exact position. Bigger but faster.

      True, except that the data I'm dealing with tends to be fairly sparse (most bits are zero), and my code steps through one SET bit per iteration, not one logical bit. E.g., finding the third-lowest set bit of 0b0100000100010000 takes 3 steps, not 15. So in practice, stepping through the data 8 bits at a time with table lookup (I've tried) winds up being slower than stepping one set bit at a time in registers; the function rarely takes more than 3-4 steps, and the theoretical worst case of 32 steps (finding the 32nd set bit of 0xFFFFFFFF) is vanishingly unlikely. (Even the backward jump instruction after 8 steps is almost never reached.) Vector logic is fast enough that computing population counts 128 bits at a time is a win, but from there, stepping through set bits seems to be fastest.

      For data where higher percentages of the bits are set, you're right that table lookup might win out. Either way, it's ironic that so many function bottlenecks like this could be implemented _blindingly_ much faster in silicon, on a custom chip. FWIW, the Altivec instruction set seems to be much better designed for clever abuse like this than SSE; I'm in the process of a long and painful switch, and my ported vector code runs about half as fast per clock on x86. Don't get me started... :-/

      --
      Weeks of coding saves hours of planning.
  15. Either ... or ... by Anonymous Coward · · Score: 0

    > You either love it or you hate it!

    Or you have no clue what this post is about ...

  16. Re:Stupid by plalonde2 · · Score: 2, Insightful
    The macros compile out; you get a switch on a piece of state. Do yourself a favour, compile one of the examples and read the code generated. It's screaming efficient compared to a thread context switch.

    Hideous, but efficiency is not it's problem.

  17. It was looking interesting until by achurch · · Score: 4, Interesting

    I got to this little gem:

    The advantage of this approach is that blocking is explicit: the programmer knows exactly which functions that block that which functions the never blocks.

    My English parser thread shut down at that point . . .

    Seriously, this looks like a handy little thing for low-memory systems, though I'd be a bit hesitant about pushing at the C standard like that--the last thing you need is a little compiler bug eating your program because the compiler writers never thought you'd do crazy things to switch blocks like that.

  18. Better for embedded? by Dingo_aus · · Score: 0

    Would this technology be more suited to limited embedded systems? From a casual glance it would seem perfect for systems with specs measured in kilobytes not gigabytes. Especially where operations as close to realtime as possible are highly valued.

    1. Re:Better for embedded? by Anonymous Coward · · Score: 0
      What do you mean "casual glance"? It's in the first line of the site, where they say it's for low memory environments only because otherwise you may as well use read threads.

      I took a cursory look at slashdot.org and found that it was a news site for nerds about stuff that matters to them. Upon subsequent casual glances I discovered menu items, and what looks like news stories, but really it's too early to say.

  19. Stackless Python by alucinor · · Score: 2, Interesting

    Is this similar to Stackless Python and green threads? I spend most of my time with interpreted languages, so my C is very lacking.

    --
    random underscore blankspace at ya know hoo dot comedy.
  20. Re:Stupid by meowsqueak · · Score: 1

    If it's anything like microthreads, then it's not inefficient at all. I have played around with microthreads in python and easily achieved over a million 'concurrently-cooperative' threads.

    Disclaimer: I haven't read the article yet, so maybe it has nothing to do with microthreads.

  21. Slow news day? by spankfish · · Score: 1

    Um, who cares? Really? Shouldn't this be hidden under "Developers"? I mean, I'm a developer, and this might be great to someone who might use it... but that ain't me. (Self-centered huh?)

    --

    NO TOUCH MONKEY!
    1. Re:Slow news day? by Anonymous Coward · · Score: 0

      So why not hide your comment in your journal?

      Why should everyone want to see your opinion in this thread?

      But it's Friday, so I'd rather see lots of irrelevance grow all over ./ today.

      Have a nice day and all.

  22. Python by meowsqueak · · Score: 4, Interesting

    Weightless threads in Python:

    http://www-128.ibm.com/developerworks/library/l-py thrd.html

    They are cooperative but far more efficient than Python's own threading model. You can easily create hundreds of thousands of concurrent threads.

    1. Re:Python by Hast · · Score: 1

      From the desription it seems like Python stackless.

    2. Re:Python by meowsqueak · · Score: 1

      Yes, it's similar in many respects, but does not require the Stackless infrastructure. I'm not saying either way which is better.

    3. Re:Python by mav[LAG] · · Score: 1

      If you like weightless threads, you'll love greenlets (a standalone spinoff module of Stackless). Unlike generator-based coroutines, greenlets can be nested and switched to any other routine without the use of 'yield'. Also, any function can be turned into a greenlet - very little rewriting needed. There's hardly any overhead either since it's just an extension module written in C for the unmodified Python interpreter.

      --
      --- Hot Shot City is particularly good.
  23. extremely limited applicability by nothings · · Score: 5, Informative
    Please note that this isn't interesting unless you work in, as, the FA says, a severely memory constrained system. No normal embedded system needs to do this, much less the systems most programmers on Slashdot probably work with.

    This is bad, lame, faux cooperative threads.

    Local variables are not preserved.

    A protothread runs within a single C function and cannot span over other functions. A protothread may call normal C functions, but cannot block inside a called function.

    It's also not even particlarly new [1998].

    Unless memory is at an absolute premium, just use cooperative threading instead. If you try to use prototheads, you'll quickly discover how unlike "real" programming it is. Even just a 4K stack in your cooperative threads will get you way more than protothreads does.

    1. Re:extremely limited applicability by Anonymous Coward · · Score: 0

      much less the systems most programmers on Slashdot probably work with.

      Yeah, how one could do such a thing in Visual B*sic?

    2. Re:extremely limited applicability by gardyloo · · Score: 2, Funny

      Please note that this isn't interesting unless you work in, as, the FA says, a severely memory constrained system.

            Yeah, but my brain -- Ooh! Shiny!

    3. Re:extremely limited applicability by leuwenburg · · Score: 2, Insightful

      "This is bad, lame, faux cooperative threads." Nobody said they were cooperative threads (re-read the summyary). Protothreads are not threads, they are something in between event-driven state machines and proper threads.

      You may think they are lame, I still think they are cool.

    4. Re:extremely limited applicability by Anonymous Coward · · Score: 1, Insightful

      From TFA "Protothreads are a extremely lightweight, stackless threads".

      That's probably why local variables aren't preserved....

    5. Re:extremely limited applicability by Anonymous Coward · · Score: 1, Interesting

      This is bad, lame, faux cooperative threads.

              Local variables are not preserved.

              A protothread runs within a single C function and cannot span over other functions. A protothread may call normal C functions, but cannot block inside a called function.


      Well,
      I think it it is a simple idea but very well implemented and especially useful for microcontroller programs. Mr. Dunkels other project is uIP, which I do find very useful in this area, too.
      I.e. this is neat if you don't have the resources at hand to run a even a small multitasking kernel! Maybe 2kB flash for program storage or similar.

      As one of my hobby projects, I'm implementing a software DCC encoder (a protocol used for digital model railroads) that has to talk concurrently to the PC and to another bus with various connected gadgets. It has to do buffering and status displays and whatnot. Going the traditional way of
      a chain of 'if's is hard and an explicit state-machine non-obvious. Protothreads work, are easy and small (few bytes).

      Cheers,

      Onno

    6. Re:extremely limited applicability by gedhrel · · Score: 1

      Yup. Did something similar back in '91 - was inspired by lpmud, didn't like its threading model (calls ran to completion), invented something different to produce a timeslicing VM with interruptable operations. Very similar idea, but the call protocol passed the state to the operation as a parameter (the new state was returned as the result).

      typedef int STATE;

      STATE some_op(STATE current, /* other cruft */) {
          switch (current) {
          case ...: /* something */
              return next_state;
      }

      It was actually a pretty speedy interpreter for the time. Great learning tool too - found out all about race conditions the hard way :-)

  24. This won't help me... by winphreak · · Score: 1

    I already crashed Linux with my code, let alone implement something like this.

    --
    "I'm a well-wisher, in that I don't wish you any specific harm."
  25. Not very handy by Anonymous Coward · · Score: 0

    Don't need a low overhead pseudo-thread implementation for what I do, but would be nice is
    a stack-safe and stable(same thing) setjmp() for process oriented posix compliant stuff.
    Anyone got an implementation?

  26. ...sane detexi hanc marginis exiguitas non caperet by abb3w · · Score: 2, Interesting
    From the summary: Protothreads are not real threads, but rather something in between an event-driven state machine and regular threads.

    From the above Duff on Duff's Device: I have another revolting way to use switches to implement interrupt driven state machines but it's too horrid to go into.

    Perhaps this is the Duff's Device equivalent of a proof of Fermat's Last Theorem? Or is my ignorance of the history of Evil Computing showing?

    --
    //Information does not want to be free; it wants to breed.
  27. Re:Stupid by mvdw · · Score: 3, Insightful

    Ummm, which operating system would that be? Not all programmers have the advantage of an operating system as such; my current development target has no OS, runs at 8MHz, and has 4kbytes of memory. Something like this could be extremely useful for me.

  28. Recursive main() by David+Gould · · Score: 1


      Wow. And I used to think C was frightening when I discovered the fun you can have with a program that takes command-line arguments when you start making recursive calls to main().

    And just what's wrong with that?

    --
    David Gould
    main(i){putchar(340056100>>(i-1)*5&31|!!(i<6)<< 6)&&main(++i);}
    1. Re:Recursive main() by sinserve · · Score: 1

      Learn Scheme or learn computing using Scheme. Or learn to appreciate art and still get something done.

      Nirvana.

  29. Python's way ahead of ya by xant · · Score: 2, Informative

    Actually, I don't know if this is exactly the same feature, but it sounds like it can be used that way. Check out the What's new in Python entry. This is currently implemented in Python CVS, to be available in Python 2.5.

    The actual Python Enhancement Proposal gives more detail and several badass use-cases.

    --
    It's rare that you're presented with a knob whose only two positions are Make History and Flee Your Glorious Destiny.
    1. Re:Python's way ahead of ya by Anonymous Coward · · Score: 0

      YUCK! Why do the Python folks insist on crapifying and half-baking concepts like this all the time?
      Are they allergic to *generality*? They should just implement continuations and be done with it.

      def counter (maximum):
          i = 0
          while i < maximum:
              val = (yield i)
              # If value provided, change counter
              if val is not None:
                  i = val
              else:
                  i += 1

    2. Re:Python's way ahead of ya by Courageous · · Score: 1

      They should just implement continuations and be done with it.

      It's been tried. There were problem with 3rd party C libraries. It's not like you can hand-wavey, hand-wavey, and suddenly C itself is stackless. That's what Christian Tismer ran up against, why the PEP to add continuations to Python was ultimately rejected.

      C//

  30. You want cool C stuff... by Dr.+Manhattan · · Score: 4, Interesting

    Get the book Obfiscated C and Other Mysteries by Don Libes. Explanations of various Obfuscated C contest entries, and alternate chapters illustrate neat corners of C, including a few things similar to this little library. Occupies a place of honor on my shelf.

    --
    PHEM - party like it's 1997-2003!
    1. Re:You want cool C stuff... by Albigg · · Score: 1

      Looks like a great book from reading the reviews on Amazon.com. I loved "Expert C Programming" (the "ugly fish book"). After some looking around it appears this book is hard to find, it is out of print. :( Anyone seen it around anywhere?

  31. Dijkstra says... by Jeff85 · · Score: 2, Interesting

    "The competent programmer is fully aware of the limited size of his own skull. He therefore approaches his task with full humility, and avoids clever tricks like the plague." -Edsgar Dijkstra

    --
    Fetch Text URL - Firefox Extension
    1. Re:Dijkstra says... by menkhaura · · Score: 1

      Dijkstra is not $DEITY. There is a difference between a competent programmer and a brilliant programmer. Sometimes one has to be clever in order to get the job done.

      --
      Stupidity is an equal opportunity striker.
      Fellow slashdotter Bill Dog
    2. Re:Dijkstra says... by mikeage · · Score: 3, Funny

      Dijkstra is not $DEITY. There is a difference between a competent programmer and a brilliant programmer. Sometimes one has to be clever in order to get the job done.

      Actually, since the running of $export DEITY=Dijkstra, he is now.

      --
      -- Is "Sig" copyrighted by www.sig.com?
    3. Re:Dijkstra says... by hunterx11 · · Score: 1

      Even for a C programmer, the sort of person who would abuse a language this way probably believes in Laziness, Impatience, and Hubris.

      --
      English is easier said than done.
    4. Re:Dijkstra says... by Anonymous Coward · · Score: 0

      Maybe not, but along with Knuth he's up there at the pinnacle of our profession.

  32. sobleq??? by yonyonson · · Score: 1
    The VAX C compiler compiles the loop into 2 instructions (a movw and a sobleq, I think.) As it turns out, this loop was the bottleneck in a real-time animation playback program which ran too slowly by about 50%. The standard way to get more speed out of something like this is to unwind the loop a few times, decreasing the number of sobleqs.
    What exactly is a sobleq? i'm assuming it's a VAX assembler statement, but i've never seen it before and was wondering if any of you knew how it worked.

    Thanks
    1. Re:sobleq??? by plalonde2 · · Score: 1

      Subtract One and Branch if Less than or EQual

    2. Re:sobleq??? by Bruce+Perens · · Score: 1
      They gave microcode writers a lot of work back then.

      Bruce

    3. Re:sobleq??? by arodland · · Score: 1

      It's a handy one; in fact, as I understand it it's possible to build a universal computer (as powerful as a turing machine) having only two operations: "increment" and "decrement and test for zero". x86 provides this functionality in the form of an opcode called simply "loop" because it's so handy for counting loops.

    4. Re:sobleq??? by Anonymous Coward · · Score: 0

      It's not universal without some sort of loop or jump or branch.

    5. Re:sobleq??? by fbjon · · Score: 1
      Here's a much handier assembler statement:

      Jump And Verify Accumulator If Shift By Eight Totals Three Empty Registers

      --
      True confidence comes not from realising you are as good as your peers, but that your peers are as bad as you are.
    6. Re:sobleq??? by arodland · · Score: 1

      Java is a pile of Gosling's steaming shit.

  33. a fun trick only useful in very specialized cases. by TomRitchford · · Score: 3, Insightful
    It's too clever to be really useful unfortunately. The big issue is of course the no "local variables". Trouble is, if you are writing in C, the compiler may well be creating local variables for you behind your back. In C++ for example there are many cases where this will certainly happen, like
    void DoSomething(const string&);
    DoSomething("hollow, whirled");
    where a local variable of type string will be temporarily created to pass to routine DoSomething.

    Even if you are writing in the purest of C, you aren't guaranteed that the optimizer isn't going to very reasonably want to introduce the equivalent of local variables. And even if you are sure there's no optimization going on, you STILL don't know for sure that the compiler isn't using space on the stack. There just is no guarantee built into the language about this. And if you were wrong, you'd get strange, highly intermittent and non-local bugs.

    You could be pretty sure. You could force the compiler to use registers as much as possible. You could keep your routines really short. (Hey, if they don't preserve local variables, then how do they do parameter passing?? Parameters are passed on that same stack!)

    But to be completely sure, you'd have to look at the output code. It wouldn't be too hard I suppose to write a tool to automatically do it...you'd just look for stack-relative operations and flag them. But then what would you do if something wasn't working? Yell at the compiler? Rewrite the machine language?


    I guess I don't quite see the use now I've written this up. When is memory THAT important these days? It ain't like I haven't done this, I've written significant programs that I got paid money to do that fit into 4K (an error correction routine).

    But that was an awfully long time ago. Now it's hard to find memory chips below 1Mbit. That two byte number is interesting but your "threads" aren't doing any work for you -- the whole point of threads is that you are preserving some context so that you can go back to them.

    And since you can't use local variables, you can't use things like the C libraries or pretty well any library ever written, which is teh sux0r.


    For just a few more bytes of memory and a few more cycles, you could save those local variables somewhere and restore 'em later. Suddenly your coding future is a brighter place. Tell the hardware people to give you 128K of RAM, damn the expense!

    You could even put in a flag to indicate that that particular routine didn't need its local variables saved so you'd get the best of both worlds, use of external libraries as well as ultra-light switching.

  34. Re:a fun trick only useful in very specialized cas by plalonde2 · · Score: 2, Informative
    The absence of local variables is because the stack is not preserved between invocations, and multiple invocations give the appearance of scheduled threads. There is nothing here that the compiler can break with its optimizer, unless it is breaking valid C code: the control flow expressed using the macros is completely legit C control flow. It's worth running the examples through just the C pre-processor in order to understand what gets built.

    It's ugly as sin, but your compiler had better get it right, or else it's not a C compiler.

  35. This is mostly useless by Dwedit · · Score: 0

    Normally if you want multithreading and have no OS to do it for you, you'd use an interrupt handler to swap all registers, as well as the stack location. Even a 400 byte stack is adequate if you have no recursion or local arrays. So really, this is only for situations where you don't even have 400 bytes to spare per thread. The complete inability to use local variables makes this useless except in very specific situations.
    I could see this being used if you need a very large number of threads with no local variables. I'd still be very wary about using something that does not preserve the content of registers when switching 'threads'.

    1. Re:This is mostly useless by bani · · Score: 1

      read: useful for constrained (eg, embedded) systems.

  36. Yes, can be useful (depending on platform) by ihavnoid · · Score: 3, Interesting

    As the prothread homepage says, it's for extremely small embedded systems, where there are no operating systems, with tiny amount of memory (You can't use DRAMs on systems that cost something less than $1). Want to use threads on those kind of systems, you have no choice.

    Another advantage is its portability. Small embedded systems, whether they have operating systems or not, usually can't support some fully-blown threading standard. Those operating systems seem to implement some kind of 'specially tuned' thread APIs.

    Using these kind of threads on a full-blown PC (or servers) would have almost no benefit. However, in the embedded software engineer's perspective, it's great to see a ultra-lightweight thread library without any platform-dependent code.

  37. Re: candidate for -2 moderation by larry+bagina · · Score: 1
    it will probably never happen, however, you could probably write a greasemonkey script (you are using firefox, right? :) to hide GNAA posts, posts by TripMasterMonkey, etc.

    --
    Do you even lift?

    These aren't the 'roids you're looking for.

  38. Re:a fun trick only useful in very specialized cas by PsychicX · · Score: 2

    As far as ridiculously low memory constraints go...
    I was talking to a friend the other day, who had to write the code for a car door opener dealie. You know the one. A really nice, high end one with an LCD that displayed stuff (not your average 100% hardware door opener). His code had a staggering 256 bytes of RAM to work with, and even then they were potentially 7 bytes overbooked. So yes, these kinds of constraints still exist. Sadly.

  39. you misread by toby · · Score: 1

    The summary only claims that the 'interesting but quite unintuitive use of the switch/case construct' is originally Duff's device, not application to coroutines.

    --
    you had me at #!
  40. Or... by Anonymous Coward · · Score: 0

    You either love it or you hate it!

    Or you don't care, having long since given up on low-level languages so you can actually be *productive*...

  41. Oh my God! by Snowdrake · · Score: 1

    You Slashdotted PuTTY! You bastards!

  42. Re:a fun trick only useful in very specialized cas by dmadole · · Score: 4, Informative

    It's too clever to be really useful unfortunately. The big issue is of course the no "local variables". Trouble is, if you are writing in C, the compiler may well be creating local variables for you behind your back. In C++ for example there are many cases where this will certainly happen, like

    void DoSomething(const string&);
    DoSomething("hollow, whirled");

    where a local variable of type string will be temporarily created to pass to routine DoSomething.

    You need to read the article.

    It only says you can't use local variables across functions that block. Actually, it doesn't even say that you can't use them, it only says don't expect their value to be preserved.

    In your example, even if the compiler does create a local variable to call DoSomething, and even if DoSomething does block, who cares if the value of that local variable is preserved, since it's impossible to reference it again after that statement?

    But that was an awfully long time ago. Now it's hard to find memory chips below 1Mbit.

    I can help you with this problem! Is 16 bytes small enough?

    And since you can't use local variables, you can't use things like the C libraries or pretty well any library ever written, which is teh sux0r.

    But you can use the C libraries. Just don't use local variables across functions that block. Only a very few C library functions block.

  43. Re:a fun trick only useful in very specialized cas by TomRitchford · · Score: 2, Interesting

    The absence of local variables is because the stack is not preserved between invocations, and multiple invocations give the appearance of scheduled threads. There is nothing here that the compiler can break with its optimizer, unless it is breaking valid C code: the control flow expressed using the macros is completely legit C control flow. It's worth running the examples through just the C pre-processor in order to understand what gets built.


    I downloaded it. But the version that is there does not, in fact, compile: there's an obvious typo at example-buffer.c, line 137:
    PT_WAIT_THREAD(pt, producer(&pt_producer) &
                      consumer(p&t_consumer));
    // should be consumer(&pt_consumer));
    That aside, I believe your claim is wrong now that I've read the code.
    int x = 23;
    // Some blocking code here.
    // A case statement leads us to here.
    if ( x != 23 ) {
    // Now x could be anything.
    }
    right? The reason is that you are simply case-statementing into the code and bypassing the whole subroutine calling mechanism entirely -- so the variable initialization just isn't called.

    Now consider the following code:
    double x = sin(y*(2*pi)/360)*cos(y*(2*pi)/360);
    // Some blocking code here.
    // A case statement gets us to here.
    double z = tan(y*(2*pi)/360;
    Suppose the compiler decided to extract out the common y*(2*pi)/360 term as an optimization. The assignment to that common term will not occur when you jump into the middle of that routine -- so your code will not work as it appears.
  44. Recursive main() by Anonymous Coward · · Score: 0

    sweet jesus. i remeber learning recursion in college (not too long ago). now i find out main() can be recursive? will the madness never end?? God i need to code again... too much tech support...

  45. misread by l0rdpestilence · · Score: 0

    I read it as pothead, post a smily if you did to.
    Just as a question: Why do people thing threading is extra hard to do? without much exp so far it seems to help in that the code snippets are smaller and theirfor less prone to typos (then again only ones I've used to this point are in java with a IDE)

    1. Re:misread by Anonymous Coward · · Score: 0


      =)
  46. Re:a fun trick only useful in very specialized cas by S.+Traaken · · Score: 2, Informative
    Suppose the compiler decided to extract out the common y*(2*pi)/360 term as an optimization.

    But the compiler won't decide to do this because it won't be able to establish that y (or pi) can not be changed between instances of this code.
  47. Re:a fun trick only useful in very specialized cas by plalonde2 · · Score: 2, Informative
    But that's precisely the reason you musn't use locals: they are not preserved. The switch in the "thread" setup sets up a new scope in which you might declare locals, but that scope need not be entered cleanly, leading to (in the case of your example) an uninitialized local variable.

    In your second example, the compiler *cannot* remove the sub-expression because the case statement that gets you there crosses a basic block boundary; the return statement from the blocking code, and the jump in through the switch are caught as valid C constructs that prevent the common sub-expression optimization.

    The macros as given give you a semblance of variable continuity and scoping, but the compiler, after the macro substitutions, just sees a return on the end of the blocking section, and a switch into the code following, something like:

    char YourFunc(char foo) {
    switch (foo) {
    case 'A': /*your blocking code*/
    return 'B';
    case 'B': /* the stuff after your blocking code */
    break;
    }
    }
    The ugliness in the macros is about letting other control constructs live around your switch in, but still generates legit code, but, for example, you can't use a local varible as an index to your for loop. Yuck.
  48. The Ego Says.... by Anonymous Coward · · Score: 0

    Debugging is always harder than coding, so any person that makes code as clever as he possibly can is by definition not fit to debug it. (not my quote and cant find reference hence AC)

    Brilliant programmers know this, its the Egotistical ones that think they have to be clever, the job is never done at coding.

    1. Re:The Ego Says.... by Anonymous Coward · · Score: 0

      Thats a retarded blanket statement, just like D's statement is.

      Will the code have to be modularised, worked on, and debugged by 100 different people? Is the code part of a vast, giant enteprise level application, or is it perhaps just an effective way to do something with limited resources?

      You can't apply software engineering dogma to everything. Common sense tells us to keep it simple AND brilliant in the right balance, for the right situation. Sometimes what is needed will demand more intelligent and qualified programmers than your average coding project.

      With that said, there is nothing fucking hard about protothreads. If you are going to call every smart programmer an egotist then maybe you need to examine your own intellectual faculties in relation to your job.

  49. lightweight threads = fibers? by dmccarty · · Score: 1

    Is thre some similarity between these protothreads and fibers?

    --
    Have fun: Join D.N.A. (National Dyslexics Association)
  50. wtf? by DeafByBeheading · · Score: 3, Interesting

    Okay, I'll play the n00b. I understand most of this, but my coding background is not that great, and mostly in C++, Java, and PHP, and I'm having problems with the switch from Duff's Device...


      switch (count % 8)
      {
      case 0: do { *to = *from++;
      case 7: *to = *from++;
      case 6: *to = *from++;
      case 5: *to = *from++;
      case 4: *to = *from++;
      case 3: *to = *from++;
      case 2: *to = *from++;
      case 1: *to = *from++;
            } while (--n > 0);
      }


    What the hell is up with that do { applying only in case zero? It's in several places on the net just like that and Visual Studio compiles this just fine, so it's not an error. I checked K&R, and they don't even hint at what could be going on there... I'm lost. Help?

    --
    Telltale Games: Bone, Sam and Max
    1. Re:wtf? by Anonymous Coward · · Score: 0

      Just think of the switch as a computed "goto", with each case being a label. case 0 goes to the beginning of the loop, and the rest of the cases go into the middle.

    2. Re:wtf? by chgros · · Score: 1

      Think of each "case" as a goto label and the "switch" as an indexed goto.

    3. Re:wtf? by Anonymous Coward · · Score: 0

      You are looking at it back to front or inside out.

      The do ... while is just a regular loop do do all the (identical) statements however many times.

      The switch is the strange part, and is controlling how many statements there are inside the loop because depending on the values, you want to unroll the loop to a different extent.

      Note that all this is unnecessary unless you want to be efficient on this particular architecture. (that's what loop unrolling is about after all). You can do all the same functionality with a do .. while loop with just one statement in side it, and no strange switch controlling where to enter it.

    4. Re:wtf? by Anonymous Coward · · Score: 0

      I doubt this will help as its puzzling to me as well, but maybe..

      its a loop, the sub will start with case 0, and not return to case 0. it will loop though case 7-1 until --n is less than 0, at which it would end.

    5. Re:wtf? by Anonymous Coward · · Score: 0

      The do actually applies everywhere. For the purposes of evaluating what happens, move the do { outside of the first case statement, so that all case statements are enclosed within the body of the do while loop. This will make following what happens easier.

      Let's say you want to unroll a loop safely so that you perform 8 steps for every iteration. Only you can't guarantee that your counter is a multiple of 8. What do you do?

      First, figure out the remainder of your counter when divided by 8, so you can align the rest of your operations.

      counter % 8 = remainder

      We'll do remainder individual steps and then do the other iterations 8-steps at once.

      Well the switch statement allows the first iteration to jump to the remainder number of steps within this block of 8. After that each iteration of the loop starts at the beginning.

    6. Re:wtf? by Anonymous Coward · · Score: 0

      You're thinking of the switch statement the wrong way. Notice that each case statement looks a bit like a label, as in the target of a goto statement. The switch statement evaluates the (count % 8) expression, and performs a goto to the appropriate case. This also explains the "fall through" behavior. A break statement is just a goto that jumps to the end of the switch statement.

      All the "do {" statement does is mark the beginning of the loop; in other words, it's just another label.

      Duff's device works by using the switch statement to perform a computed goto into the middle of the loop. The statement "while (--n > 0)" performs a jump back to the top of the loop (it doesn't care that this happens to "be inside case 0") unless the expression is false, in which case execution continues after the end of the code snippet.

      C is known as a low level programming language, because in many ways it works like an advanced assembly macro language; it can be helpful to keep this in mind while studying clever tricks in C.

    7. Re:wtf? by ggvaidya · · Score: 5, Informative
      Okay, I'll try and see if I can figure this thing out (you have to admit, it screws with your mind just looking at it ...):

      You can implement a simple memcpy function like this:
      void copy(char *from, char *to, int count) {
        do {
            *from++ = *to++;
            count--;
        } while(count > 0);
      }
      So far, so good. Now Duff's problem was that this was too slow for his needs. He wanted to do loop unrolling, where each iteration in the loop does more operations, so that the entire loop has to iterate less. This means the 'is count > 0? if so, go back, otherwise go on' part of the loop has to execute fewer times.

      Now, the obvious problem with this is that you don't know how much you can unwind this particular loop. If it has 2 elements, you can't unwind it to three elements, for instance.

      This is where Duff's Device turns up:
      int n = (count + 7) / 8; /* count > 0 assumed */
       
        switch (count % 8)
        {
        case 0: do { *to = *from++;
        case 7: *to++ = *from++;
        case 6: *to++ = *from++;
        case 5: *to++ = *from++;
        case 4: *to++ = *from++;
        case 3: *to++ = *from++;
        case 2: *to++ = *from++;
        case 1: *to++ = *from++;
              } while (--n > 0);
        }
      First, we check to see how much we can unroll the loop - for instance, if count is perfectly divisible by 5, but not 6, 7, or 8, in which case we can safely have 5 copies inside our loop without worry that the copy is going to move past the end of the array. Then - and here's the magic trick - we use switch to jump into a do loop. It's a perfectly ordinary do loop; the trick is entirely in the fact that if count==6, for instance, then C considers the do-loop to begin at 'case 6:', causing 6 copies of '*to++ = *from++' to be executed before the 'while' returns the loop position to the 'case 6:' point which is where, as far as C is concerned, the do-loop began.

      Thus, the loop is unwound to a level that it can handle.

      I think.

      Feel free to correct/amplify/mock. :)

      cheers,
      Gaurav
    8. Re:wtf? by Blackbird_Highway · · Score: 1

      Remember that in a C switch, the cases are not exclusive, each of the cases will execute (or at least be evaluated), one by one, unless you put in break(s), which will cause execution to jump back out of the switch once it finds a case that evaluates true.

      So yes, this is a legal construct, by the rules of C. It's just a bit bizarre to get a grip on in terms of the flow of program execution. It's one of those things that C critics point to as why no one should ever use C, but I think it's one of the languages quircky charms.

      I recall a test question in C class that asked if you should always place a break in each and every case statement. My answer of no was graded incorrect because if you don't then execution will fall through to the next case, often causing all sorts of unexpected behaviour. But here is clearly an example of taking advantage of the fall through feature.

      --
      By the perception of illusion, we experience reality
    9. Re:wtf? by bitraid · · Score: 1

      I'd rather use a
      while () {}
      loop, since it correctly handles count == 0

    10. Re:wtf? by Anonymous Coward · · Score: 0

      The 'do' in C doesnt do anything really. In fact i dont know about any language that would do 'do' (hehe). It just acts as a jump label for the corresponding 'while'.

    11. Re:wtf? by DeafByBeheading · · Score: 1

      Thanks, that *sort* of clears things up. I mean, I know what a switch statement does, I know about falling through cases, and I know about do loops. I was just confused about the opening of the do loop seemingly only applying to case 0... I mean, from what I know about switches, if let's say count = 1,

      the switch(count % 8) takes you to case 1, so you do the *to = *from assignment, and then the next line is what confused me--you're either coming to the test condition for a loop you never started, or hitting some other weirdness... I guess the answer is that if you use switch to jump into a loop, the loop is still initialized? Does this have to do with how this translates to machine code? A do loop is just a conditional jump backwards, right? Would this still work with a loop where the test condition is at the beginning, so all you reach at the end is the end of the scope opened earlier (if the compiler hadn't started misbehaving due to some plugin issues, I'd check it out myself)? God, would an if be possible to mangle like this?

      switch(x)

      case 1: if (y)
      case 2: --x;


      Heh...

      --
      Telltale Games: Bone, Sam and Max
    12. Re:wtf? by Brane · · Score: 3, Informative

      [...]the 'while' returns the loop position to the 'case 6:' point which is where, as far as C is concerned, the do-loop began.

      No, it returns to the 'case 0:' point where the 'do {' is. (Otherwise the loop wouldn't be executed count times, and somehow I think this Duff guy would have thought of that...)

    13. Re:wtf? by amchugh · · Score: 1

      Bump up parent, he's correct about 'case 0:'

    14. Re:wtf? by ChadN · · Score: 4, Informative

      I disagree with your assessment, although you were on the right track; The loop doesn't return back to the case label where the loop was entered, it always jumps back to the 'do' statement (synonomous with the case 0:).

      The way you describe it is that the loop is unrolled to a size that is safely divisible into the 'count' value, which is an interesting idea, but would not be as efficient (large prime number counts would not get unrolled, for example, and a more complex computed got would be required at the loop end).

      My take is this: with loop unrolling, one always has to take care of the 'remainder'. In the above example, the loop is unrolled to be a fixed size (8 repeated copy instructions, instead of one), and any count not divisible by 8 has to handle the remainder of the count after dividing by 8. Conceptually, you could imagine handling this remainder with a separate case section after the unrolled loop. In Duff's device, the remainder is actually dealt with first, by intially jumping into the loop somewhere other than the beginning, then letting the fully unrolled loop finish up.

      In answer to the previous poster's question, the 'do' could (probably) be put on it's own line, before case 0:, but that wouldn't look nearly as bizarre. :)

      Of course, maybe I'm wrong too. I hope not.

      --
      "It's overkill, of course. But you can never have too much overkill." - Anonymous Slashdot Coward
    15. Re:wtf? by gandalf013 · · Score: 2

      Hmm. Doesn't make full sense to me this way.

      First part of your explanation is OK - let's say we need to unroll a loop for efficiency. And we decide to unroll it so it has 8 statements inside the loop. So, assuming count>0, we would do something like this (using dots to show indentation because "ecode" eats up leading spaces):

      void copy(char *from, char *to, int count) {
      ..int n = count / 8;
      ..int i;

      ..if (n > 0) {
      ....do {
      ......*from++ = *to++;
      ....../* 7 more times (code omitted to avoid slashdot lameness filter) */
      ....} while(--n > 0);
      ..}
      ../* Handle the "leftovers" */
      ..for (i=0; i<n%8; ++i) {
      ....*from++ = *to++;
      ..}
      }

      That's a standard loop-unrolling for you. Duff's Device just shortens it. How? The switch statement jumps into the loop to a point where we first handle the "leftovers", and then we execute the loop (count/8) times, each time copying 8 elements, thus copying (count/8)*8 elements in the loop (which, because of integer division is not equal to count unless count%8==0, that's why the leftovers).

      Thus, the part where you talk about count being excatly divisible by 5, 6, 7, etc., is wrong. The switch statement only checks the remainder of count when divided by 8.

      This may be wrong too. In that case, please feel free to correct me.

    16. Re:wtf? by phantomfive · · Score: 1

      Yeah, your code sample there works. Basically, the case statement is just a label, and doesn't affect the code. It just tells the switch where to jump to. If the test condition is at the beginning, it will just jump past the condition without executing it. For loops would never be initialized. Thinking of this in assembly is a good way to understand how this stuff happens.

      --
      Qxe4
    17. Re:wtf? by Tarwn · · Score: 3, Informative

      So in a shorter description, what happens is that:
      1) you determine how many groups of 8 you will need, rounding up to count the remainder block as well (if there is one)
      2) code enters switch statement based on the remainder value, hits the correct case and falls through (note that if there was no remainder we start at the top of the cases and fall through, consuming an entire 8 block)
      3) code hits the while, decrements the number of 8 blocks (as we just finished off the partial "remainder block")
      4) return to do, fall through to finish this 8 group
      5) loop back to 3

      Took me a few minutes of staring at it (and I admit, some tme looking at above descriptions) to get over 4 years of no C in my diet, but now I have to admit that is beautiful.

      --
      Whee signature.
    18. Re:wtf? by mevets · · Score: 1

      You transcribed it wrong, the case 0 should be:
      case 0: do { *to++ = *from++;

    19. Re:wtf? by Jeremy+Singer · · Score: 2, Interesting

      I think the other replies to this are good, but beg the question of why to do this. Tricks like this might save, at most, microseconds of execution time, but they will inevitably waste hours of human time, even when they haven't broken. The right attitude of a programmer when using such a trick in a program is to consider such tricks as a kind of lethal bomb just waiting to blow up at an inconvenient time. Even if you are a brilliant programmer and can guarantee that nobody except yourself will ever look at or touch this code again, there will come a time when the intermediate value theorem of programming will get you. The theorem goes like this: No matter how brilliant you are at time n, there is always a time n + m such that you are at a different level of brilliance. Murphy's corollary says that this level of brilliance will be less than the level at point n at least half the time, and that time n will be just before its time to go on vacation, go home to your significant other, etc. I know, Dijkstra said it better.

    20. Re:wtf? by cyclomedia · · Score: 1

      my two cents:

      do
      {
            x
      }
      while(y)

        is essentially equivalent to:

      label:
      {
            x
      }
      if(y)
            goto label;

      so it doesnt matter where you enter the loop it will always start at the beginning

      --
      If you don't risk failure you don't risk success.
    21. Re:wtf? by multipartmixed · · Score: 1

      Your understanding of C switch statements is backwards.

      The first case expression which evaluates as true is treated as a label to jump into the code block. The code then executes until a break is encountered, or the switch statement terminates. Think about how it compiles.

      > I recall a test question in C class that asked if you should always place
      > a break in each and every case statement. My answer of no was graded
      > incorrect

      You were right. Your teacher was wrong. WRONG WRONG WRONG WRONG!

      Consider this code, which reflects a message parser I once wrote:

      switch (message_type)
      {
          mt_smail_spool_file:
              consume_all_smail_headers();
          mt_rfc822:
              start_parsing();
              break;
          default:
              err("invalid message type");
      }

      The code for start_parsing was ~20 lines. This statement most precisely reflects what actually needs to happen -- let alone being clear, consise, short, and effecient. consume_all_smail_headers() was just a 3-line loop which ate up the extra smail headers in the input stream.

      --

      Do daemons dream of electric sleep()?
    22. Re:wtf? by multipartmixed · · Score: 1

      > but beg the question of why to do this.
      > Tricks like this might save, at most, microseconds
      > of execution time,

      Duff used it for exactly the right reason: to feed a bit-blitter fast enough to be useful. Without the loop unrolling, it was 50% too slow on his machine.

      I suppose he could have waited for a faster CPU to come along, or written in assembly, but I think his use of his device was exactly correct.

      Subsequent uses are primarily useful as Koans on the road to C Enlightenment.

      --

      Do daemons dream of electric sleep()?
    23. Re:wtf? by Jeremi · · Score: 1
      Sure, the code is legal, but your failure to put a /* fall-through -- I deliberately didn't put a break statement here! */ comment at the end of your first case is practically criminal negligence.


      What do you think is going to happen the first time a newbie coder looks at your code? He's going to say "aha! A missing break statement! A bug!", and he'll "fix it" by adding one, and break your code :^P

      --


      I don't care if it's 90,000 hectares. That lake was not my doing.
    24. Re:wtf? by DeafByBeheading · · Score: 1
      --
      Telltale Games: Bone, Sam and Max
    25. Re:wtf? by cyborg_zx · · Score: 1

      guess the answer is that if you use switch to jump into a loop, the loop is still initialized?

      Is there any intialization for this loop?

      Does this have to do with how this translates to machine code? A do loop is just a conditional jump backwards, right?

      Yes. 'do' doesn't actually do anything - it is merely a marker that specifies the start of a block of a do-while construct. There's no instruction to be created for it.

      Would this still work with a loop where the test condition is at the beginning, so all you reach at the end is the end of the scope opened earlier

      Well a while { } loop should still function correctly - unless an optimization meant that it was assumed a register contained the value of some object used in the conditional and hence performed an operation on a register without the correct contents (a likely scenario so it's possible that particular bit of dataflow peculiarity may have been accounted for in most compilers) - since the jump back to test for the condition will work no matter where you enter the loop. A for loop would probably cause trouble though if you used an initilization, otherwise it'd be equivalent to the while loop.

      The basic thing to remember here is that case is just a special label for a point in code. Think about the code that would be generated - remember that C is not an interpreted language.

    26. Re:wtf? by BovineSpirit · · Score: 1

      I don't think you're quite right. When it loops it loops over all the {*to++ = *from++}s so that the first time it only does it count % 8 times, if n>1 then it goes back and does it 8 times until n = 0.

    27. Re:wtf? by narcc · · Score: 1

      The "programmer" in your example is the product of what happens when the "Computer Science I" classroom magically turns into "Intro to Programming" sometime between the first and second class. ("Intro to Programming" magically turns into "Making Forms with VB6", to prevent redundancy) The result: A B.S. in CSci that is exactly that -- B.S.

  51. Re:Stupid by moro_666 · · Score: 1

    indeed, this code actually rocks from one point of view, the non-context-switching point of view :)

    people are all so excited currently about kernel thread and pthreads and stuff like that, but what they dont realize is that it's terrible overhead actually makes your app slower than it would be if it was built on a single process model. this "threading" api is a quite wicked implementation, but it works in simple cases and since it doesn't switch the registers back and forth all the time like your 250 pthreaded app, it's fast. on the other hand, it's fairly incompatible with signals and stuff (i think) and i have no idea what this poor library does when it meets regular blocking i/o code. most probably it will just block.

    anyway, the desktop machines and servers today dont really care that much about which threading api they use, but in stuff like cellphones, palm handhelds and other quite slow operating devices that dont have a common threading interface, this is a quite good solution.

    it's a good thing alright, just try to think out of the box and look where it can be used (i will go and multithread my microwave now ...)

    --

    I'd tell you the chances of this story being a dupe, but you wouldn't like it.
  52. Use of this technique in Felix by skaller · · Score: 4, Interesting

    FYI this technique is heavily exploited in the programming language Felix:

    http://felix.sf.net/

    to provide user space threading. The main difference is that all the 'C tricks' are generated automatically by the language translator. If you're using gcc then the switch is replaced by a computed jump (a gcc language extension). On my AMD64/2800 time for creating 500,000 threads and sending each a message is 2 seconds, most of the time probably being consumed by calls to malloc, so the real thread creation and context switch rate is probably greater than Meg/sec order .. just a tad faster than Linux. Both MLton and Haskell also support this style of threading with high thread counts and switch rates (although the underlying technology is different).

    --
    John Skaller mailto:skaller@users.sf.net
  53. wtf? indead! by n6gps · · Score: 1

    I'm lost too. What is this supposed to do? Copy from "from" to "to"? Then why is the "to" pointer never being incremented? Is this another Microsoft interview question? Did I get the job?

    1. Re:wtf? indead! by Sloppy · · Score: 1
      I think the in original Duff example (surely someone will jump on me if I'm wrong) "to" pointed at a memory-mapped I/O register; everytime you wrote to the location pointed at by "to", something happened.

      To copy memory, yes, you'd increment "to".

      And yes, I think this kind of code is horrific and I wouldn't do it. I don't care if it's faster.

      --
      As copyright owner of this comment, I authorize everyone to defeat any technological measure which limits access to it.
  54. His original post that he says he lost: by Anonymous Coward · · Score: 0

    http://groups.google.com/group/net.lang.c/browse_t hread/thread/dca16afbaec3cdd4/66008138e07aa94c?lnk =st&q=%22tom+Duff%22&rnum=2&hl=en#66008138e07aa94c
    I think that's it. note the reference to the mention of the interrupt driven state machine that's too horrible to describe

  55. Re:Stupid by crawly · · Score: 1

    my current development target has no OS, runs at 8MHz, and has 4kbytes of memory

    Sounds like damn luxury to me. My current project has 2kbytes ram, 4.9MHz clock. And they need it to do math(add, multiply, divide) on 30 odd 64-bit variables.

    Young people these days just don't know how good they've got it.

    --
    GCS/S d-x s+(+): a C++++$ UL+$ P+ L++$ !E--- W++@ N++>$ !o !K-- w++$ !O !M !V PS++>$ PE !Y PGP+ t+ 5++ X++ R tv b
  56. cool and all... by YesIAmAScript · · Score: 1

    But it became largely unnecessary when optimizing compilers became available when RISC came about. A good compiler will unroll a simple loop like this smarter than you can.

    Besides, on any modern machine you'll get more speed up by copying a word at a time instead of tightening your byte-at-a-time loop. Because due to the caches, you'll likely end up moving cache lines at a time over the bus anyway (assuming stuff is aligned), and so the key becomes to minimize the number of instructions needed to do it. And going to 32-bit copies cuts that down to 1/4.

    All this stuff might be unintuitive to some, but frankly, case statements are just gotos with optional exits (breaks), so once you know that, this doesn't seem all that odd.

    --
    http://lkml.org/lkml/2005/8/20/95
  57. Lambda Calculus? by WillerZ · · Score: 1

    Ever heard of Lambda Calculus?

    --
    I guess today is a passable day to die.
    1. Re:Lambda Calculus? by Bozdune · · Score: 1

      Yes, and its un-clean implementations in stack-based Lisps with property lists and funarg problems? Right. Show me a purist and I'll show you complete lack of practical application. Where's the 1978 Jack Dennis dream of converting Fortran programs into stateless fine-grained parallel code? Oh right, didn't happen. I remember now.

  58. Re:what a stoner sees by Anonymous Coward · · Score: 0

    Funnily enough, that was exactly what I thought it said, and I hadta go skin up a d00b. Aren't hands really interesting things? I'm hungry. Oh wow, these noodles taste divine! By the time I got back to my workstation, someone else had already posted it.

  59. Abstraction by halleluja · · Score: 1

    Any modern high-level should have ... oh wait, never mind.

  60. Please stop now by Anonymous Coward · · Score: 0

    Can people please stop show some consideration and stop slashdotting Chiark please? I want to read my e-mail.

    Thank you.

  61. Re:Stupid by dillkvast · · Score: 1

    anyway, the desktop machines and servers today dont really care that much about which threading api they use, but in stuff like cellphones, palm handhelds and other quite slow operating devices that dont have a common threading interface, this is a quite good solution.

    While your argument concerning cellphones certainly has been valid historically, todays devices are actually very capable. I work with cellphone software and I found myself beeing suprised e.g by the performance of J2ME on modern phones. My experience is with symbian (c/c++) and J2ME. Symbian OS encourages use of cooperative multitasking though its "active objects"-pattern to eliminate some of the overhead of real threads. However using J2ME yields almost the same performance without having to deal with all the "performance optimizing" tricks of Symbian C++ (such as active objects, descriptors instead of strings and manual memory (de)allocation). (It is possible that this is due to java-optimized hardware in the phones, we have tested new Sony Ericssons and high-end Nokias, but who cares it works very well)

    When high level languages/platforms like Java can successfully be used even on cellphones, its is time for C++ to die as language for application developement. The only obstacle, as I see it, is that the two promising platforms (Java and .NET) are properitary and subject to market power struggles. So we are sadly probably stuck with C++ hell for years to come yet.

    --
    Scitne aliquis remedium potimum crapulae?
  62. RE:calling main() by Anonymous Coward · · Score: 0

    > The standard forbids calling the entrypoint. Calling main() is undefined.

    In your head it's undefined. In the C standard there's no such limitation.

    > Besides, it's silly.

    Probably.

  63. Re:...sane detexi hanc marginis exiguitas non cape by armb · · Score: 1

    If you follow the link to Simon Tatham's coroutines, there's a link to Duff confirming that he was referring to the same thing.
    http://brainwagon.org/archives/2005/03/05/1060/#co mments

    --
    rant
  64. Duffs device by sl4shd0rk · · Score: 1

    "The device is indeed perfectly valid, legal C, and many compilers will optimize the switch into a jump table just as would be done in an assembler implementation."

    This is another example of trying to make and optimizer out of a compiler. "many compilers" does not mean "all compilers" which in turn is going to bite you in the ass at some point.

    --
    Join the Slashcott! Feb 10 thru Feb 17!
    1. Re:Duffs device by multipartmixed · · Score: 1

      The device works regardless of compiler implementation -- but it will be faster if it is used with a jump-table.

      That said -- the jump-table optimization is SO TRIVIAL that I can't see any arbitrary compiler *not* using it. In fact, it is easier to write the switch directly as a jump table than it is to extract any other semantic meaning out of it, IMHO.

      (Of course -- not all compiler writers think in lex and yacc like I do)

      --

      Do daemons dream of electric sleep()?
  65. Should work quite fine by zde · · Score: 3, Insightful

    Unless you try to 'yield' something from within your own 'switch' statements. Then such 'smart' macros will silently pollute current 'switch' block with bogus case values, so it:

    1) silently modifies you 'switch' statement sematics
    2) fails to continue from the right spot on next iteration.

  66. Re:Stupid by TheRaven64 · · Score: 1
    people are all so excited currently about kernel thread and pthreads and stuff like that, but what they dont realize is that it's terrible overhead actually makes your app slower than it would be if it was built on a single process model.

    You seem to be working under the impression that your target machine only has 1 CPU (and that CPU only has one core). Using 2 CPUs with a 20% overhead from thread locking etc. is a lot faster than using one without the overhead. Most new CPUs have two cores. In 18 months, they will have 4, then 8 etc. They will also have SMT support. If you want to future-proof your code (and it's at all performance-dependent), then make sure it exhibits a high degree of parallelism.

    Of course, if you want your code to be at all maintainable, then avoid threads like the plague, and pick a language like Erlang that has asynchronous message passing primitives.

    --
    I am TheRaven on Soylent News
  67. Re:Stupid by TheRaven64 · · Score: 1
    Out of completely idle curiosity, what kind of arithmetic hardware do you have? Presumably in something that small you aren't working in a high-level language (or are you?) Do you have 64-bit registers, or are you limited to 32/16/8-bit manipulations? Do you even have a multiply instruction, let along a divide, or do you have to do it all with shifts? What kind of speed requirements do you have?

    Anyway, it sounds like fun - although possibly less so if you have some kind of deadline.

    --
    I am TheRaven on Soylent News
  68. this is an OS! by Anonymous Coward · · Score: 0

    well, this IS the OS. A simple embedded OS typically consists of these parts:
    - mem manager (malloc etc)
    - threads
    - some sort of locking mechanism

    Nice to have
    - some sort of queues, message systems
    Luxurious:
    - file system for flash or ram

    everything else is already "luxury", such as printf facility, network stack, ...

    So - those simple threads are the OS. A very small one with lots of restrictions. But it is so nice to have such functionality instead of having nothing.

  69. Re:Stupid by moro_666 · · Score: 1

    i agree that having 2 threads on a 2way machine is quite a good idea ... and that having 16 threads on a 4way 4core processor machine is sometimes a good idea. but people are threading everything today ... even stuff that doesnt need to be threaded :'(.

    if you count all the threads that run in an usual linux/apache/mysql box that in addition has a threaded mailserver attached to it (and a some lightweight java applications runnnin on tomcat in java threads), then the poor machine is just overkilled. even if it's a 4-way box. if your mysql spawns itself into 50 threads to handle 50 clients, than on a machine that can handle 16 processes at the same time (thanks to its 4x4 cores) is still in big trouble, and the locking/sharing& registry pulling/pushing overhead is indeed a plague.

    parallel computing is a ofcourse the way of the future, but people should try to remember that there are other ways than threads to do parallell stuff, and if you leave aside the hyper/mega/giga hype about common threading tricks, then sometimes forking can just perform better, and in some cases, you have to go pvm or mpi right from the start to avoid massive code rewriting later. (usual threads over an 100node beowulf is fun but... not as effective as you'd think, btw. is there anything besides openssi that can pull that off ?)

    when i first saw threads in year 2000 in java, i was amazed, they felt so good and easy to handle, it was rather hard to make a mistake there, and it all worked fine. i never thought about performance. during the last year i have fought with the performance issues a lot. and that "yberthreading" disease seems to spread even more with every day that passes.

    ofcourse we can't say that threads are "all bad", actually they are really good in some spots, and in certain conditions they help you to keep a clean sane code and are the best solution. but people please, just dont overdo it :)

    --

    I'd tell you the chances of this story being a dupe, but you wouldn't like it.
  70. This might help by amightywind · · Score: 1

    I am working to optimize some medical device analog IO code that runs on a PIC processor with 191 bytes of memory. This may help. Real ideas like these are a lot more fun than the flamebait political stuff that has predominated on this forum for the past few years.

    --
    an ill wind that blows no good
  71. Re:Saving microseconds of exec time by Anonymous Coward · · Score: 0

    Because when you have a 20 MHz CPU that has to do 500,000 loops per second, it *matters* whether those loops take 35 cycles or 45 cycles.

    The 2 things that separate the assholes from the stars are: Knowing when it is worth spending hours to optimize, and Documenting what they did and why.

  72. Or in the case of PHB by Analogy+Man · · Score: 1
    ...and dubbed Duff's device. You either love it or you hate it!

    "Surplus that Corba and refresh them with some Duff's devices. I think we ought to get the multi-core ones to cut down on licensing costs."

    --
    When the people fear their government, there is tyranny; when the government fears the people, there is liberty.
  73. Re:a fun trick only useful in very specialized cas by SirLestat · · Score: 1

    Most people think that we have almost infinite processing power and memory everywhere. The truth is that most application do not need that much. Like the car door openner mentionned in post above, there are microchips on many small gizmos. As soon as it needs to be small, low power and low cost there is a pretty good chance it uses a 8051 microcontroller in it. That means 12mhz with 256 bytes internal ram. Actually the ram contain special registers, the stack, bit addressable section, so you don't really have all that 256 bytes for anything you would want. 8051 are ideal for those small applications and as strange as it might seems now, you can still do a lot with that amount of ram if you don't play stupid.

  74. Re:Stupid by Anonymous Coward · · Score: 0

    efficiency is not it's problem

    "its".

  75. Re:a fun trick only useful in very specialized cas by Anonymous Coward · · Score: 0

    The largest Atmel AVR microcontrollers have 8k of SRAM. PIC microcontrollers are, as far as I know them, even more limited. A more typical RAM size for them is around 128Bytes (not kB!). There is just no way to get 128k of RAM easily with those devices. That's what is meant by "severely memory restricted".

  76. Something fishy about the first example - by Anonymous Coward · · Score: 0

    Something doesn't look quite right in the first example http://www.sics.se/~adam/pt/examples.html>

    Consider the sender -

        1) an acknowledgement is successfully received (the timer has not expired)
        2) the timer expires
        3) the timer is checked again in the condition of the while loop
        4) the packet is resent even though it has been correctly acknowledged

  77. hello! anybody heard of setjmp/longjmp?!? by Anonymous Coward · · Score: 0

    In C, you can use setjmp/longjmp for simple cooperative threads. You do not even need to setup a stack (but then you get no local variables).

    Instead of 2 bytes, you need enough bytes to hold the machine state (sizeof(jmp_buf)). It's a little more overhead but a hell of a lot easier to follow than some macro'd switch statements. :)

  78. Seems like C# is nicer here by massimiliano · · Score: 1

    Is it just me, or this looks like a low level implementation of what you can easily do with the C# "yeld" statement?

  79. Re:This is cool. by massimiliano · · Score: 1

    Actually, wouldn't you use "yeld" in C#?

  80. Re:calling main() by Analog+Squirrel · · Score: 1
    > > Besides, it's silly.

    > Probably.

    That's the point, isn't it? ;-)

    --
    I'd rather be flying
  81. Ia! Ia! __cntlzw fhtagn! by boomgopher · · Score: 1

    Ia! Ia! __cntlzw... oh wait - *ahem* sorry..

    --
    Your hybrid is not saving the environment. Its purpose is to make you feel good about buying something.
  82. 133t, d0 j00 5p34k 33t, muth4ph0ck3r?!!1one1!! by Thud457 · · Score: 1
    --

    the preceding comment is my own and in no way reflects the opinion of the Joint Chiefs of Staff

  83. r: This might help by francisew · · Score: 1

    I know what you mean. I've also used PIC's a lot (although I prefer the ubicom/SX chips for their speed). This is a great technique for handling concurrent processes in a menatally manageable way.

    This is very similar to the technique used to multitask in the interrupts. To time-share the interrupts it's common to use a similar jump-table structure.(so that you can have serial comm and pwm and the like all happening at once with proper predictable timing)

    Then again, might as well go with a more reasonable and pleasant microcontroller, like the MSP430.

    (I also program uC's for medical device related things. I'm just finishing up my Ph.D. in analytical chemistry... working on some photon time-of-flight spectrometers for in-vivo measurements.)

    I agree as to the usefulness of this story. I think there should be more like this.

    1. Re:r: This might help by amightywind · · Score: 1

      Interesting post. I have a PIC 16c63 that spends 90% of its time doing analog input and needs to use the other 10% percent wisely. I can't just use an ISR and a background loop because I have three tasks. Multiple ISR's have been problematic. I am working with the hardware guys to drop in a better faster PIC. My problems would probably be gone. I like the PIC environent pretty well. The register set is pretty simple and straightforward. Cheers.

      --
      an ill wind that blows no good
  84. Re:Stupid by ckaminski · · Score: 1

    The general rule of threading is this:

        If an operation can block, and prevent system responsiveness, then a thread is a viable means of working around the issue.

    With that in mind, worker pools of 50 threads in apache make sense, since each thread can iterate [select()] amongst 1-n sockets, and the case of 1 socket blocking abnormally will not render the entire webserver immobile.

    In a GUI application, when a user executes a long-running task, like "generate a report", you'd trigger it in a thread so that your message pump can get back to work keeping the window drawing primitives going.

    Other uses of threads, such as compute tasks, are less wise, IMHO, but sometimes they have their places.

  85. It's very handy for microcontrollers by John+Jorsett · · Score: 1
    Please note that this isn't interesting unless you work in, as, the FA says, a severely memory constrained system.

    Which is exactly why I ended up using protothreads for a PIC microcontroller based system that needed some concurrency but didn't have the resources to support a more full-featured OS. With protothreads, I was able to input serial data from a couple of devices, drive an LCD display, take user input, etc. For all but the simplest implementations of microcontroller-based systems, protothreads are a terrific help.

  86. From wikipedia by McGiraf · · Score: 1

    From wikipedia on the page linked from the story:

    This article has recently been linked to from Slashdot. Please watch out for any trolls that may target this article

    Well, well, this crowd has a good reputation!

  87. /. suks!!! by cheesy9999 · · Score: 1

    Ah Wikipedia...some prick thought he was funny and added little ironic comment to the Duff's Device page: This article has recently been linked to from Slashdot. Please watch out for any trolls that may target this article. /. suks!!! From IP address: 139.168.1.230 Enjoy.

    --
    -tom
    1. Re:/. suks!!! by cheesy9999 · · Score: 1

      Damn HTML formatting...

      Ah Wikipedia...some prick thought he was funny and added little ironic comment to the Duff's Device page:

      This article has recently been linked to from Slashdot. Please watch out for any trolls that may target this article.

      /. suks!!!


      IP address: 139.168.1.230

      Enjoy.

      --
      -tom
  88. What about Hilary Duff's Device? by Anonymous Coward · · Score: 0

    I'd click on that.

  89. Re:Stupid by mvdw · · Score: 1

    I have addition & subtraction, (integer) multiplication and division, but most of the "hard work" is done with an FPGA. The application doesn't have much need for arithmetic anyway, so there's no issue with speed. It works well, and yes, it is fun.