Protothreads and Other Wicked C Tricks
lwb writes "For those of you interested in interesting hard-core C programming
tricks: Adam Dunkels' protothreads library
implements an unusually lightweight type of threads. Protothreads are
not real threads, but rather something in between an event-driven
state machine and regular threads. But they are implemented in 100%
portable ANSI C and with an interesting but quite unintuitive use of the switch/case
construct. The same trick has previously been used by Simon Tatham to implement
coroutines
in C. The trick was originally invented by Tom Duff and dubbed Duff's
device. You either love it or you hate it!"
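For anyone who wants to see the switch/case trick without reading the library source, here is a minimal sketch. It is not the protothreads API: the macro names CR_BEGIN, CR_YIELD and CR_END and the struct are made up for this illustration. The idea is that each yield stores the current line number and returns, and the next call switches on that number to jump straight back to the matching case label.

```c
/* Hypothetical names for illustration only; not the protothreads macros. */
struct counter {
    int line;  /* where to resume; 0 means "start from the top" */
    int i;     /* loop state is kept here, not in a local variable */
};

#define CR_BEGIN(s)    switch ((s)->line) { case 0:
#define CR_YIELD(s, v) do { (s)->line = __LINE__; return (v); \
                            case __LINE__:; } while (0)
#define CR_END(s)      } (s)->line = 0; return -1

/* Returns 0, 1, 2 on successive calls, then -1, then starts over. */
static int next_value(struct counter *s)
{
    CR_BEGIN(s);
    for (s->i = 0; s->i < 3; s->i++) {
        CR_YIELD(s, s->i);  /* resumes here, inside the loop, on the next call */
    }
    CR_END(s);
}
```

Calling next_value() repeatedly on a zero-initialized struct counter walks through the loop one step per call, even though the function returns every time.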
I used a Lifeboat lib back in the late 80's that this reminds me of. Cooperative multitasking. Eventually ported the whole thing to OS/2 and used that threading instead. All the code pretty much worked as-is.
The revolution will NOT be televised.
So this is so "counterintuitive" that no one else will ever understand your code?
Sounds ideal!
Duff on Duff's Device:
http://www.lysator.liu.se/c/duffs-device.html
--- These are not words: wierd, genious, rediculous
I first came across this while I was working on the e-voting machines. There was a dept especially allocated to investigating how to hide certain features in C code to make them look like something else.
Duff's device is a way of forcing C to do a form of loop unrolling. It has nothing to do with coroutines.
This looks very similar to the implementation technique used for the Squeak programming language (not the Smalltalk Squeak). Squeak is a preprocessor for C that makes it very easy to use this technique.
http://citeseer.ist.psu.edu/cardelli85squeak.html
Doug Moen
I have written a truly remarkable program which this sig is too small to contain.
SGI has had a State Threads library for a long time: http://oss.sgi.com/state-threads
And the JVM is written in C :)
LL
...not bound to any particular OS.
If that's what folks are looking for, another option is the tasks added to LibGG a while back. Tradeoffs either way -- LibGG's requires at least C signals (but will use pthreads or Windows threads if detected at compile time), whereas this can be used in OS-less firmware. But on the positive side you can use switch() in LibGG tasks -- what you can't use are a lot of non-MT-safe system calls. It's an OK abstraction but of course there are so very many ways to accidentally ruin portability that it is far from foolproof.
http://www.ggi-project.org/documentation/libgg/1.0.x/ggAddTask.3.html
Someone had to do it.
The PPC architecture has a special-purpose count register with specialized branch instructions relating to it; e.g., the assembly mnemonic 'bdnz' means "decrement the count register by one, and branch if it has not reached zero." I've used this in some pretty weird loops, including this one that broke the Codewarrior 9.3 compiler (fixed in 9.4.) This computes the location of the n'th trailing one in a 32-bit integer. Pardon my weak attempt at formatting this in HTML:
static uint32 nth_trailing_one(register uint32 p, register uint32 n) { end: }
return __cntlzw(p ^ (p - 1));
}
The idea was that the instruction stream should stay as linear as possible; most of the time the branches are not taken, and execution falls through to the next line of code. Ironically (siliconically?), the entire function could probably be implemented in a single cycle in silicon; shoehorning bitwise functions like this into standard instructions tends to be extremely wasteful. Perhaps FPGAs will make an end run around this at some point. I've also tried this function with a dynamically-calculated jump at the beginning, similar to the case statement logic in the article.
Hmm, I had a point I was trying to make with this post, but now it's escaped my mind...
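The loop body of nth_trailing_one did not survive the formatting above, so what follows is not the poster's code. It is a plain portable C sketch of the stated goal (find the n'th set bit counting from the least significant end), under assumptions of my own: the name nth_set_bit_index is hypothetical, n is 1-based, and the result is a bit index with 0 as the LSB, or -1 if there are fewer than n set bits. It uses none of the PPC count-register tricks.

```c
#include <stdint.h>

/* Hypothetical helper: index (0 = LSB) of the n'th set bit, n counted from 1.
 * Returns -1 if n is 0 or p has fewer than n bits set. */
static int nth_set_bit_index(uint32_t p, uint32_t n)
{
    int idx = 0;

    if (n == 0)
        return -1;
    /* Clear the lowest set bit n-1 times; p & (p - 1) drops one bit. */
    while (--n > 0 && p != 0)
        p &= p - 1;
    if (p == 0)
        return -1;
    /* The lowest remaining set bit is the one we want; count up to it. */
    while ((p & 1u) == 0) {
        p >>= 1;
        idx++;
    }
    return idx;
}
```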
Weeks of coding saves hours of planning.
I got to this little gem:
My English parser thread shut down at that point . . .
Seriously, this looks like a handy little thing for low-memory systems, though I'd be a bit hesitant about pushing at the C standard like that--the last thing you need is a little compiler bug eating your program because the compiler writers never thought you'd do crazy things to switch blocks like that.
Weightless threads in Python:
http://www-128.ibm.com/developerworks/library/l-pythrd.html
They are cooperative but far more efficient than Python's own threading model. You can easily create hundreds of thousands of concurrent threads.
This is bad, lame, faux cooperative threads.
It's also not even particularly new [1998]. Unless memory is at an absolute premium, just use cooperative threading instead. If you try to use protothreads, you'll quickly discover how unlike "real" programming it is. Even just a 4K stack in your cooperative threads will get you way more than protothreads do.
Ummm, which operating system would that be? Not all programmers have the advantage of an operating system as such; my current development target has no OS, runs at 8MHz, and has 4kbytes of memory. Something like this could be extremely useful for me.
Get the book Obfuscated C and Other Mysteries by Don Libes. Explanations of various Obfuscated C contest entries, and alternate chapters illustrate neat corners of C, including a few things similar to this little library. Occupies a place of honor on my shelf.
PHEM - party like it's 1997-2003!
Even if you are writing in the purest of C, you aren't guaranteed that the optimizer isn't going to very reasonably want to introduce the equivalent of local variables. And even if you are sure there's no optimization going on, you STILL don't know for sure that the compiler isn't using space on the stack. There just is no guarantee built into the language about this. And if you were wrong, you'd get strange, highly intermittent and non-local bugs.
You could be pretty sure. You could force the compiler to use registers as much as possible. You could keep your routines really short. (Hey, if they don't preserve local variables, then how do they do parameter passing?? Parameters are passed on that same stack!)
But to be completely sure, you'd have to look at the output code. It wouldn't be too hard I suppose to write a tool to automatically do it...you'd just look for stack-relative operations and flag them. But then what would you do if something wasn't working? Yell at the compiler? Rewrite the machine language?
I guess I don't quite see the use now I've written this up. When is memory THAT important these days? It ain't like I haven't done this, I've written significant programs that I got paid money to do that fit into 4K (an error correction routine).
But that was an awfully long time ago. Now it's hard to find memory chips below 1Mbit. That two byte number is interesting but your "threads" aren't doing any work for you -- the whole point of threads is that you are preserving some context so that you can go back to them.
And since you can't use local variables, you can't use things like the C libraries or pretty well any library ever written, which is teh sux0r.
For just a few more bytes of memory and a few more cycles, you could save those local variables somewhere and restore 'em later. Suddenly your coding future is a brighter place. Tell the hardware people to give you 128K of RAM, damn the expense!
You could even put in a flag to indicate that that particular routine didn't need its local variables saved so you'd get the best of both worlds, use of external libraries as well as ultra-light switching.
As the protothreads homepage says, it's for extremely small embedded systems with no operating system and a tiny amount of memory (you can't use DRAM on systems that cost less than $1). If you want threads on that kind of system, you have no other choice.
Another advantage is its portability. Small embedded systems, whether they have operating systems or not, usually can't support some fully-blown threading standard. Those operating systems seem to implement some kind of 'specially tuned' thread APIs.
Using this kind of thread on a full-blown PC (or server) would have almost no benefit. However, from an embedded software engineer's perspective, it's great to see an ultra-lightweight thread library without any platform-dependent code.
It's too clever to be really useful unfortunately. The big issue is of course the no "local variables". Trouble is, if you are writing in C, the compiler may well be creating local variables for you behind your back. In C++ for example there are many cases where this will certainly happen, like
void DoSomething(const string&);
DoSomething("hollow, whirled");
where a local variable of type string will be temporarily created to pass to routine DoSomething.
You need to read the article.
It only says you can't use local variables across functions that block. Actually, it doesn't even say that you can't use them, it only says don't expect their value to be preserved.
In your example, even if the compiler does create a local variable to call DoSomething, and even if DoSomething does block, who cares if the value of that local variable is preserved, since it's impossible to reference it again after that statement?
But that was an awfully long time ago. Now it's hard to find memory chips below 1Mbit.
I can help you with this problem! Is 16 bytes small enough?
And since you can't use local variables, you can't use things like the C libraries or pretty well any library ever written, which is teh sux0r.
But you can use the C libraries. Just don't use local variables across functions that block. Only a very few C library functions block.
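To make the local-variable rule concrete, here is a small sketch using the same kind of switch-based macros. Again the names (CR_BEGIN, CR_YIELD, CR_END, struct task, step) are invented for this example and are not the protothreads API. The local tmp is fine because it is written and consumed between two yields; total and i have to survive a yield, so they live in the struct.

```c
/* Hypothetical macros for illustration; not the protothreads API. */
struct task {
    int line;   /* resume point, 0 = start */
    int i;      /* must survive a yield, so it is stored here */
    int total;  /* likewise */
};

#define CR_BEGIN(t)    switch ((t)->line) { case 0:
#define CR_YIELD(t, v) do { (t)->line = __LINE__; return (v); \
                            case __LINE__:; } while (0)
#define CR_END(t)      } (t)->line = 0; return -1

/* Accumulates the squares of three inputs, one input per call. */
static int step(struct task *t, int input)
{
    int tmp;  /* plain local: its value is NOT kept across a yield */

    CR_BEGIN(t);
    t->total = 0;
    for (t->i = 0; t->i < 3; t->i++) {
        tmp = input * input;   /* set and used before the next yield: OK */
        t->total += tmp;
        CR_YIELD(t, t->total);
    }
    CR_END(t);
}
```

Feeding 1, 2, 3 gives running totals 1, 5, 14; a fourth call returns -1 and resets the task.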
Dijkstra is not $DEITY. There is a difference between a competent programmer and a brilliant programmer. Sometimes one has to be clever in order to get the job done.
Actually, since the running of $export DEITY=Dijkstra, he is now.
-- Is "Sig" copyrighted by www.sig.com?
Okay, I'll play the n00b. I understand most of this, but my coding background is not that great, and mostly in C++, Java, and PHP, and I'm having problems with the switch from Duff's Device...
int n = (count + 7) / 8;   /* number of passes through the do-while */
switch (count % 8)
{
case 0: do { *to = *from++;
case 7: *to = *from++;
case 6: *to = *from++;
case 5: *to = *from++;
case 4: *to = *from++;
case 3: *to = *from++;
case 2: *to = *from++;
case 1: *to = *from++;
} while (--n > 0);
}
What the hell is up with that do { applying only in case zero? It's in several places on the net just like that and Visual Studio compiles this just fine, so it's not an error. I checked K&R, and they don't even hint at what could be going on there... I'm lost. Help?
Telltale Games: Bone, Sam and Max
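On the do { question: case labels are just jump targets inside the body of the switch, and the body happens to be one big do-while. The do { is not attached to case 0; it is simply the first statement after that label. The switch jumps into the middle of the loop to handle the leftover count % 8 items, and every later pass runs all eight copies. Below is a runnable sketch of the same shape. It differs from the original in two ways that are my own choices: it increments to (the original wrote to a fixed output register), and it guards against a zero count. The name duff_copy is made up.

```c
#include <stddef.h>

/* Sketch of Duff's device as an ordinary byte copy. */
static void duff_copy(char *to, const char *from, size_t count)
{
    size_t n;

    if (count == 0)            /* the unguarded loop would run with n == 0 */
        return;
    n = (count + 7) / 8;       /* passes through the do-while, rounded up */
    switch (count % 8) {
    case 0: do { *to++ = *from++;
    case 7:      *to++ = *from++;
    case 6:      *to++ = *from++;
    case 5:      *to++ = *from++;
    case 4:      *to++ = *from++;
    case 3:      *to++ = *from++;
    case 2:      *to++ = *from++;
    case 1:      *to++ = *from++;
            } while (--n > 0);
    }
}
```

With count = 10, n is 2 and the switch enters at case 2: two copies on the first partial pass, then one full pass of eight.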
FYI this technique is heavily exploited in the programming language Felix:
http://felix.sf.net/
to provide user space threading. The main difference is that all the 'C tricks' are generated automatically by the language translator. If you're using gcc then the switch is replaced by a computed jump (a gcc language extension). On my AMD64/2800 time for creating 500,000 threads and sending each a message is 2 seconds, most of the time probably being consumed by calls to malloc, so the real thread creation and context switch rate is probably greater than Meg/sec order .. just a tad faster than Linux. Both MLton and Haskell also support this style of threading with high thread counts and switch rates (although the underlying technology is different).
John Skaller mailto:skaller@users.sf.net
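For readers who have not seen the gcc computed jump mentioned above, here is a rough sketch of the idea. It is not Felix's generated code; the names struct gen and gen_next are invented. It relies on the GCC "labels as values" extension (&&label and goto *ptr), which Clang also accepts, so it is not portable ANSI C.

```c
/* GCC/Clang only: uses the labels-as-values extension. */
struct gen {
    void *resume;  /* address of the label to continue from; 0 = start */
    int i;         /* loop state kept outside the stack frame */
};

/* Returns 0, 1, 2 on successive calls, then -1, then starts over. */
static int gen_next(struct gen *g)
{
    if (g->resume)
        goto *g->resume;           /* jump straight to the saved label */

    for (g->i = 0; g->i < 3; g->i++) {
        g->resume = &&after_yield; /* remember where to continue */
        return g->i;
after_yield:;
    }
    g->resume = 0;
    return -1;
}
```

Compared with the switch version, the resume is a single indirect jump instead of a switch dispatch, and the code is free to use its own switch statements.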
Unless you try to 'yield' something from within your own 'switch' statements. Then such 'smart' macros will silently pollute the current 'switch' block with bogus case values, so it:
1) silently modifies your 'switch' statement semantics
2) fails to continue from the right spot on next iteration.
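A small sketch of that failure, using made-up macros (CR_BEGIN, CR_YIELD, CR_END) in the same style rather than the real library. In C a case label belongs to the closest enclosing switch, so the hidden case __LINE__ inside the yield ends up in the user's switch (mode) and the outer dispatch can no longer find it.

```c
/* Hypothetical macros for illustration; not the protothreads API. */
struct task {
    int line;  /* resume point, 0 = start */
};

#define CR_BEGIN(t)    switch ((t)->line) { case 0:
#define CR_YIELD(t, v) do { (t)->line = __LINE__; return (v); \
                            case __LINE__:; } while (0)
#define CR_END(t)      } (t)->line = 0; return -1

static int broken(struct task *t, int mode)
{
    CR_BEGIN(t);
    switch (mode) {
    case 1:
        /* The hidden 'case __LINE__:' becomes a case of switch (mode).
         * If mode ever equals that line number, control lands here. */
        CR_YIELD(t, 10);
        return 20;  /* intended resume path; never reached via the outer switch */
    }
    CR_END(t);
}
```

The first call with mode 1 returns 10 as expected. On the second call the outer switch has no matching case, so the whole body is skipped and the function returns -1 instead of 20, with no warning from the compiler.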