Correct, except 2nd-gen stack machines have a dedicated stack to hold those return addresses, so they never get to memory. Makes for very fast calls and returns. Experiments by Prof. Koopman have shown that for all practical purposes, a return address stack of 16 elements is deep enough. There is such a 16-deep stack (hidden from the programmer) on the Pentium 4 (and the Alpha AXP too, I think): http://blogs.msdn.com/oldnewthing/archive/2004/12/16/317157.aspx
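To make the dedicated return stack concrete, here is a rough Python sketch of a toy stack machine. It is only an illustration: the class name, the opcodes, and the sample program are all made up for this example and do not model any real chip. The one detail taken from the comment above is the 16-entry depth. Calls and returns only push and pop the separate `rstack` list; main memory is not modeled at all.

```python
# Toy model of a second-generation stack machine. The data stack and the
# return-address stack are separate structures; main memory is not modeled.
# StackMachine, the opcodes, and PROGRAM are hypothetical, for illustration.

RETURN_STACK_DEPTH = 16  # depth quoted in the comment above


class StackMachine:
    def __init__(self, program):
        self.program = program   # list of (opcode, operand) tuples
        self.data = []           # data stack
        self.rstack = []         # dedicated return-address stack
        self.max_rdepth = 0      # deepest return-stack use observed
        self.pc = 0

    def run(self):
        while True:
            op, arg = self.program[self.pc]
            self.pc += 1
            if op == "lit":
                self.data.append(arg)
            elif op == "dup":
                self.data.append(self.data[-1])
            elif op == "add":
                b, a = self.data.pop(), self.data.pop()
                self.data.append(a + b)
            elif op == "call":
                if len(self.rstack) >= RETURN_STACK_DEPTH:
                    raise OverflowError("return stack overflow")
                self.rstack.append(self.pc)  # return address stays on-chip
                self.max_rdepth = max(self.max_rdepth, len(self.rstack))
                self.pc = arg
            elif op == "ret":
                self.pc = self.rstack.pop()
            elif op == "halt":
                return self.data
            else:
                raise ValueError(f"unknown opcode {op!r}")


PROGRAM = [
    ("lit", 5),      # 0
    ("call", 3),     # 1: call quadruple
    ("halt", None),  # 2
    ("call", 6),     # 3: quadruple = double double
    ("call", 6),     # 4
    ("ret", None),   # 5
    ("dup", None),   # 6: double = dup add
    ("add", None),   # 7
    ("ret", None),   # 8
]

machine = StackMachine(PROGRAM)
result = machine.run()
```

The sample program nests two levels of calls, so `max_rdepth` ends at 2, well under the 16-entry limit, and the return stack is empty again when the machine halts.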
My reply to the main themes in the comments is here: http://funos.livejournal.com/367820.html
"Out of order processing has been done in stack machines for years now."
As far as I know, there are no implemented second-gen stack computers that support that feature. (There have been a few theoretical ones.) Which ones are you talking about?
I agree about the advantage of out-of-order execution, but what about thread-level parallelism? Given multiple smaller, simpler processors on-chip, if one stalls on a memory fetch, the others may still be crunching away on stack and on cache. This is not unlike the approach Sun is taking with the UltraSPARC T1 (aka "Niagara").
Admittedly, this will not parallelize single-threaded code. But it's a lot easier to design. :)
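A back-of-the-envelope Python sketch of that argument, under loudly stated assumptions: every core repeats a fixed pattern of a few one-cycle instructions followed by a long memory stall, the cores are staggered, and they never contend for memory bandwidth. The burst length and miss latency are invented numbers, not measurements of the UltraSPARC T1 or of any stack machine, and `simulate` is a hypothetical helper.

```python
def simulate(num_cores, cycles, compute_burst=4, miss_latency=12):
    """Return (instructions retired, cycles in which any core did work).

    Each core repeats: `compute_burst` one-cycle instructions, then a
    stall of `miss_latency` cycles waiting on a memory fetch. Cores are
    staggered by one burst and are assumed not to interfere.
    """
    period = compute_burst + miss_latency
    retired = 0
    busy_cycles = 0
    for cycle in range(cycles):
        working = 0
        for core in range(num_cores):
            phase = (cycle + core * compute_burst) % period
            if phase < compute_burst:  # this core is computing, not stalled
                working += 1
        retired += working
        if working:
            busy_cycles += 1
    return retired, busy_cycles


one_core = simulate(num_cores=1, cycles=160)
four_cores = simulate(num_cores=4, cycles=160)
```

With these made-up parameters a single core is stalled three cycles out of four, while four staggered cores keep the chip doing useful work on every cycle. Each individual thread is no faster, which is the single-threaded caveat above.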
Actually, the ones I refer to are the RTX-2000 and RTX-2010.
See http://forth.gsfc.nasa.gov/ for examples.
The problem of buried values on a stack is dealt with by factoring the program really finely. A procedure should ideally use no more than 2-3 elements on a stack. More than that and the code gets very hard to follow and needed data items get buried, as you mention.
Second-gen stack computers are basically a hardware implementation of the Forth virtual machine, so Forth code maps pretty much directly to such a machine.
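As a small illustration of fine factoring, here is a sketch of a Forth-like evaluator written in Python (a stand-in, not real Forth). The word names `square` and `sum-of-squares`, the `evaluate` helper, and the depth tracking are all hypothetical. Each definition is short enough that it only ever works on the top two or three stack items.

```python
# Minimal Forth-like evaluator that records the deepest data stack seen.
# PRIMITIVES, DEFINITIONS, and evaluate are hypothetical, for illustration.

PRIMITIVES = {
    "dup":  lambda s: s.append(s[-1]),               # ( a -- a a )
    "swap": lambda s: s.extend([s.pop(), s.pop()]),  # ( a b -- b a )
    "+":    lambda s: s.append(s.pop() + s.pop()),   # ( a b -- a+b )
    "*":    lambda s: s.append(s.pop() * s.pop()),   # ( a b -- a*b )
}

DEFINITIONS = {
    "square": "dup *",                         # ( n -- n*n )
    "sum-of-squares": "square swap square +",  # ( a b -- a*a+b*b )
}


def evaluate(source, definitions, stack):
    """Run whitespace-separated `source` on `stack`; return the max depth."""
    deepest = len(stack)
    for token in source.split():
        if token in definitions:
            # A defined word is just more source run on the same stack.
            inner = evaluate(definitions[token], definitions, stack)
            deepest = max(deepest, inner)
        elif token in PRIMITIVES:
            PRIMITIVES[token](stack)
        else:
            stack.append(int(token))  # anything else is an integer literal
        deepest = max(deepest, len(stack))
    return deepest


stack = []
deepest = evaluate("3 4 sum-of-squares", DEFINITIONS, stack)
```

Here the stack never grows past three items, and neither word has to dig for a buried value: `square` only looks at the top item, and `sum-of-squares` needs a single `swap`.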
Both methodologies work. Using stacks is better in the small, when software and hardware size are the limiting factors.
Cute. :)
Then you either fill from memory, or you check for it at compile time.
Thank you!