Next Generation Stack Computing
mymanfryday writes "It seems that stack computers might be the next big thing. Expert Eric
Laforest talks
about stack computers and why they are better than register-based
computers. Apparently NASA uses stack computers in some of their probes. He
also claims that a kernel would only be a few kilobytes large! I wonder if
Windows will be supported on a stack computer in the future?"
Sounds like fun. I remember writing in RISC assembly code. I fealt like a real man back then. C# just doesn't cut it.
fyi - The Open Office format Slide links don't work, so sadly I had to open the PPT file in Open Office instead.
I didn't know either:
http://en.wikipedia.org/wiki/Stack_machines
/* TBD */
I remember that FORTH is a language support STACK COMPUTING. Hopefully, it is not totally wrong. Unfortunately, it is really hard to understand FORTH program.
http://en.wikipedia.org/wiki/Forth_programming_lan guage
^(oo)^pig~
The Java Virtual Machine is also sort of stack-based. The JVM bytecode set uses stack operations but the safety checks that it runs make it equivalent to a sliding set of registers not unlike, say, the SPARC architecture. A JIT implementation could do away with the stack, at least in the individual method calls, though the call-return stack would still have to be there.
Introduction
Discovered field by chance in 2000 (blame the Internet)
Hobby project (simulations and assembly) until 2004
Transformed into Independent Study thesis project
Overview of current state of research
Focus on programmer's view
Stack Computers: Origins
First conceived in 1957 by Charles Hamblin at the University of New South Wales, Sydney.
Derived from Jan Lukasiewicz's Polish Notation.
Implemented as the GEORGE (General Order Generator) autocode system for the DEUCE computer.
First hardware implementation of LIFO stack in 1963: English Electric Company's KDF9 computer.
Stack Computers: Origins (Part 2)
Independently discovered in 1958 by Robert S. Barton (US).
Implemented in the Burroughs B5000 (also in 1963).
Better known
Spawned a whole family of stack computers
The First Generation
The First Generation: Features
Multiple independent stacks in main memory
Stacks are randomly accessible data structures
Contained procedure activation records
Evaluated expressions in Reverse Polish Notation
Complex instructions sets trying to directly implement high-level languages (e.g.: PL/1, FORTRAN, ALGOL)
Few hardware buffers (four or less typically)
Supplanted in the 1980's by RISC and better compilers
Stack Computers: A New Hope
Enter Charles H. ("Chuck") Moore:
Creator of the stack-based FORTH language, circa 1970
Left Forth, Inc. in 1981 to pursue hardware implementations
NOVIX (1986), Sh-BOOM (1991), MuP21 (1994), F21 (1998), X18 (2001)
Currently CTO of Intelasys, still working on hardware
product launch expected April 3, 2006 at Microprocessor Summit
Enter Prof. Philip Koopman, Carnegie-Mellon University
Documented salient stack designs in "Stack Computers: The New Wave", 1989
The Second Generation
The Second Generation: Features
Two or more stacks separate from main memory
Stacks are not addressable data structures
Expression evaluation and return addresses kept separate
Simple instruction sets tailored for stack operations
Still around, but low-profile (RTX-2010 in NASA probes)
Strangely, missed by virtually all mainstream literature
Exception: Feldman & Retter's "Computer Architecture", 1993
Arguments and Defense
Taken from Hennessy & Patterson's "Computer Architecture: A Quantitative Approach", 2nd edition
Summary: Valid for First Generation, but not Second
Argument: Variables
More importantly, registers can be used to hold variables. When variables are allocated to registers, the memory traffic reduces, the program speeds up (since registers are faster than memory), and the code density improves (since a register can be named with fewer bits than a memory location).
[H&P, 2nd ed, pg 71]
Manipulating the stack creates no memory traffic
Stacks can be faster than registers since no addressing is required
Lack of register addressing improves code density even more (no operands)
Globals and constants are kept in main memory, or cached on stack for short sequences of related computations
Ultimately no different than a register machine
Argument: Expression Evaluation
Second, registers are easier for a compiler to use and can be used more effectively than other forms of internal storage. For example, on a register machine the expression (A*B)-(C*D)-(E*F) may be evaluated by doing the multiplications in any order, which may be more efficient due to the location of the operands or because of pipelining concerns (see Chapter 3). But on a stack machine the expression must be evaluated left to right, unless special operations or swaps of stack position are done.
[H&P, 2nd ed, pg. 71]
Less pipelining is required to keep a stack machine busy
Location of operands is always the stack: no WAR, WAW dependencies
However: always a RAW dependency between instructions
Infix can be easily compiled to postfix
Dijkstra's "shunting yard" algorithm
Stack swap operations equivalent to register-register move operations
S
Java bytecode is interpreted on a virtual stack based processor. Most bytecode gets JITed into native register based instructions, but the model JVM processor is a stack processor.
Some previous poster noted that CLI is also a stack based model. I can't verify that myself but it wouldn't surprise me; Microsoft is, after all, highly 'innovative' or something.
Lurking at the bottom of the gravity well, getting old
Dear Slashdot Contributors,
Please stop describing undergrads doing independent studies as "Experts". Theres a reason that mainstream processors haven't picked up on "Stack Processors", and it has nothing to do with binary compatibility, the difficulty of writing a compiler for their instruction set, or general programming complexity. Stack Machines are really only good for In-Order processing. Wonder why NASA probes have Stack Processors? Because they don't freaking need to do out of order processing in order to get the performance they require, and they probably found stack processors to have a favorable power / performance ratio for their application. You will never see a full blown Windows running on a Stack processor, because Superscalar processors destroy their performance.
"My research project shows that some people wrote nifty papers in the 1970s, but everyone ignored them for an obvious reason I don't understand." -> Not an Expert
Once upon a time when laser printers cost $10K, before CorelDraw, I learned to program Postscript. Yes it's a real programming language but it prefers to do things with stacks. You could write real programs that did real calculations etc. What a pain. I'd rather program in machine code. I'd rather program in microcode. aargh. You get the point.
...
I still use postscript to create graphics but if any computation is involved, I use another language.
Having said the above I realize that most languages insulate you from the architecture. If you're programming in Python or Java, you probably wouldn't notice the difference. My experience with Postscript convinces me that such an architecture isn't as efficient as some people think it is though. There's just too much popping and pushing just to get at a value. Since a given value changes its position on the stack, you also need to keep track of where it is. Bleah. This isn't to say that stacks don't have a place. A variant of a stack called a circular buffer really speeds things up on a dsp. For general purpose use though
Sorry, but LISP (though I don't mean Common LISP) is just as much a stack language as FORTH. I think the first LISP that wasn't was LISP 1.5...but I'm rather vague on LISP history. Still, s-expressions are as stack oriented as FORTH is. The interesting thing is the first Algol 60 compiler (well...really an interpreter) I ever used was a stack machine. (That was why it was an interpreter. The real computer was an IBM 7090/7094 BCS system so it ran a stack machine program that Algol was compiled to run on. Whee!) So if you want a good stack computer language you could pick Algol 60. But FORTH is easier, and could even be the assembler language.
OTOH, most FORTHs I've seen use 3 or more stacks. I.e., most of them have a separate stack for floats. What would be *really* nice is if someone built a machine that used Neon as it's assembler. Neon is/was an Object-oriented dialect of FORTH for the Mac that allowed the user to specify early or late binding for variables. It was developed by Kyria Systems, a now-defunct software house. Unfortunately Neon died during a transition to MSWind95. I understand that it is survived by MOPS, but I've never had a machine that MOPS would run on, so I don't know how similar it was.
I think that FORTH would make a truly great assembler...and the more so if that dialect of FORTH were NEON. But I forget how many stacks it used. At least three, but I have a vague memory that it was actually four. The main stack, the return stack, the floating stack, and ??the Object stack??...I don't know.
I think we've pushed this "anyone can grow up to be president" thing too far.
The only modern example of a stack-based processor for accelerating Java that I am aware of, is the Java Optimized Processor (JOP).
Javascript + Nintendo DSi = DSiCade
Computer languages can be Turing complete, but physical computers cannot be.
SIGSEGV caught, terminating
wait... not that kind of sig.
Hmm...I'm playing around with implementing a simple Lisp...to get lexical closures I'm putting all frames in the heap instead of the stack, which I think is a fairly standard technique (tho not necessarily what a really optimized compiled Lisp does). Doesn't really seem stack-based to me.
And here it is.
You can't handle the truth.
If you miss Neon, you'll be happy to know that you can get about 90% of its implementation and 100% of its concepts in the form of PowerMops, which is open-source and runs great and natively on Leopard. I haven't used it for anything recently, but it's worked fine for hobbyist stuff I've done in the past. I strongly encourage you to check it out.
What a brainfuck.
"You're everywhere. You're omnivorous."
Between 20 - 40% of your compiled code is spent moving data in and out of the accumulator register, since most instructions depend on
specific data being in that register - to the point that they introduced zero-cycle add/mov functionality in the P4 line - basically, if your code performs an add and then movs ax immediately
out to memory (like the above code - and possibly the most common arithmetic operation in compiled code), if the pipeline and data caches are all available, the P4 will
execute both instructions with enough time to put something else in the instruction pipeline that
cycle. It's not really a zero-cycle function - you can do something like 2.5 (add,mov,add,mov,add) a cycle if you stack them back to back to back, for instance...
The only zero-cycle mov I'm familiar with on the P4 is a register-to-register mov, and that just takes advantage of the fact that the P4 has a physical register file and a map between the architectural registers and the physical ones. E.g. given
add bx, [cx]
mov ax, bx
the mapper might assign bx to physical register 10. It will then realize that ax is just a copy of bx, so it will make ax point at register 10 as well, and the mov never has to execute at all, thus 'zero cycle'.
You seem to be saying that the P4 can write the result of an add to the cache in zero cycles, or more than two values in a cycle, which doesn't mesh with what i know of the P4 which is that it has a two-ported cache. But I'm only intimately familiar with early revs of P4; if you know what rev this was added in I would be interested.
The enemies of Democracy are
Chuck Moore (the Forth guy) came to a slightly different conclusion: good for speed and good for low power. He uses the chip real estate you want to use pipelining instructions to add another core. In the case of the SeaForth processors, he added 23 other cores. Granted, that chip doesn't pretend to do anything but target embedded devices, but he demonstrates that stack machines can run quickly and use little power.