Next Generation Stack Computing

← Back to Stories (view on slashdot.org)

Next Generation Stack Computing

Posted by ryuzaki0 on Thursday August 10, 2006 @07:18AM from the cpu-of-the-fuuuuture dept.

mymanfryday writes "It seems that stack computers might be the next big thing. Expert Eric Laforest talks about stack computers and why they are better than register-based computers. Apparently NASA uses stack computers in some of their probes. He also claims that a kernel would only be a few kilobytes large! I wonder if Windows will be supported on a stack computer in the future?"

8 of 347 comments (clear)

Twelfth of Never by Tackhead · 2006-08-10 07:20 · Score: 5, Funny

> He also claims that a kernel would only be a few kilobytes large! I wonder if Windows will be supported on a stack computer in the future?"
In Redmond, 640 bytes isn't enough for anybody.
wikipedia link by whitelines · 2006-08-10 07:24 · Score: 5, Informative

I didn't know either:
http://en.wikipedia.org/wiki/Stack_machines

--
/* TBD */
They're great by TechyImmigrant · 2006-08-10 07:26 · Score: 5, Funny

Mathematicians like stack computers because its easier to formally prove the behaviour of algorithms using stacks.
Hardware engineers like stack computers because the hardware is interesting and easy to design
Investors hate them because they keep loosing money on them.

--
Evil people are out to get you.
PC Stacks by celardore · 2006-08-10 07:26 · Score: 5, Funny

I once had a job where I had to sort through stacks of computers. Overall the stacks were pretty useless, a bunch of burnt out 286s. Even if you put all your redundant computing power into a stack doesn't neccesarily make it better!
Does it run Windows?!? by Stealth+Dave · 2006-08-10 07:30 · Score: 5, Funny

I wonder if Windows will be supported on a stack computer in the future?

No, no, no, NO! This is SLASHDOT! The proper response is "Does it run Linux "?

--
Evil is as eval("does");
Stop Hurting My Eyes by Anonymous Coward · 2006-08-10 07:49 · Score: 5, Informative

Dear Slashdot Contributors,

Please stop describing undergrads doing independent studies as "Experts". Theres a reason that mainstream processors haven't picked up on "Stack Processors", and it has nothing to do with binary compatibility, the difficulty of writing a compiler for their instruction set, or general programming complexity. Stack Machines are really only good for In-Order processing. Wonder why NASA probes have Stack Processors? Because they don't freaking need to do out of order processing in order to get the performance they require, and they probably found stack processors to have a favorable power / performance ratio for their application. You will never see a full blown Windows running on a Stack processor, because Superscalar processors destroy their performance.

"My research project shows that some people wrote nifty papers in the 1970s, but everyone ignored them for an obvious reason I don't understand." -> Not an Expert
Stack - bad for speed, good for low power by Theovon · 2006-08-10 08:59 · Score: 5, Insightful

I'm a chip designer, and I am working on my Ph.D. in CS. The idea of stack machines is something I have researched a bit, and I have drawn some of my own conclusions.

The main advantage of stack machines is that all or most parameters for each instruction are implicit. Aside from stack shuffle/rotate instructions, the operands are always the top few on the stack. This makes instructions very small. The logic is also exceedingly simple (for fixed-stack designs). If you want a simple, low-power CPU, a stack machine is what you want.

Where I explored this issue, however, is in the realm of high-performance computing. The key advantage of a stack architecture is that smaller instructions take less time to fetch from memory. If your RISC instructions are 32 bits, but your stack machine instructions are 8 bits, then your instruction caches are effectively 4x larger, and your over-all cache miss penalty is greatly reduced.

The problem with stack machines is that they're damn near impossible to add instruction-level parallelism to. With a RISC machine, near-by instructions that deal with different registers (i.e. no dependencies) can be executed in parallel (whether that's multi-issue or just pipelining). With a stack machine, everything wants to read/write the top of the stack.

I came up with two things to deal with this problem, that are very much like the CISC-to-RISC translation done by modern x86 processors, so it's more of a stack ISA on a RISC architecture. One is that the stack is virtual. When you want to pop from the stack, what's happening in the front-end of the CPU is that you're just popping register numbers corresponding to a flat register file. When you want to push, you're allocating an assigned register number from the flat register file. Now, if you can get two instructions going that read different parts of the stack and write (naturally) to different locations, you can parallelize them. The second part is a healthy set of register shuffling instructions. Since you're doing all of this allocation up front, shuffling registers is as simple as renumbering things in your virtual stack. So a swap operation swaps two register numbers (rather than their contents), and a rotate operation renumbers a bunch of them, but the pending instructions being executed still dump their results in the same physical registers.

This all sounds great, but there are some problems with this:

(1) The shuffling instructions are separate instructions. With a RISC processor, you have more information all in one unit. Although you could try to fetch and execute multiple stack instructions at once, it's much more complicated to execute four stack instructions in parallel than to execute a single RISC instruction, even though they require the same amount of memory.
(2) You need a lot of shuffling instructions. Say your stack contains values A, B, C, and D, and you want to sum them. Without shuffling, you'd add A and B, yielding E, then add E and C, yielding F, then add F and D. Three add instructions. If your adder(s) is/are pipelined, you'd like to add A+B and C+D in parallel or overlapping, THEN wait around for their results and do the third add. The problem is that to do that, you'd need to add A+B, then rotate C to the top then D to the top, then add, then add again. The first case was 3 instructions; the second case is 5 instructions. Depending on your architecture, the extra shuffle instructions may take so long to process that you might as well just have waited. No speed gain at all.
(3) The extra shuffing instructions take up space. Optimizers are hard to write. Although it's conceivable that one could optimize for this architecture so as to avoid as many shuffling instructions as possible, you still end up taking up quite a lot of space with them, potentially offsetting much of the space savings that you got from switching from RISC to stack.

So, there you have it. Somewhat OT, because surely NASA's primary goal has got to be low-power, but also somewhat on-topic because stack architectures aren't the holy grail. Just ideal for some limited applications.
A Near Miss for Stack Computing Circa 1981 by Baldrson · 2006-08-10 11:12 · Score: 5, Interesting

Stack computing came close to changing the course of the computer industry, including setting networking forward 15 years (displacing Microsoft's stand-alone approach to software) back in 1981.
An excerpt from a bit longer essay I wrote:

In August 1980, Byte magazine published its issue on the Forth programming language
At that time, I was working with Control Data Corporation's PLATO project, pursuing a mass market version of that system using the Intelligent Student Terminal (IST). The IST's were Z80 processor terminals sporting 512*512 bit mapped displays with touch sensitive screens and 1200bps modems that went for about $1500. We were shooting for, and actually successfully tested, a system that could support almost 8,000 simultaneous users on 7600-derived Cybers (the last machine designed by Seymour Cray to be marketed by CDC --with 60 bits per word, 6 bits per character, no virtual memory, but very big and very fast) with under 1/4 second response time (all keys and touch inputs went straight to the central processor) for $40/month flat rate including terminal rental. Ray Ozzie had been working at the University of Illinois on offloading the PLATO central system to the Z80 terminal through downloaded assembly language programming, doing exotic things like "local key echo" and such functions.
I was interested in extending Ray's work to offload the mass-market version of the PLATO central system. In particular I was looking at a UCSD Pascal-based approach to download p-code versions of terminal functions -- and even more in particular the advanced scalable vector graphics commands of TUTOR (the "relative/rotatable" commands like rdraw, rat, rcircle, rcircleb, etc.) if not entire programs, to be executed offline. Pascal was an attractive choice for us at the time because CDC's new series of computers, the Cyber 180 (aka Cyber 800) was to have virtual memory, 64 bit words, 8 bit characters and be programmed in a version of the University of Minnesota Pascal called CYBIL (which stood for Cyber Implementation Language). Although this was a radically different architecture than that upon which PLATO was then running, I thought it worthwhile to investigate an architecture in which a reasonable language (you should have seen what we were used to!) could be made to operate on both the server and the terminal so that load could be dynamically redistributed. This idea of dynamic load balancing would, later, contribute to the genesis of Postscript.
Over one weekend a group of us junior programmers managed to implement a good portion of TUTOR's (PLATO's authoring language) advanced graphics commands in CYBIL. Our little hunting pack at CDC 's Arden Hills Operations was in a race against the impending visit of Dave Anderson of the University of Illinois' PLATO project who was promoting what he called "MicroTUTOR". Anderson was going to take the TUTOR programming language and implement a modified version of it for execution in the terminal -- possibly in a stand-alone mode. Many of us didn't like TUTOR, itself, much. Indeed, I had to pull teeth to get the authorization to put local variables into TUTOR -- and we were determined to select a better board from our quiver with which to surf Moore's Shockwave into the Network Revolution. CDC management wasn't convinced that such a radical departure from TUTOR would be wise, and we hoped to demonstrate that a p-code Pascal approach could accomplish what microTUTOR purported to -- and more. We quickly ported a TUTOR central sy

--
Seastead this.