Next Generation Stack Computing

Twelfth of Never by Tackhead · 2006-08-10 07:20 · Score: 5, Funny

> He also claims that a kernel would only be a few kilobytes large! I wonder if Windows will be supported on a stack computer in the future?

In Redmond, 640 bytes isn't enough for anybody.

Re:Twelfth of Never by jellomizer · 2006-08-10 07:35 · Score: 2, Insightful

But in reality this would be a major redesign of the OS, and all apps would need to be recompiled or emulated. Registers are a core part of assembly language. Having to remake Windows would be like making Windows for the PowerPC, if not more difficult.

--
If something is so important that you feel the need to post it on the internet... It probably isn't that important.
Re:Twelfth of Never by real_b0fh · 2006-08-10 07:47 · Score: 3, Insightful

Actually, if Windows is 'done right', all it would take is a recompile. I don't think there is a lot of assembler code in the Windows source that needs to be rewritten; most of it will be in C or C++.

--
"Contrary to popular belief, UNIX is user friendly. It just happens to be selective on who it makes friendship with"
Re:Twelfth of Never by Eric+LaForest · 2006-08-11 06:55 · Score: 2, Interesting

Correct, except 2nd-gen stack machines have a dedicated stack to hold those return addresses, so they never get to memory. Makes for very fast calls and returns. Experiments by Prof. Koopman have shown that for all practical purposes, a return address stack of 16 elements is deep enough. There is such a 16-deep stack (hidden from the programmer) on the Pentium 4 (and the Alpha AXP too I think): http://blogs.msdn.com/oldnewthing/archive/2004/12/16/317157.aspx
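A fixed-depth return address stack like the one described above can be modelled in a few lines. This is a minimal Python sketch, not a description of the Pentium 4 or any other real CPU: it assumes a 16-entry buffer that silently discards its oldest entry on overflow, and it counts how many returns would be mispredicted for a given call depth. The class and function names are hypothetical.

```python
from __future__ import annotations

from collections import deque


class ReturnStackBuffer:
    """Toy fixed-depth return address predictor (hypothetical model)."""

    def __init__(self, depth: int = 16) -> None:
        # deque(maxlen=...) drops the oldest (bottom) entry when a push overflows
        self.entries: deque[int] = deque(maxlen=depth)

    def on_call(self, return_address: int) -> None:
        self.entries.append(return_address)

    def predict_return(self) -> int | None:
        # An empty buffer has nothing to predict with
        return self.entries.pop() if self.entries else None


def mispredictions(call_depth: int, depth: int = 16) -> int:
    """Nest call_depth calls, unwind them all, and count wrong predictions."""
    rsb = ReturnStackBuffer(depth)
    real_stack: list[int] = []  # the true return addresses, as kept in memory
    for addr in range(call_depth):
        real_stack.append(addr)
        rsb.on_call(addr)
    wrong = 0
    while real_stack:
        actual = real_stack.pop()
        if rsb.predict_return() != actual:
            wrong += 1
    return wrong


print(mispredictions(16))  # 0
print(mispredictions(20))  # 4
```

Only the calls nested deeper than the buffer pay a penalty, which is why a modest depth covers most real call patterns.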

--
none

Oh? by qbwiz · 2006-08-10 07:23 · Score: 3, Funny

I thought the 387 and Burroughs B5000 were odd, antiquated architectures, but apparently they're the wave of the future.

--
Ewige Blumenkraft.

Re:Oh? by Rob+T+Firefly · 2006-08-10 08:04 · Score: 2, Funny

I had a good C interpreter ported over to punch cards, but one day I accidentally dropped crate #147 off the forklift and they went everywhere. Damn my lazy habit of not labelling my media!

I should be finished unshuffling them in another six or seven months.

--
Slashdot Burying Stories About Slashdot Media Owned
Re:Oh? by The_Wilschon · 2006-08-10 08:53 · Score: 4, Informative

(and Turing complete)
Bzzzzt! No actual machine can ever be Turing complete, because theoretical Turing machines are capable of calculations which require an unbounded amount of space. That is, for any computer you build, there are computations a Turing machine can carry out that need more memory than that computer has.

Computer languages can be Turing complete, but physical computers cannot be.
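Balanced parentheses make the bounded-memory point concrete: recognizing them at any nesting depth needs a stack (or counter) that can grow without limit, so every fixed capacity is defeated by an input nested one level deeper. A minimal Python sketch; `capacity` is a hypothetical stand-in for a real machine's finite memory.

```python
class MemoryExhausted(Exception):
    """Raised when the bounded stack would need more room than it has."""


def balanced(text: str, capacity: int) -> bool:
    """Check parenthesis nesting with a stack of at most `capacity` entries."""
    stack: list[str] = []
    for ch in text:
        if ch == "(":
            if len(stack) == capacity:
                # An idealized Turing machine would simply use more tape here
                raise MemoryExhausted(f"nesting deeper than {capacity}")
            stack.append(ch)
        elif ch == ")":
            if not stack:
                return False  # closing paren with nothing open
            stack.pop()
    return not stack  # balanced only if nothing is left open


print(balanced("(()())", capacity=4))  # True
try:
    balanced("(" * 5 + ")" * 5, capacity=4)
except MemoryExhausted as err:
    print("gave up:", err)  # gave up: nesting deeper than 4
```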

--
SIGSEGV caught, terminating

wait... not that kind of sig.
Re:Oh? by novus+ordo · 2006-08-10 11:04 · Score: 2, Informative

What a brainfuck.

--
"You're everywhere. You're omnivorous."

wikipedia link by whitelines · 2006-08-10 07:24 · Score: 5, Informative

I didn't know either:
http://en.wikipedia.org/wiki/Stack_machines

--
/* TBD */

Re:wikipedia link by SatanicPuppy · 2006-08-10 08:41 · Score: 3, Informative

This source about stack computing is better.

Sadly I actually still work on a stack computer, and I had to go look it up.

--
ad logicam Claiming a proposition is false because it was presented as the conclusion of a fallacious argument.

Size and functionality by Angst+Badger · 2006-08-10 07:25 · Score: 4, Insightful

He also claims that a kernel would only be a few kilobytes large!

I've seen sub-1k kernels for FORTH systems before. The question is, how much functionality do you want to wrap into that kernel? More capable kernels would, of course, be correspondingly larger.

That said, stack computing and languages like FORTH have long been underrated. Depending on the application, the combination of stack computers and postfix languages can be quite powerful.
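As a taste of how little machinery a postfix language needs, here is a toy Forth-style interpreter in Python. It is a sketch, not real Forth: it only knows integers, the words `+ - * dup swap drop` (named after their Forth counterparts), and `: name ... ;` definitions. The function name `run` is hypothetical.

```python
from __future__ import annotations


def run(source: str) -> list[int]:
    """Evaluate a tiny Forth-like program and return the data stack (top last)."""
    stack: list[int] = []
    words: dict[str, list[str]] = {}  # user definitions: name -> body tokens

    def binary(op):
        def apply():
            b, a = stack.pop(), stack.pop()  # top of stack is the right operand
            stack.append(op(a, b))
        return apply

    builtins = {
        "+": binary(lambda a, b: a + b),
        "-": binary(lambda a, b: a - b),
        "*": binary(lambda a, b: a * b),
        "dup": lambda: stack.append(stack[-1]),
        "swap": lambda: stack.extend([stack.pop(), stack.pop()]),
        "drop": lambda: stack.pop(),
    }

    def execute(tokens: list[str]) -> None:
        i = 0
        while i < len(tokens):
            tok = tokens[i]
            if tok == ":":  # ": name body ;" defines a new word
                end = tokens.index(";", i)
                words[tokens[i + 1]] = tokens[i + 2:end]
                i = end + 1
                continue
            if tok in words:
                execute(words[tok])
            elif tok in builtins:
                builtins[tok]()
            else:
                stack.append(int(tok))  # anything else must be a number
            i += 1

    execute(source.split())
    return stack


print(run("2 3 + 4 *"))           # [20]
print(run(": sq dup * ; 7 sq"))   # [49]
```

The whole "compiler" is a dictionary lookup per token, which is a large part of why Forth kernels can be so small.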

--
Proud member of the Weirdo-American community.

Re:Size and functionality by merlin_jim · 2006-08-10 07:55 · Score: 4, Insightful

Couldn't you program a stack computer just as well with a prefix functional language like Scheme?

Sure you can - and it compiles to postfix notation anyway, rather inefficiently I might add (get it, add????)

let's say you wanted to write a function like:
function addsubandmultiply(b, c, d, e) {
a = (b + c) * (d - e);
return a;
}

and you've got assembly-level instructions such as mov, add, sub, mult, push, and pop, as well as the very stack-centric stor and lod, which move one or more stack variables to memory and back.

A typical register based computer might compile the above as:

pop b
pop c
pop d
pop e
mov b, ax
mov c, bx
add bx
mov ax, temp_memory
mov d, ax
mov e, bx
sub bx
mov temp_memory, bx
mult bx
push a

Whereas a stack-based computer might compile as:

add
stor temp_memory
sub
lod temp_memory
mult
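The five-instruction stack sequence above can be checked with a small Python simulation. The instruction semantics are assumptions inferred from the post, not any real ISA: the arguments start on the stack with `b` on top (matching the register version, which pops `b` first), each binary op takes its left operand from the top and its right operand from the slot below, `stor` pops into a named memory cell, and `lod` pushes that cell back. `run_stack_code` is a hypothetical helper.

```python
from __future__ import annotations


def run_stack_code(program: list[str], b: int, c: int, d: int, e: int) -> int:
    stack = [e, d, c, b]  # the end of the list is the top, so b is on top
    memory: dict[str, int] = {}
    ops = {"add": lambda x, y: x + y,
           "sub": lambda x, y: x - y,
           "mult": lambda x, y: x * y}
    for line in program:
        name, *operand = line.split()
        if name in ops:
            x = stack.pop()  # left operand: top of stack
            y = stack.pop()  # right operand: the slot below it
            stack.append(ops[name](x, y))
        elif name == "stor":
            memory[operand[0]] = stack.pop()
        elif name == "lod":
            stack.append(memory[operand[0]])
        else:
            raise ValueError(f"unknown instruction {name!r}")
    (result,) = stack  # exactly one value should be left
    return result


program = ["add", "stor temp_memory", "sub", "lod temp_memory", "mult"]
print(run_stack_code(program, b=2, c=3, d=10, e=4))  # (2 + 3) * (10 - 4) = 30
```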

In a stack-based computer, operations are carried out directly on your stack... it's very convenient, since most languages compile function calls to use the stack anyway, and as you can see, not having to deal with an accumulator register makes for much terser code. Between 20-40% of your compiled code is spent moving data in and out of the accumulator register, since most instructions depend on specific data being in that register - to the point that they introduced zero-cycle add/mov functionality in the P4 line. Basically, if your code performs an add and then movs ax immediately out to memory (like the above code - and possibly the most common arithmetic operation in compiled code), and the pipeline and data caches are all available, the P4 will execute both instructions with enough time to put something else in the instruction pipeline that cycle. It's not really a zero-cycle function - you can do something like 2.5 (add,mov,add,mov,add) a cycle if you stack them back to back to back, for instance...

Yes, Intel released a benchmark for it. No, I can't imagine why you would want to keep adding and moving the results around memory - maybe some esoteric functions like a Fibonacci generator or even a DSP algorithm of some sort might need to do it, but I don't think it'll happen all that often... or that any compiler would have an optimisation to specifically output that sequence where appropriate...

--
I am disrespectful to dirt! Can you see that I am serious?!
Re:Size and functionality by Anonymous Coward · 2006-08-10 08:27 · Score: 4, Insightful

Actually, x86 is in between a stack machine and a register-based machine.
Most register machines would compile the following code:
function addsubandmultiply(b, c, d, e) {
a = (b + c) * (d - e);
return a;
}
into something like this (sorry for the PPC asm):
add r3, r3, r4
sub r4, r5, r6
mullw r3, r3, r4
blr # (return)

Now tell me that isn't just as simple as (or even simpler than) the stack-based one?
Re:Size and functionality by Chris+Burke · 2006-08-10 11:11 · Score: 4, Informative

Between 20 - 40% of your compiled code is spent moving data in and out of the accumulator register, since most instructions depend on specific data being in that register - to the point that they introduced zero-cycle add/mov functionality in the P4 line - basically, if your code performs an add and then movs ax immediately out to memory (like the above code - and possibly the most common arithmetic operation in compiled code), if the pipeline and data caches are all available, the P4 will execute both instructions with enough time to put something else in the instruction pipeline that cycle. It's not really a zero-cycle function - you can do something like 2.5 (add,mov,add,mov,add) a cycle if you stack them back to back to back, for instance...

The only zero-cycle mov I'm familiar with on the P4 is a register-to-register mov, and that just takes advantage of the fact that the P4 has a physical register file and a map between the architectural registers and the physical ones. E.g. given
add bx, [cx]
mov ax, bx

the mapper might assign bx to physical register 10. It will then realize that ax is just a copy of bx, so it will make ax point at register 10 as well, and the mov never has to execute at all, thus 'zero cycle'.
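The renaming trick described above can be sketched as a register alias table. This is a simplified Python model, not Intel's actual mechanism: it assumes an unlimited supply of physical registers (numbered from 10 to match the example) and treats every instruction other than a register-to-register `mov` as one that writes a fresh value, so only those are dispatched to an execution unit. The function `rename` is hypothetical.

```python
from __future__ import annotations

import itertools


def rename(instructions: list[tuple[str, str, str]]) -> tuple[dict[str, int], int]:
    """Return the final arch->phys map and how many instructions had to execute."""
    free_regs = itertools.count(10)  # hypothetical physical register numbers
    rat: dict[str, int] = {}  # register alias table: architectural -> physical
    executed = 0
    for op, dst, src in instructions:
        if op == "mov" and src in rat:
            # Register-to-register move: alias dst to src's physical register
            rat[dst] = rat[src]
        else:
            # Anything else produces a new value in a fresh physical register
            rat[dst] = next(free_regs)
            executed += 1
    return rat, executed


rat, executed = rename([("add", "bx", "[cx]"), ("mov", "ax", "bx")])
print(rat)       # {'bx': 10, 'ax': 10}
print(executed)  # 1: the mov never reached an execution unit
```

Note that this only eliminates moves between registers; a store to memory still has to go through the cache.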

You seem to be saying that the P4 can write the result of an add to the cache in zero cycles, or more than two values in a cycle, which doesn't mesh with what I know of the P4, which is that it has a two-ported cache. But I'm only intimately familiar with early revs of the P4; if you know which rev this was added in, I would be interested.

--

The enemies of Democracy are

Linking to 300MB video files from Slashdot? by Colin+Smith · 2006-08-10 07:25 · Score: 2, Funny

Someone's having a larf. Oh you do crack me up Messrs mymanfryday and CmdrTaco.

Please try the bittorrent. No, wait... Teach em a lesson, make em burn.

--
Deleted

.NET Compatibility by DaHat · 2006-08-10 07:26 · Score: 4, Interesting

Interestingly enough the Microsoft Intermediate Language (MSIL) that .NET apps are compiled to before being JITed into machine code is actually built around a stack based system as well... No doubt porting the .NET Framework over to such a system would be quite easy... and give much in the way of performance boosts (especially on startup).

Of course... that would still depend on a version of Windows for it to run on.

--
Help Brendan pay off his student loans

Re:.NET Compatibility by jfengel · 2006-08-10 07:31 · Score: 3, Informative

The Java Virtual Machine is also sort of stack-based. The JVM bytecode set uses stack operations but the safety checks that it runs make it equivalent to a sliding set of registers not unlike, say, the SPARC architecture. A JIT implementation could do away with the stack, at least in the individual method calls, though the call-return stack would still have to be there.
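One common way for a JIT to do away with the operand stack is to walk the bytecode once, keeping names instead of values on a compile-time stack, and emitting register code for each operation. A minimal Python sketch of that idea; the opcode names (`load`, `add`, `sub`, `mul`, `ret`) are simplified stand-ins, not real JVM mnemonics, and `stack_to_registers` is hypothetical.

```python
from __future__ import annotations


def stack_to_registers(bytecode: list[tuple[str, ...]]) -> list[str]:
    """Translate stack ops into three-address code using virtual registers."""
    symbolic: list[str] = []  # compile-time stack of variable/register names
    code: list[str] = []
    temps = 0
    symbols = {"add": "+", "sub": "-", "mul": "*"}
    for op, *args in bytecode:
        if op == "load":
            symbolic.append(args[0])  # no instruction emitted, just a name
        elif op in symbols:
            right = symbolic.pop()  # top of stack is the right operand
            left = symbolic.pop()
            dest = f"t{temps}"
            temps += 1
            code.append(f"{dest} = {left} {symbols[op]} {right}")
            symbolic.append(dest)
        elif op == "ret":
            code.append(f"return {symbolic.pop()}")
        else:
            raise ValueError(f"unsupported op {op!r}")
    return code


bytecode = [("load", "b"), ("load", "c"), ("add",), ("load", "d"),
            ("load", "e"), ("sub",), ("mul",), ("ret",)]
for line in stack_to_registers(bytecode):
    print(line)
```

This only works because the stack depth at each instruction is known statically, which is roughly what the JVM verifier's checks guarantee.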

They're great by TechyImmigrant · 2006-08-10 07:26 · Score: 5, Funny

Mathematicians like stack computers because it's easier to formally prove the behaviour of algorithms using stacks.
Hardware engineers like stack computers because the hardware is interesting and easy to design.
Investors hate them because they keep losing money on them.

--
Evil people are out to get you.

We are heard this before... by __aaclcg7560 · 2006-08-10 07:26 · Score: 3, Funny

Apparently NASA uses stack computers in some of their probes.

In space no one can hear you blue screen of death. Unless you work for Lucasfilm.

PC Stacks by celardore · 2006-08-10 07:26 · Score: 5, Funny

I once had a job where I had to sort through stacks of computers. Overall the stacks were pretty useless, a bunch of burnt-out 286s. Putting all your redundant computing power into a stack doesn't necessarily make it better!

Awesome by LinuxFreakus · 2006-08-10 07:27 · Score: 4, Insightful

Does this mean my old HP48GX will be considered cutting edge? I should get ready to sell it on eBay when the craze hits! All my old classmates will be forced to allow me to have the last laugh after I was on the receiving end of much ridicule for using the HP when the TI was the only thing "officially" endorsed by all the calculus textbooks. I don't know if I could ever part with it though. I still use it almost daily, the thing continues to kick ass.

Forth? by dslmodem · 2006-08-10 07:28 · Score: 2, Informative

I remember that FORTH is a language that supports stack computing. Hopefully that is not totally wrong. Unfortunately, FORTH programs are really hard to understand.

http://en.wikipedia.org/wiki/Forth_programming_language

--

^(oo)^pig~

Re:Forth? by CaptnMArk · 2006-08-10 07:55 · Score: 2, Informative

wouldn't that be more like:

face; mouth; teeth; brush; wash; wash;

Does it run Windows?!? by Stealth+Dave · 2006-08-10 07:30 · Score: 5, Funny

I wonder if Windows will be supported on a stack computer in the future?

No, no, no, NO! This is SLASHDOT! The proper response is "Does it run Linux "?

--
Evil is as eval("does");

Re:Does it run Windows?!? by doi · 2006-08-10 08:13 · Score: 2, Funny

I think you mean a Beowulf Cluster...
I think you mean a Beowulf STACK...
Which would be better, a cluster of Beowulf Stacks, or a stack of Beowulf Clusters? Of course, the answer is a stacked cluster of Beowulf Clustered Stacks.

--
A man's reach must exceed his grasp, or what's an erection for?
Re:Does it run Windows?!? by roman_mir · 2006-08-10 08:52 · Score: 4, Funny

No, the proper response here is this: it Linux run does?

--
You can't handle the truth.

X86 FPU's finally losing their stackness by GGardner · 2006-08-10 07:31 · Score: 4, Interesting

Since the dawn of time, the x86 FPU has been organized as a stack, which has been recognized as a mistake by modern computer architects. For one thing, it is hard to get a stack architecture to take advantage of multiple functional units. Only recently, with the development of SSE, 64-bit modes and other additions, have we been able to move away from the stack on the x86.

Re:X86 FPU's finally losing their stackness by Anonymous Coward · 2006-08-10 07:50 · Score: 2, Interesting

we been able to move away from the stack on the x86
As someone who has written several Forth compilers for the x86 I'd like to point out that the design of the stacks on the x86 is very inefficient. The main stack is just that: a stack tied to one particular register. The FP stack was just a joke; a dead weasel could have designed something better. Anyway, I do like using Forth even under the x86 model - it's nice to know that my entire programming system fits into the on-die cache!
Re:X86 FPU's finally losing their stackness by Tumbleweed · 2006-08-10 08:05 · Score: 2, Funny

Since the dawn of time, the x86 FPU has been organized as a stack

No no no, since the dawn of time, Man has yearned to destroy the Sun!

x86 came much later, right after the COBOL and the other dinosaurs.

Fun and games by Carnildo · 2006-08-10 07:32 · Score: 4, Funny

It's all fun and games until someone hits a stack underflow.

--
"They redundantly repeated themselves over and over again incessantly without end ad infinitum" -- ibid.

Text of PPT by Anonymous Coward · 2006-08-10 07:40 · Score: 4, Informative

Introduction
Discovered field by chance in 2000 (blame the Internet)
Hobby project (simulations and assembly) until 2004
Transformed into Independent Study thesis project
Overview of current state of research
Focus on programmer's view

Stack Computers: Origins
First conceived in 1957 by Charles Hamblin at the University of New South Wales, Sydney.
Derived from Jan Lukasiewicz's Polish Notation.
Implemented as the GEORGE (General Order Generator) autocode system for the DEUCE computer.
First hardware implementation of LIFO stack in 1963: English Electric Company's KDF9 computer.
Stack Computers: Origins (Part 2)
Independently discovered in 1958 by Robert S. Barton (US).
Implemented in the Burroughs B5000 (also in 1963).
Better known
Spawned a whole family of stack computers
The First Generation
The First Generation: Features
Multiple independent stacks in main memory
Stacks are randomly accessible data structures
Contained procedure activation records
Evaluated expressions in Reverse Polish Notation
Complex instruction sets trying to directly implement high-level languages (e.g.: PL/I, FORTRAN, ALGOL)
Few hardware buffers (four or fewer typically)
Supplanted in the 1980s by RISC and better compilers
Stack Computers: A New Hope
Enter Charles H. ("Chuck") Moore:
Creator of the stack-based FORTH language, circa 1970
Left Forth, Inc. in 1981 to pursue hardware implementations
NOVIX (1986), Sh-BOOM (1991), MuP21 (1994), F21 (1998), X18 (2001)
Currently CTO of Intelasys, still working on hardware
product launch expected April 3, 2006 at Microprocessor Summit
Enter Prof. Philip Koopman, Carnegie-Mellon University
Documented salient stack designs in "Stack Computers: The New Wave", 1989
The Second Generation
The Second Generation: Features
Two or more stacks separate from main memory
Stacks are not addressable data structures
Expression evaluation and return addresses kept separate
Simple instruction sets tailored for stack operations
Still around, but low-profile (RTX-2010 in NASA probes)
Strangely, missed by virtually all mainstream literature
Exception: Feldman & Retter's "Computer Architecture", 1993
Arguments and Defense
Taken from Hennessy & Patterson's "Computer Architecture: A Quantitative Approach", 2nd edition
Summary: Valid for First Generation, but not Second
Argument: Variables
More importantly, registers can be used to hold variables. When variables are allocated to registers, the memory traffic reduces, the program speeds up (since registers are faster than memory), and the code density improves (since a register can be named with fewer bits than a memory location).
[H&P, 2nd ed, pg 71]
Manipulating the stack creates no memory traffic
Stacks can be faster than registers since no addressing is required
Lack of register addressing improves code density even more (no operands)
Globals and constants are kept in main memory, or cached on stack for short sequences of related computations
Ultimately no different than a register machine
Argument: Expression Evaluation
Second, registers are easier for a compiler to use and can be used more effectively than other forms of internal storage. For example, on a register machine the expression (A*B)-(C*D)-(E*F) may be evaluated by doing the multiplications in any order, which may be more efficient due to the location of the operands or because of pipelining concerns (see Chapter 3). But on a stack machine the expression must be evaluated left to right, unless special operations or swaps of stack position are done.
[H&P, 2nd ed, pg. 71]
Less pipelining is required to keep a stack machine busy
Location of operands is always the stack: no WAR, WAW dependencies
However: always a RAW dependency between instructions
Infix can be easily compiled to postfix
Dijkstra's "shunting yard" algorithm
Stack swap operations equivalent to register-register move operations
S
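The slides cite Dijkstra's shunting-yard algorithm for compiling infix to postfix. Here is a compact Python sketch for single-character operands, the four arithmetic operators (all left-associative), and parentheses. It is the textbook algorithm, not code from the talk, and it does no error recovery (mismatched parentheses simply raise).

```python
from __future__ import annotations

PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}


def to_postfix(expr: str) -> str:
    output: list[str] = []
    ops: list[str] = []  # the operator stack (the "siding" of the shunting yard)
    for tok in expr.replace(" ", ""):
        if tok.isalnum():
            output.append(tok)
        elif tok in PRECEDENCE:
            # Pop operators of higher or equal precedence (left associativity)
            while ops and ops[-1] in PRECEDENCE and PRECEDENCE[ops[-1]] >= PRECEDENCE[tok]:
                output.append(ops.pop())
            ops.append(tok)
        elif tok == "(":
            ops.append(tok)
        elif tok == ")":
            while ops[-1] != "(":
                output.append(ops.pop())
            ops.pop()  # discard the "("
        else:
            raise ValueError(f"unexpected character {tok!r}")
    while ops:
        output.append(ops.pop())
    return " ".join(output)


print(to_postfix("(A*B)-(C*D)-(E*F)"))  # A B * C D * - E F * -
```

Applied to the H&P expression, the output fixes a left-to-right evaluation order, which is the constraint their argument points at.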

JVM by TopSpin · 2006-08-10 07:40 · Score: 4, Informative

Java bytecode is interpreted on a virtual stack based processor. Most bytecode gets JITed into native register based instructions, but the model JVM processor is a stack processor.

Some previous poster noted that CLI is also a stack based model. I can't verify that myself but it wouldn't surprise me; Microsoft is, after all, highly 'innovative' or something.

--
Lurking at the bottom of the gravity well, getting old

Re:JVM by Anonymous Coward · 2006-08-10 07:55 · Score: 3, Informative

It's not like Java was super-innovative in using a stack-based architecture. Java was designed with web applications in mind, and as such, small code size was extremely important for bandwidth reasons. One of the best features of stack machines is the small instruction size (no need to encode register operands). So a stack machine is a natural choice for the JVM. If you wanna nag on .NET copying Java, there are plenty of good reasons, but this isn't one.
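For a sense of the code-size argument: a static Java method `int f(int b, int c, int d, int e) { return (b + c) * (d - e); }` should compile to eight one-byte instructions, against 16 bytes for the four fixed-width 32-bit PowerPC instructions quoted earlier in the thread. The Python sketch below spells out those bytes using the opcode values listed in the JVM specification and runs them on a toy interpreter that understands only these eight opcodes. The method is a hypothetical example, and real `javac` output can differ for other source shapes (for instance, storing into a local variable first).

```python
from __future__ import annotations

# Opcode values as listed in the JVM specification
ILOAD_0, ILOAD_1, ILOAD_2, ILOAD_3 = 0x1A, 0x1B, 0x1C, 0x1D
IADD, ISUB, IMUL, IRETURN = 0x60, 0x64, 0x68, 0xAC

# (b + c) * (d - e) with b, c, d, e in local variable slots 0-3
CODE = bytes([ILOAD_0, ILOAD_1, IADD, ILOAD_2, ILOAD_3, ISUB, IMUL, IRETURN])


def interpret(code: bytes, local_vars: list[int]) -> int:
    """Toy interpreter for just these opcodes (no 32-bit overflow wrapping)."""
    stack: list[int] = []
    for opcode in code:
        if ILOAD_0 <= opcode <= ILOAD_3:
            stack.append(local_vars[opcode - ILOAD_0])
        elif opcode in (IADD, ISUB, IMUL):
            right, left = stack.pop(), stack.pop()
            if opcode == IADD:
                stack.append(left + right)
            elif opcode == ISUB:
                stack.append(left - right)
            else:
                stack.append(left * right)
        elif opcode == IRETURN:
            return stack.pop()
        else:
            raise ValueError(f"opcode {opcode:#x} not supported")
    raise ValueError("fell off the end without ireturn")


print(len(CODE))                       # 8
print(interpret(CODE, [2, 3, 10, 4]))  # 30
```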

There is one very widely used FORTH-type language by porkchop_d_clown · 2006-08-10 07:46 · Score: 2, Insightful

that almost every /. user encounters every day: PostScript and PDF.

--
Clear, Dark Skies

Appropriate instruction set by dpilot · 2006-08-10 07:46 · Score: 4, Insightful

Even in assembler, the mainstream hasn't been programming to the metal since Pentium I.

Beginning with Pentium II, and propagating to pretty much all of the other architectures in a short time, none of the mainstream CPUs have exposed their metal. We have an instruction set, but it's torn into primitives and scheduled for execution. We don't see the primitives, not even in assembler. AFAIK, there isn't even a way to use the true primitives, except perhaps on the Transmeta, where it was undocumented.

So in this light, since we're already fairly far from the true metal, it seems to me that it makes a lot of sense to re-evaluate the instruction set itself. Of course one could raise the Itanium argument, but I would also argue that politics were too big a part, there. Then again, one could also argue that x86 and amd64 are just so entrenched that it doesn't matter, and they do run well on today's hardware.

Then again I could cite my old favorite, the 6809. It started from the same origins and precepts as RISC, but with a different attitude. RISC simply tried to optimize the most common operations, at the expense of less common ones. With the 6809, they tried to understand WHY certain things were happening, and how those things could be done better and faster. They ended up with a few more transistors, the same speed, and something approaching 3X the throughput, as compared to the 6800. More similar to the current topic, there was a paper on 'contour mapping', mapping blocks of cache into stacks and data structures. The 6809 was too old for a cache, but it seems to me that combining its concepts with contour mapping would be interesting indeed.

But like stack engines, it's not x86/amd64 compatible.

--
The living have better things to do than to continue hating the dead.

Re:Appropriate instruction set by stevesliva · 2006-08-10 08:47 · Score: 2, Insightful

He's saying that the latest Intel chips run micro-ops that do not have a 1-to-1 correspondence with the x86 ISA to which you refer. Get it?

--
Who do you get to be an expert to tell you something's not obvious? The least insightful person you can find? -J Roberts
Re:Appropriate instruction set by vtcodger · 2006-08-10 09:23 · Score: 2, Interesting

***Then again I could cite my old favorite, the 6809.***
Not only was the 6809 easy and fun to program, but 6809 programs also tended to benchmark significantly faster than programs for comparable CPUs like the Z80, 6800 and 8080. If the industry ever decides to scrap the x86 mess -- which they won't -- going back to the 6809 for a starting point might not be a bad idea at all. I once did a plot of measured times for a benchmark where timings were available for a bunch of CPUs (Sieve of Eratosthenes). When you plotted out clock speed vs word width, all the CPUs from the 8080 to the Cray something or other fell out into an untidy straight line, except for the 6809. There were, as I recall, three different results published for SOE on the 6809 and all three were an order of magnitude faster than they had any reasonable expectation of being based on the hardware's apparent capabilities.

--
You can't see ANYTHING from a car, You've got to get out of the goddamned contraption and walk...Edward Abbey

Why these downright stupid comments? by Jerk+City+Troll · 2006-08-10 07:47 · Score: 3, Insightful

You “wonder if Windows will run on a stack computer?” Where do you people come up with this nonsense? This is as irrelevant as saying: "someday, car tires will not be made of rubber. I wonder if Windows will support them?" Really, there is no need to try to come up with insightful remarks or questions to tack on the end of your story submissions. Just present the article and leave it at that. Let everyone else do the thinking.

--
Join Tor today!

Stop Hurting My Eyes by Anonymous Coward · 2006-08-10 07:49 · Score: 5, Informative

Dear Slashdot Contributors,

Please stop describing undergrads doing independent studies as "Experts". There's a reason that mainstream processors haven't picked up on "Stack Processors", and it has nothing to do with binary compatibility, the difficulty of writing a compiler for their instruction set, or general programming complexity. Stack Machines are really only good for In-Order processing. Wonder why NASA probes have Stack Processors? Because they don't freaking need to do out-of-order processing in order to get the performance they require, and they probably found stack processors to have a favorable power / performance ratio for their application. You will never see a full-blown Windows running on a Stack processor, because Superscalar processors destroy their performance.

"My research project shows that some people wrote nifty papers in the 1970s, but everyone ignored them for an obvious reason I don't understand." -> Not an Expert

Re:Stop Hurting My Eyes by HiThere · 2006-08-10 08:23 · Score: 2, Insightful

I believe that your criticisms apply to only specific stack based architectures. That they do apply to the commonly presumed architectures I accept, but this is far from asserting them as general truths.

Actually, even asserting that register-based computers solve the problems that you are describing is not a general truth. You need to specify how many registers of what type can deal with how many out-of-order processes. And I suspect that a stack computer with 5 or 6 stacks could deal as flexibly with that problem as is commonly required...with a bit of extra leeway. It would need to be able to implement rapid task switching based on a "high priority task stack"...and maintaining that would be a bit of a nuisance...but that particular stack could have a very short limit, say 50 items. I'll agree that this is one function that it would be better to handle in scratchpad memory, but it would be eminently possible to do it purely from a stack-based approach. (Still, there's a good reason that priority queues are queues rather than stacks...and I would argue that a deque would be an even better approach.)

Well, I'm neither a hardware engineer nor a computer system designer, so I could be wrong. OTOH, you're anonymous, which means that your arguments only deserve the weight that their own internal logic provides.

--

I think we've pushed this "anyone can grow up to be president" thing too far.
Re:Stop Hurting My Eyes by Eric+LaForest · 2006-08-10 11:57 · Score: 2, Interesting

"Out of order processing has been done in stack machines for years now."

As far as I know, there are no implemented second-gen stack computers that support that feature.
(There have been a few theoretical ones.)
Which ones are you talking about?

--
none

Question about stack computer types by thewiz · 2006-08-10 07:51 · Score: 3, Funny

Do these come in short- and tall-stack versions?
Are maple syrup and butter options?

--
If "disco" means "I learn" in Latin, does "discothèque" mean "I learn technology"?

Re:Assembly Code was fun by hal2814 · 2006-08-10 07:52 · Score: 4, Funny

RISC assembly code? That's so weak. I'd rather spend a day writing an assembly routine that has an equivalent single obscure machine instruction I didn't know about beforehand, thank you very much.

Stack machines - again? by Animats · 2006-08-10 07:54 · Score: 3, Insightful

Who can forget the English Electric Leo-Marconi KDF9, the British stack machine from 1960. That, and the Burroughs 5000, were where it all began.

Stack machines are simple and straightforward to build, but are hard to accelerate or optimize. Classically, there's a bottleneck at the top of the stack; everything has to go through there. With register machines, low-level concurrency is easier. There's been very little work on superscalar stack machines. This student paper from Berkeley is one of the few efforts.

It's nice that you can build a Forth machine with about 4000 gates, but who cares today? It would have made more sense in the vacuum tube era.

Not a good idea by coats · 2006-08-10 07:55 · Score: 4, Insightful

The reason modern systems are so fast is that they hide a lot of fine grained parallelism behind the scenes. It is very hard to express this kind of parallelism in a way that it can be executed on a stack machine.

How important is this parallelism? Consider that modern processors have 10-30 pipeline stages, 3-6 execution units that can have an instruction executing at each stage; moreover, most of them have out-of-order execution units that handle instructions more in the order that data is available for them rather than the order they are listed in the object file (and main memory is hundreds of times slower than the processors themselves, so this is important!). Typically, such processors can have more than 100 instructions in some stage of execution (more than 250 for IBM POWER5 :-)

Consider, also, that the only pieces of anything-like-current stack hardware are Intel x87-style floating point units, that Intel is throwing away -- for good reason! -- in favor of (SSE) vector style units. In the current Intel processors, the vector unit emulates an x87 if it needs to -- but giving only a quarter of the performance.

Someone made remarks about Java and .Net interpreters: in both cases, the interpreter is simulating a purely scalar machine with no fine-grained parallelism; no wonder an extensible software-stack implementation is one of the simplest to implement. Stacks are not the way that true Java compilers like gcj generate code, though!

No, stack-based hardware is not a good idea, and it hasn't been since some time in the eighties, when processors started to be pipelined and processor speed started outstripping memory speed.

--
"My opinions are my own, and I've got *lots* of them!"

Re:Not a good idea by coats · 2006-08-13 00:58 · Score: 2, Interesting

Suppose you put *two* dozen of them on a chip, and suppose they are *four* times faster. You still have less than a quarter the performance of a Conroe or POWER5 (both of which are dual-core, with each core sustaining more than 200 instructions in flight at a time), and you still have to manage that parallelism "by hand". Actually, the "four times faster" won't work, either -- remember that memory is still 200 times slower than Conroe or POWER5; if memory were 800 times slower than your processor, you'd really lose your performance!
This has been discussed ad nauseam in the computer architecture community, and I repeat: it's not a good idea!

--
"My opinions are my own, and I've got *lots* of them!"

NASA by HTH+NE1 · 2006-08-10 07:55 · Score: 4, Insightful

Apparently NASA uses stack computers in some of their probes.

Is that supposed to be a ringing endorsement? I thought NASA was using components the rest of the world treated as obsolete due to their proven durability and reliability in the radiation of space.

--
Oh, say does that Star-Spangled Banner entwine / The myrtle of Venus with Bacchus's vine?

Re:NASA by Medievalist · 2006-08-10 08:18 · Score: 2, Interesting

I thought NASA was using components the rest of the world treated as obsolete due to their proven durability and reliability in the radiation of space.
Essentially correct. It is so costly and time-consuming to get a new component certified for use that it's usually less work to find a clever way to use old components. Then ten months after launch production ceases on the old part, and you have to have special ones built at a hundred times the cost (military option) or scavenge them on eBay (scientific option).

FORTH post! by Anonymous Coward · 2006-08-10 07:58 · Score: 2, Funny

Nothing to see here. Sorry.

No, they're not better by vlad_petric · 2006-08-10 08:01 · Score: 3, Interesting

I didn't even try to torrent-download the thing, but I can tell you why stack machines aren't better than register-based ones. The main reason is that it's much much much harder to do renaming of a stack than of "regular" registers. Without renaming you can't do out-of-order execution ... Currently, there are two "mainstream" architectures that include stacked register files: Itanium and Sparc. Neither of them has out-of-order implementations.

But why do you need out-of-order execution? Well, misses to memory are very expensive these days - it can easily take from 200 to 400 cycles to service a load that misses all the way to main memory. This can have a significant effect on performance. What out-of-order execution does is to allow independent instructions that are younger than the load to execute in parallel with it. Quite often these instructions will generate other misses to main memory, overlapping their latencies. So the latency of loads that miss is still very high, but at the very least the processor is not idle while servicing them (for a good read, see "MLP Yes! ILP no!" by Andy Glew).
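A back-of-the-envelope model of this memory-level-parallelism argument, as a Python sketch. The numbers are assumptions for illustration only: 300 cycles per miss (picked from the 200-400 range above) and a hypothetical limit on how many independent misses can be outstanding at once. Real cores are far messier, and `cycles_for_misses` is not anyone's published model.

```python
import math


def cycles_for_misses(independent_misses: int, miss_latency: int, max_outstanding: int) -> int:
    """Cycles spent waiting if misses overlap in batches of max_outstanding.

    max_outstanding=1 models an in-order core that stalls on every miss.
    """
    batches = math.ceil(independent_misses / max_outstanding)
    return batches * miss_latency


print(cycles_for_misses(4, 300, max_outstanding=1))  # 1200: misses serialized
print(cycles_for_misses(4, 300, max_outstanding=4))  # 300: all four overlapped
```

The latency of each individual miss is unchanged in both cases; only the overlap improves, which is exactly the point being made.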

Itanium and Sparc compensate for the fact that they don't do stuff out-of-order by putting sh*tloads of L2/3 cache on-chip. The cost of a miss is still very high, but it happens much less often. The manufacturing cost of a chip is also much higher.

Note that what NASA is sending into space is "old" tech. The reason - well, cosmic rays are much stronger in outer space, and the smaller the gate, the easier it is for them to flip its state.

P.S. I'm a computer architect.

--

The Raven

Re:For the same reason language choice always matt by HiThere · 2006-08-10 08:04 · Score: 4, Informative

Sorry, but LISP (though I don't mean Common LISP) is just as much a stack language as FORTH. I think the first LISP that wasn't was LISP 1.5...but I'm rather vague on LISP history. Still, s-expressions are as stack-oriented as FORTH is. The interesting thing is that the first Algol 60 compiler (well...really an interpreter) I ever used was a stack machine. (That was why it was an interpreter. The real computer was an IBM 7090/7094 BCS system, so it ran a stack machine program that Algol was compiled to run on. Whee!) So if you want a good stack computer language, you could pick Algol 60. But FORTH is easier, and could even be the assembler language.
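One way to see the s-expression point is a minimal sketch that compiles a prefix expression to postfix stack code and then runs it on a plain stack. This is the same kind of translation an Algol-to-stack-machine compiler performs. Nested Python tuples stand in for s-expressions here, and only `+`, `-`, and `*` are handled. The function names are hypothetical.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def compile_sexpr(expr):
    """Compile a nested tuple like ("+", 1, ("*", 2, 3)) into postfix code.
    Operands are emitted before their operator (a post-order walk)."""
    if not isinstance(expr, tuple):
        return [expr]  # a literal compiles to a single push
    op, first, *rest = expr
    code = compile_sexpr(first)
    for arg in rest:
        code += compile_sexpr(arg) + [op]  # left fold: ((a op b) op c) ...
    return code

def run(code):
    """Evaluate postfix code with a single data stack."""
    stack = []
    for item in code:
        if item in OPS:
            b = stack.pop()
            a = stack.pop()
            stack.append(OPS[item](a, b))
        else:
            stack.append(item)
    return stack.pop()

prog = compile_sexpr(("+", 1, ("*", 2, 3)))
print(prog)       # [1, 2, 3, '*', '+']
print(run(prog))  # 7
```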

OTOH, most FORTHs I've seen use 3 or more stacks, i.e., most of them have a separate stack for floats. What would be *really* nice is if someone built a machine that used Neon as its assembler. Neon is/was an object-oriented dialect of FORTH for the Mac that allowed the user to specify early or late binding for variables. It was developed by Kyria Systems, a now-defunct software house. Unfortunately, Neon died during a transition to MSWind95. I understand that it is survived by MOPS, but I've never had a machine that MOPS would run on, so I don't know how similar it was.

I think that FORTH would make a truly great assembler...and the more so if that dialect of FORTH were NEON. But I forget how many stacks it used. At least three, but I have a vague memory that it was actually four. The main stack, the return stack, the floating stack, and ??the Object stack??...I don't know.

--

I think we've pushed this "anyone can grow up to be president" thing too far.

Stack computers are hardly new by Junks+Jerzey · 2006-08-10 08:09 · Score: 4, Insightful

Normally this kind of stuff doesn't bug me, but this is like an article in 2006 proclaiming the benefits of object-oriented programming. Doesn't anyone know their computing history?

There were stack computers in the 1960s and 1970s. There was a resurgence of interest in the 1980s--primarily because of Forth's popularity in embedded systems--resulting in a slew of stack-based "Forth engines." Forth creator Chuck Moore has been working on a series of custom Forth CPUs for 20+ years now. His latest version has 24 cores on one chip (and was entirely designed by one person and uses MILLIWATTS of power).

Stack processors and languages have one big advantage: they minimize the overall complexity of a system. The tradeoff is that they often push some of that complexity onto the programmer. That's why Forth tends to shine for small, controlled systems (like a fuel injector or telescope controller or fire alarm), but you don't see people writing 3D video games or web browsers in Forth.

Re: transputer wikipedia link by Kevster · 2006-08-10 08:09 · Score: 2, Interesting

I'm surprised no one's mentioned the transputer.

--
I always equivocate. Well, almost always.

Re:Computer-Science Motto: Back to the Future by Andrew+Kismet · 2006-08-10 08:10 · Score: 2, Funny

I cannot consider your post valid, as you've claimed that 2000's "Bewitched" was 'art'...

Re:Cup of Joe by AKAImBatman · 2006-08-10 08:32 · Score: 2, Informative

> Some modern embedded processors have been specifically designed to execute Java natively, e.g. ARM Jazelle and the new Atmel AVR32.

Yes, I'm aware of these processors. However, they're not actually stack-based. They convert the Java instructions into ARM RISC instructions, which are register-based. So while such chips are very useful in accelerating Java on standard RISC architectures (also VLIW architectures such as MAJC), they are not actually stack machines.

The only modern example of a stack-based processor for accelerating Java that I am aware of is the Java Optimized Processor (JOP).

--
Javascript + Nintendo DSi = DSiCade

Re:Who likes them? by Anonymous Coward · 2006-08-10 08:39 · Score: 2, Interesting

You're Just Wrong(tm) about that, actually. See BOOST. No, not the masochistic C++ template library (ANYTHING written in C++ is masochistic), but Berkeley's Out of Order Stack Thingy. http://www.cs.berkeley.edu/~satrajit/cs252/BOOST.pdf

It's probably mostly just an accident of history that register machines went superscalar first and "won" (mostly; maybe because stack machines were more efficient, the need for superscalarity didn't hit them so early). But, in short: stack machines, with similar design overheads to register machines, can extract at least as much concurrency as register machines, maybe more.

Re:I Know... by x2A · 2006-08-10 08:54 · Score: 3, Funny

Stack computers are basically like rack computers, except you can't pull out the one at the bottom.

--
The revolution will not be televised... but it will have a page on Wikipedia

Stack - bad for speed, good for low power by Theovon · 2006-08-10 08:59 · Score: 5, Insightful

I'm a chip designer, and I am working on my Ph.D. in CS. The idea of stack machines is something I have researched a bit, and I have drawn some of my own conclusions.

The main advantage of stack machines is that all or most parameters for each instruction are implicit. Aside from stack shuffle/rotate instructions, the operands are always the top few on the stack. This makes instructions very small. The logic is also exceedingly simple (for fixed-stack designs). If you want a simple, low-power CPU, a stack machine is what you want.
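The following minimal sketch shows what "implicit operands" means in practice. The opcode names and the tuple encoding are assumptions for illustration, not any real ISA. The arithmetic instructions name no registers at all, so only `push` carries an operand.

```python
def execute(program):
    """Run a list of (opcode, *operand) tuples on a single data stack."""
    stack = []
    for op, *arg in program:
        if op == "push":
            stack.append(arg[0])             # the only instruction with an explicit operand
        elif op == "add":
            b, a = stack.pop(), stack.pop()  # operands are implicitly the top two items
            stack.append(a + b)
        elif op == "mul":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == "dup":
            stack.append(stack[-1])          # copy the top of stack
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return stack

# (2 + 3) * (2 + 3), with no register names anywhere in the program
print(execute([("push", 2), ("push", 3), ("add",), ("dup",), ("mul",)]))  # [25]
```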

Where I explored this issue, however, is in the realm of high-performance computing. The key advantage of a stack architecture is that smaller instructions take less time to fetch from memory. If your RISC instructions are 32 bits, but your stack machine instructions are 8 bits, then your instruction caches are effectively 4x larger, and your overall cache miss penalty is greatly reduced.

The problem with stack machines is that they're damn near impossible to add instruction-level parallelism to. With a RISC machine, nearby instructions that deal with different registers (i.e. no dependencies) can be executed in parallel (whether that's multi-issue or just pipelining). With a stack machine, everything wants to read/write the top of the stack.
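Here is the arithmetic behind the "effectively 4x larger" claim. It assumes a hypothetical 32 KiB instruction cache and strictly fixed-width encodings. It ignores wider literal/branch forms and the extra shuffle instructions the poster raises in point (3), both of which eat into the ratio.

```python
CACHE_BYTES = 32 * 1024  # hypothetical 32 KiB instruction cache

def instructions_cached(bits_per_instruction: int) -> int:
    """How many fixed-width instructions fit in the cache."""
    return CACHE_BYTES * 8 // bits_per_instruction

risc = instructions_cached(32)  # 8192 instructions
stack = instructions_cached(8)  # 32768 instructions
print(stack // risc)            # 4
```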

I came up with two techniques to deal with this problem that are very much like the CISC-to-RISC translation done by modern x86 processors, so it's more of a stack ISA on a RISC architecture. The first is that the stack is virtual. When you want to pop from the stack, what's happening in the front end of the CPU is that you're just popping register numbers corresponding to a flat register file. When you want to push, you're allocating a fresh register number from the flat register file. Now, if you can get two instructions going that read different parts of the stack and write (naturally) to different locations, you can parallelize them. The second is a healthy set of register shuffling instructions. Since you're doing all of this allocation up front, shuffling registers is as simple as renumbering things in your virtual stack. So a swap operation swaps two register numbers (rather than their contents), and a rotate operation renumbers a bunch of them, but the pending instructions being executed still dump their results in the same physical registers.
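Here is a minimal sketch of the virtual-stack front end described above. It assumes a hypothetical flat file of 16 physical registers and does no free-list recycling. The class and method names are illustrative. Stack slots hold physical register numbers: `push` allocates a fresh one, an arithmetic op emits a RISC-like micro-op, and `swap`/`rot` only permute numbers.

```python
class VirtualStack:
    """Front-end model: stack slots hold physical register numbers, not values."""

    def __init__(self, num_physical: int = 16):
        self.free = list(range(num_physical))  # unallocated physical registers (never recycled here)
        self.slots = []                        # slots[-1] is the top of stack

    def push(self) -> int:
        reg = self.free.pop(0)  # allocate a new destination register
        self.slots.append(reg)
        return reg

    def pop(self) -> int:
        return self.slots.pop()  # hand the register number to the consumer

    def binary_op(self):
        """An add-style op: two renamed sources, one freshly allocated destination."""
        src_b, src_a = self.pop(), self.pop()
        dest = self.push()
        return (dest, src_a, src_b)  # a three-address micro-op

    def swap(self):
        self.slots[-1], self.slots[-2] = self.slots[-2], self.slots[-1]

    def rot(self):
        # Forth-style ( a b c -- b c a ): pure renumbering, no data moves
        self.slots.append(self.slots.pop(-3))


vs = VirtualStack()
for _ in range(4):
    vs.push()            # A, B, C, D land in p0..p3
first = vs.binary_op()   # C + D -> (4, 2, 3)
vs.rot()
vs.rot()                 # bring A and B back to the top without moving data
second = vs.binary_op()  # A + B -> (5, 0, 1)
print(first, second)     # (4, 2, 3) (5, 0, 1)
```

The two micro-ops share no registers, so a back end could issue them in parallel. This is exactly the opportunity the renaming is meant to expose.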

This all sounds great, but there are some problems with this:

(1) The shuffling instructions are separate instructions. With a RISC processor, you have more information all in one unit. Although you could try to fetch and execute multiple stack instructions at once, it's much more complicated to execute four stack instructions in parallel than to execute a single RISC instruction, even though they require the same amount of memory.
(2) You need a lot of shuffling instructions. Say your stack contains values A, B, C, and D, and you want to sum them. Without shuffling, you'd add A and B, yielding E, then add E and C, yielding F, then add F and D. Three add instructions. If your adder(s) is/are pipelined, you'd like to add A+B and C+D in parallel or overlapping, THEN wait around for their results and do the third add. The problem is that to do that, you'd need to add A+B, then rotate C to the top then D to the top, then add, then add again. The first case was 3 instructions; the second case is 5 instructions. Depending on your architecture, the extra shuffle instructions may take so long to process that you might as well just have waited. No speed gain at all.
(3) The extra shuffling instructions take up space. Optimizers are hard to write. Although it's conceivable that one could optimize for this architecture so as to avoid as many shuffling instructions as possible, you still end up taking up quite a lot of space with them, potentially offsetting much of the space savings that you got from switching from RISC to stack.
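Point (2) can be checked with a small sketch. It assumes Forth conventions: the rightmost list item is the top of stack, `rot` is ( a b c -- b c a ), and A starts on top. It then runs both orderings the poster describes and counts the instructions.

```python
def run_stack(program, stack):
    stack = list(stack)  # rightmost element is the top of stack
    for op in program:
        if op == "+":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "rot":  # ( a b c -- b c a )
            stack.append(stack.pop(-3))
        else:
            raise ValueError(f"unknown op {op!r}")
    return stack

start = [4, 3, 2, 1]  # D C B A, with A on top

serial = ["+", "+", "+"]                # ((A+B)+C)+D: each add waits on the previous one
paired = ["+", "rot", "rot", "+", "+"]  # A+B and C+D are independent, then combined

print(run_stack(serial, start), len(serial))  # [10] 3
print(run_stack(paired, start), len(paired))  # [10] 5
```

With a renaming front end the two `rot`s need no execution-unit time. Whether the reordering pays off then comes down to the fetch and decode cost of those extra instructions, as the poster notes.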

So, there you have it. Somewhat OT, because surely NASA's primary goal has got to be low-power, but also somewhat on-topic because stack architectures aren't the holy grail. Just ideal for some limited applications.

Re:Stack - bad for speed, good for low power by 0xABADC0DA · 2006-08-10 10:14 · Score: 2, Interesting

Uh, I guess I'm too daft to get a Ph.D., but it sure seems to me like optimizing at the instruction level with a stack machine is solving the wrong problem.

With a stack machine, running one instruction stream in parallel is very hard, while very easy on a register-based one. But the flip side of this is that on a stack machine running multiple instruction streams in parallel is incredibly easy while *Very* difficult on a register based CPU.

For instance, take "add 1 to each element of this 30-length array" and the optimization to unroll the loop by three:

The stack version can use parallel streams:

push array # "stack[2]"
push 30 # "stack[1]"
push 0 # "stack[0]" of stream #1
push 1 # "stack[0]" of stream #2
push 2 # "stack[0]" of stream #3
push 3 # number of parallel streams to run
fork
loop:
add 1 to mem at (stack[2] + stack[0])
stack[0] += 3
if stack[0] < stack[1] goto loop
join

You'll have to use your imagination to expand the loop body into what it would look like in stack instructions, but basically the fork pops the number of parallel stacks to run and then the join waits for each parallel stack to complete. Of course, in a real implementation you would also push a number of stack elements to copy, etc. Since instruction decoders for stack machines are so simple, your CPU can have literally hundreds of them on a die, each one still doing useful work.
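The fork/join idea can be modelled in software as a rough sketch. Python threads stand in for the hypothetical hardware stack streams, each stream keeps its own private index, and `join` waits for all of them. The sketch assumes 0-based indices, so the three streams start at 0, 1, and 2 and stride by 3. The threads illustrate the control structure, not a speedup.

```python
import threading

def stream(array, start, stride, length):
    """One parallel stream: a private index over shared array/length/stride."""
    i = start
    while i < length:
        array[i] += 1  # each stream touches a disjoint set of indices
        i += stride

def add_one_parallel(array, streams=3):
    workers = [
        threading.Thread(target=stream, args=(array, s, streams, len(array)))
        for s in range(streams)  # "fork": one stream per starting index
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()  # "join": wait for every stream to finish
    return array

print(add_one_parallel([0] * 30) == [1] * 30)  # True
```

Each stream tests its index before touching memory. A length that is not a multiple of three therefore needs no separate cleanup code.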

The register-based machine will unroll the loop:

set r1 to 30
set r2 to 0
set r3 to array
loop:
set r4 to r3[r2]
set r5 to r3[r2+1]
set r6 to r3[r2+2]
add 1, r4 store in r4
add 1, r5 store in r5
add 1, r6 store in r6
store r4 to r3[r2]
store r5 to r3[r2+1]
store r6 to r3[r2+2]
add 3 to r2
compare r2 to r1, jump to loop

Now try to run that in parallel and you get a couple of memory fetches/writes overlapped, but mostly it is pretty slow. Just one hiccup in the pipeline and all of the parallelism stops. Not to mention the code to catch the remainder of the loop if the length is not a multiple of three.
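For comparison, here is a sketch of the unrolled-by-three register version, including the remainder handling the poster alludes to. It is written in Python purely to show the control structure, with the locals `r4`, `r5`, and `r6` playing the role of the registers.

```python
def add_one_unrolled(array):
    n = len(array)
    i = 0
    # main body: three independent load/add/store groups per iteration
    while i + 3 <= n:
        r4, r5, r6 = array[i], array[i + 1], array[i + 2]
        array[i], array[i + 1], array[i + 2] = r4 + 1, r5 + 1, r6 + 1
        i += 3
    # cleanup loop for the 0-2 leftover elements when n is not a multiple of 3
    while i < n:
        array[i] += 1
        i += 1
    return array

print(add_one_unrolled([0] * 31) == [1] * 31)  # True
```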
Re:Stack - bad for speed, good for low power by treyb · 2006-08-10 11:21 · Score: 2, Informative

Chuck Moore (the Forth guy) came to a slightly different conclusion: good for speed and good for low power. He spends the chip real estate you would use for pipelining instructions on additional cores instead. In the case of the SeaForth processors, he added 23 other cores. Granted, that chip doesn't pretend to do anything but target embedded devices, but he demonstrates that stack machines can run quickly and use little power.

MMIX uses a register stack by bunratty · 2006-08-10 09:06 · Score: 2, Interesting

Knuth's MMIX architecture uses registers, but the registers themselves are on a register stack. Perhaps this architecture provides the best of both worlds.

--
What a fool believes, he sees, no wise man has the power to reason away.

A bumper sticker I saw once by Michael+Woodhams · 2006-08-10 09:17 · Score: 4, Funny

You Forth (heart) if honk then

--
Quattuor res in hoc mundo sanctae sunt: libri, liberi, libertas et liberalitas.

A useful HTML article (better than 9999TB AVI) by roman_mir · 2006-08-10 09:51 · Score: 4, Informative

And here it is.

--
You can't handle the truth.

No, the *PROPER* response is by Slithe · 2006-08-10 10:12 · Score: 2, Funny

Imagine a Beowulf cluster of them.

--
---- "XML is like violence. If it doesn't fix the problem, you aren't using enough."

Re:Obligatory by Hyram+Graff · 2006-08-10 10:20 · Score: 2, Funny

First, rotate your version 90 degrees counter-clockwise. Next exchange all '0's and '*'s. What do you have? The answer is down there in my sig.

I was going to use '*'s and '.'s but with variable width fonts I couldn't get it to come out in a grid and I couldn't figure out how to have a monospace font appear in my sig. Thus, I replaced the '.'s with '0's and have the version that you see.

--
0*0
00*
***

Re:For the same reason language choice always matt by BlueGecko · 2006-08-10 10:27 · Score: 2, Informative

If you miss Neon, you'll be happy to know that you can get about 90% of its implementation and 100% of its concepts in the form of PowerMops, which is open-source and runs great and natively on Leopard. I haven't used it for anything recently, but it's worked fine for hobbyist stuff I've done in the past. I strongly encourage you to check it out.

Joy! by selfdiscipline · 2006-08-10 11:09 · Score: 2, Interesting

How come no one has mentioned the language Joy?
I've looked into it a couple times, and it seems pretty neat. In a word, functional concatenation.
Plus, as we all know, functional languages are so much more fun than procedural.
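Here is a rough illustration of what "functional concatenation" means. It is sketched in Python rather than Joy, with word names borrowed loosely from Forth/Joy, and it does not claim to capture Joy's full semantics. Every word is a function from a stack to a new stack, and writing two programs side by side composes their functions.

```python
from functools import reduce

# each "word" maps an immutable stack (a tuple, top at the right) to a new stack
def lit(n):
    return lambda s: s + (n,)

def add(s):
    return s[:-2] + (s[-2] + s[-1],)

def dup(s):
    return s + (s[-1],)

def mul(s):
    return s[:-2] + (s[-2] * s[-1],)

def concat(*words):
    """Program concatenation is function composition, applied left to right."""
    return lambda s: reduce(lambda acc, w: w(acc), words, s)

square = concat(dup, mul)                     # a new word built by concatenation
program = concat(lit(3), lit(4), add, square) # like Joy's  3 4 + dup *
print(program(()))  # (49,)
```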

--

-------
Incite and flee.

A Near Miss for Stack Computing Circa 1981 by Baldrson · 2006-08-10 11:12 · Score: 5, Interesting

Stack computing came close to changing the course of the computer industry, including setting networking forward 15 years (displacing Microsoft's stand-alone approach to software) back in 1981.

An excerpt from a bit longer essay I wrote:

In August 1980, Byte magazine published its issue on the Forth programming language.
At that time, I was working with Control Data Corporation's PLATO project, pursuing a mass market version of that system using the Intelligent Student Terminal (IST). The ISTs were Z80 processor terminals sporting 512*512 bit mapped displays with touch sensitive screens and 1200bps modems that went for about $1500. We were shooting for, and actually successfully tested, a system that could support almost 8,000 simultaneous users on 7600-derived Cybers (the last machine designed by Seymour Cray to be marketed by CDC --with 60 bits per word, 6 bits per character, no virtual memory, but very big and very fast) with under 1/4 second response time (all keys and touch inputs went straight to the central processor) for $40/month flat rate including terminal rental. Ray Ozzie had been working at the University of Illinois on offloading the PLATO central system to the Z80 terminal through downloaded assembly language programming, doing exotic things like "local key echo" and such functions.
I was interested in extending Ray's work to offload the mass-market version of the PLATO central system. In particular I was looking at a UCSD Pascal-based approach to download p-code versions of terminal functions -- and even more in particular the advanced scalable vector graphics commands of TUTOR (the "relative/rotatable" commands like rdraw, rat, rcircle, rcircleb, etc.) if not entire programs, to be executed offline. Pascal was an attractive choice for us at the time because CDC's new series of computers, the Cyber 180 (aka Cyber 800) was to have virtual memory, 64 bit words, 8 bit characters and be programmed in a version of the University of Minnesota Pascal called CYBIL (which stood for Cyber Implementation Language). Although this was a radically different architecture than that upon which PLATO was then running, I thought it worthwhile to investigate an architecture in which a reasonable language (you should have seen what we were used to!) could be made to operate on both the server and the terminal so that load could be dynamically redistributed. This idea of dynamic load balancing would, later, contribute to the genesis of Postscript.
Over one weekend a group of us junior programmers managed to implement a good portion of TUTOR's (PLATO's authoring language) advanced graphics commands in CYBIL. Our little hunting pack at CDC's Arden Hills Operations was in a race against the impending visit of Dave Anderson of the University of Illinois' PLATO project, who was promoting what he called "MicroTUTOR". Anderson was going to take the TUTOR programming language and implement a modified version of it for execution in the terminal -- possibly in a stand-alone mode. Many of us didn't like TUTOR, itself, much. Indeed, I had to pull teeth to get the authorization to put local variables into TUTOR -- and we were determined to select a better board from our quiver with which to surf Moore's Shockwave into the Network Revolution. CDC management wasn't convinced that such a radical departure from TUTOR would be wise, and we hoped to demonstrate that a p-code Pascal approach could accomplish what microTUTOR purported to -- and more. We quickly ported a TUTOR central sy

--
Seastead this.

Re:Assembly Code was fun by Thuktun · 2006-08-10 12:05 · Score: 3, Funny

I'd rather spend a day writing an assembly routine that has an equivalent single obscure machine instruction I didn't know about beforehand, thank you very much.

http://www.netfunny.com/rhf/jokes/97/Nov/assembly.html

The X86 is an example of everything! by Cassini2 · 2006-08-10 12:28 · Score: 3, Interesting

I did a computer architecture course a number of years ago. One day, we came to the consensus that the X86 architecture was an example of every computer architecture in existence. You want load/store: look at all those MOV AX, xxxx instructions. You want register RISC: look at all those registers AX, BX, CX, DX, SI, DI, SP, BP. You want stack-based: look at the FPU. You want vector parallel processing: look at those MMX/SSE instructions. You want symmetric multi-processing: look at those dual cores.

The course went quickly downhill after this observation. No one could figure out how incorporating every processor architecture into one product was a good thing ...

If Stack-Based Computing Is So Great... by Nom+du+Keyboard · 2006-08-10 16:13 · Score: 2, Interesting

If stack-based computing is so great, powerful, and cheap, why aren't IBM PPC, AMD Athlon, Intel Core pick-a-number, and Sun Sparc dueling it out for the best stack-based chip? Why aren't the next-gen game consoles all using it, since Microsoft and Sony at least (Wii is just a faster GC) went to new architectures? Don't tell me no one has ever heard of the concept before. The Burroughs 5500 dates back to the 1960s. I think there's more here than is being told.

--
"It's the height of ridiculousness to say for those 9 lines you get hundreds of millions."

Parallelism by Peaker · 2006-08-11 00:15 · Score: 2, Interesting

How about achieving parallelism by using multiple stacks?

If a stack machine is that much simpler, couldn't you either have:

A vast amount of cores for many unrelated threads
Or: Multiple pipelines and explicit division of instructions into the pipelines?

The second refers to an instruction coding similar to VLIW such that you parallelise the code on multiple stacks but it still shares an instruction/data cache and allows for parallelism without heavy multi-threading at the high-level (and instead having parallelism as a compiler optimization at the low-level).
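Here is a minimal sketch of the second option. It assumes a hypothetical encoding in which each bundle holds one operation per stack, with `None` as a per-lane no-op, so the compiler rather than the hardware decides what runs in parallel. Ints are pushes, and only `+` and `*` are supported. Names are illustrative.

```python
def run_bundles(bundles, num_stacks):
    """Execute VLIW-style bundles: slot k of every bundle drives stack k."""
    stacks = [[] for _ in range(num_stacks)]
    for bundle in bundles:
        if len(bundle) != num_stacks:
            raise ValueError("each bundle needs exactly one slot per stack")
        # the lanes share no state, so hardware could issue them in the same cycle
        for stack, op in zip(stacks, bundle):
            if op is None:           # no-op for this lane
                continue
            if isinstance(op, int):  # literal push
                stack.append(op)
            elif op in ("+", "*"):
                b, a = stack.pop(), stack.pop()
                stack.append(a + b if op == "+" else a * b)
            else:
                raise ValueError(f"unknown op {op!r}")
    return stacks

# lane 0 computes 1 + 2 while lane 1 computes 3 * 4
program = [(1, 3), (2, 4), ("+", "*")]
print(run_bundles(program, 2))  # [[3], [12]]
```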

Slashdot Mirror

72 of 347 comments