Building a 32-Bit, One-Instruction Computer
Hugh Pickens writes "The advantages of RISC are well known — simplifying the CPU core by reducing the complexity of the instruction set allows faster speeds, more registers, and pipelining to provide the appearance of single-cycle execution. Al Williams writes in Dr Dobbs about taking RISC to its logical conclusion by designing a functional computer called One-Der with only a single simple instruction — a 32-bit Transfer Triggered Architecture (TTA) CPU that operates at roughly 10 MIPS. 'When I tell this story in person, people are usually squirming with the inevitable question: What's the one instruction?' writes Williams. 'It turns out there's several ways to construct a single instruction CPU, but the method I had stumbled on does everything via a move instruction (hence the name, "Transfer Triggered Architecture").' The CPU is implemented on a Field Programmable Gate Array (FPGA) device and the prototype works on a 'Spartan 3 Starter Board' with an XS3C1000 device available from Digilent that has the equivalent of about 1,000,000 logic gates, costing between $100 and $200. 'Applications that can benefit from custom instruction in hardware — things like digital signal processing, for example — are ideal for One-Der since you can implement parts of your algorithm in hardware and then easily integrate those parts with the CPU.'"
0x2A
That is the ultimate instruction.
sed -e 's/Chuck Norris/Rajnikant/g' joke > fact
But can it run Vista?
It seems specious to say that One-Der is optimal for a task because it offers the flexibility of the underlying FPGA hardware. If you have the FPGA hardware present to run the One-Der implementation, then you could just configure a more optimally designed processor out of it for whatever task you are actually performing.
I am a geek attorney, but not your geek attorney unless you've already retained me. This is not legal advice.
vaguely reminds me of the nihilist language joke. A language that realizes that ultimately all things are futile and irrelevant, thus allowing all instructions to be reduced to a no-op.
Everyone attack him before he wins this round of Age of Empires. Quickly, he's probably low on resources right now.
My work here is dung.
So the one instruction is essentially a move command that has multiple modes... Ahem. Isn't that cheating? Isn't move considered two instructions already, a load and a store? I guess this really depends on how you define what is and isn't an instruction.
http://www.beanleafpress.com
Programming in Assembly for this architecture is certainly delightful!!!
`echo $[0x853204FA81]|tr 0-9 ionbsdeaml`@gmail.com
I vote for GOTO as the only instruction.
That would be hilarious.
Cheers
Lost at C:>. Found at C.
I built a single instruction microprocessor at grad school. The only instruction was to move a 32-bit data from one address to another address. All the ALU and I/O functions were memory mapped. For example, you could have an adder where address A was operand #1, address B was operand #2 and address C was the result. Branches were handled through ALU units where the result of the operation changed the instruction pointer for some future instruction. It was very easy to implement and notoriously difficult to program.
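For readers who want to see the memory-mapped adder idea in motion, here is a minimal Python sketch. It is not the poster's design: the address map (`ADD_A`, `ADD_B`, `ADD_OUT`), the class name, and the 32-bit wraparound are all assumptions made for illustration. The only "instruction" is a move from one address to another, and the addition happens as a side effect of where the data lands.

```python
# Hypothetical address map (an assumption, not taken from the post).
ADD_A, ADD_B, ADD_OUT = 0x100, 0x101, 0x102
WORD_MASK = 0xFFFFFFFF  # keep values to 32 bits


class MoveMachine:
    """Toy machine whose only instruction is move(src, dst)."""

    def __init__(self):
        self.ram = {}   # plain memory, address -> value
        self.add_a = 0  # adder operand #1
        self.add_b = 0  # adder operand #2

    def read(self, addr):
        if addr == ADD_OUT:
            # Reading the result address triggers the addition.
            return (self.add_a + self.add_b) & WORD_MASK
        if addr == ADD_A:
            return self.add_a
        if addr == ADD_B:
            return self.add_b
        return self.ram.get(addr, 0)

    def write(self, addr, value):
        value &= WORD_MASK
        if addr == ADD_A:
            self.add_a = value
        elif addr == ADD_B:
            self.add_b = value
        elif addr == ADD_OUT:
            pass  # result address is read-only in this sketch
        else:
            self.ram[addr] = value

    def move(self, src, dst):
        self.write(dst, self.read(src))

    def run(self, program):
        # A program is just a list of (source, destination) address pairs.
        for src, dst in program:
            self.move(src, dst)


m = MoveMachine()
m.ram[0] = 40
m.ram[1] = 2
m.run([(0, ADD_A), (1, ADD_B), (ADD_OUT, 2)])  # ram[2] = ram[0] + ram[1]
```

Three moves for one addition also hints at why such machines are described as easy to build and hard to program.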
It's XC3S1000, not XS3C1000. Been working with these parts too long...
Sounds a hell of a lot like the read/write head of the Turing Machine to me.
Why, DWIW (Do What I Want), of course.
... whose first operand is the task to perform. Followed by the necessary operands for that task.
It occurs to me that the Microblaze would be 10 times faster, much easier to program and probably of a similar size.
Mike Albaugh did this in 1986
Google for urisc macro package in the net.arch archives.
His instruction was "Reverse subtract and skip if borrow"
That would be the system with no instructions at all!
The instruction is "What is 7 times 9"
Who would win this election: Andrew Weiner vs Andrew Weiner's weiner.
I remember hearing about building a one instruction computer back in engineering school. The one I heard about was based on Subtract and Branch if Not Equal. My roommate at the time figured it ought to be a way to get a very high clock rate. It seems like he found a proof in a hoary old book that such a computer was in fact Turing complete. I'm sure I'll get flamed for posting a vague recollection but. . . here it is.
AA A AA AAAA A AAA AA A A AA A A AAA A A AAAA AAA AAAA
"The advantages of RISC are well known -- simplifying the CPU core by reducing the complexity of the instruction set allows faster speeds, more registers, and pipelining to provide the appearance of single-cycle execution."
Is it just me, or does this sound like RISC fanboyism from the 1990s? The "advantages" of RISC are not nearly so clear these days. Indeed, it is getting rather hard to find real RISC chips. While there are chips based on RISC ISA ideas (like being load/store and such), they are not RISC. RISC is about having few instructions, each of which is simple and does only one thing. Those concepts are pretty much thrown out when you start having SIMD units on the chip and such.
These days complex processors are the norm. They have special instructions for special things and that seems to work well. RISC is just not very common, even in systems with a RISC heritage.
I'm just not seeing what this processor is supposed to accomplish, especially being on an FPGA. If you can implement a CPU to do what you need on an FPGA, you can probably implement a dedicated solution on the FPGA that is faster. That is rather the idea of an FPGA over a CPU. You can implement things in hardware that are faster.
Link to the Print version--article on one page with no advertising since I haven't seen that posted yet.
Is there an optional one-way infinite tape drive?
That's an old idea. The classic "one instruction" is "subtract, store, and branch if negative". This works, but the instructions are rather big, since each has both an operand address and a branch address.
Once you have your one instruction, you need a macroassembler, because you're going to be generating long code sequences for simple operations like "call". Then you write the subroutine library, for shifting, multiplication, division, etc.
It's a lose on performance. It's a lose on code density. And the guy needed a 1,000,000 gate FPGA to implement it, which is huge for what he's doing. Chuck Moore's original Forth chip, from 1985, had fewer than 4,000 gates, and delivered good performance, with one Forth word executed per clock.
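To make the "subtract, store, and branch if negative" idea concrete, here is a small Python sketch of such a machine. The encoding is an assumption: each instruction is a triple `(a, b, target)` meaning `mem[b] -= mem[a]`, then jump to `target` if the result is negative, otherwise fall through. The program is kept separate from data memory for readability, and `run_sbn`, `X`, `Y`, and `Z` are hypothetical names. It also shows why a macro assembler becomes necessary: a plain addition already costs three instructions and a scratch cell.

```python
def run_sbn(program, mem, max_steps=10_000):
    """Run a list of (a, b, target) subtract-and-branch-if-negative triples."""
    pc = 0
    steps = 0
    while 0 <= pc < len(program):  # running off the end halts the machine
        if steps >= max_steps:
            raise RuntimeError("step limit reached")
        a, b, target = program[pc]
        mem[b] -= mem[a]                          # subtract and store
        pc = target if mem[b] < 0 else pc + 1     # branch if negative
        steps += 1
    return mem


# Data layout (hypothetical): X and Y are operands, Z is a scratch cell holding 0.
X, Y, Z = 0, 1, 2

# Y = Y + X, built from subtraction only. Every branch target is simply
# the next instruction, so the branch outcome never matters here.
add_program = [
    (X, Z, 1),  # Z = 0 - X
    (Z, Y, 2),  # Y = Y - (-X) = Y + X
    (Z, Z, 3),  # Z = Z - Z = 0, restoring the scratch cell
]

mem = run_sbn(add_program, [7, 35, 0])
```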
"The advantages of RISC are well known — simplifying the CPU core by reducing the complexity of the instruction set allows faster speeds, more registers, and pipelining to provide the appearance of single-cycle execution." I know this has been argued to death already - but it just isn't completely true that a RISC has advantages over a CISC. The gain in speed is usually negated by the lack of expressiveness and the number of registers would help a CISC just as much as a RISC. Why is this being dragged up again?
I think you can get it to compute anything - if you have enough of them.
Nullius in verba
The hyphen being so everyone doesn't call it "The O-need-er", as in That Thing You Do.
All the FPGA vendors have their own embedded CPU cores, such as the Xilinx Microblaze and the Altera NIOS II which are very small FPGA-embeddable CPU cores.
You also have free options that aren't tied to specific FPGAs like the LEON sparc-compatible processors.
Test your net with Netalyzr
We'll see lots of joke replies here. Computer Science is more concerned with O() notation--they have enough problems wrapping their heads around something like floating point numbers.
The idea of offloading software functions onto custom hardware built around a TTA is interesting - 5 years ago I used to work for Critical Blue who were writing software to design and build those custom processors and optimise an ISA for them. Worth a look.
Compile error. Instruction "A" missing after "A".
Reminds me of this old saying,
"Every program can be reduced by one instruction, and every program has at least one bug. Therefore, any program can be reduced to one instruction which doesn't work."
I just wish I knew who came up with it.
The Oneder (pronounced oneeder). Whoops, joke lost on geek crowd that probably never saw the movie.
nop
There can be different architectures for computers, but, nowadays, for many of us, I'd say there is one particular model of an architecture that is likely to be the only one we're really familiar with, and that automatically comes to mind when one speaks of a computer architecture. It's a rather compartmentalized architecture in which the CPU is the place where opcodes are executed and memory is just a big flat address space for data, including instructions. This "transfer triggered" architecture strikes me as being not so much a 1 instruction computer as one where instructions are implemented in a less compartmentalized fashion, spread out among special units activated by addresses, as opposed to the more plain architecture where bit patterns on the address bus simply activate individual generic memory cells along with a read/write signal. More than that may happen (cache memory comes into play with all its complications, for instance), but the 'model' for the programmer is that simple one.
In theory, theory and practice are the same; in practice they're different. (Yogi Berra & A. Einstein)
If you read all the way through, he talks about how the bus-like architecture would let you reconfigure the CPU, although he doesn't have any tools for that. So you could scan your program, decide on an optimal architecture for it (I need 4 accumulators, 2 stacks, and 4 floating point units) and then compile a program just for that "new" CPU. You could do that with other schemes, I guess, but it becomes hard because the data path is usually pretty much wired one way. This is just a bus.
Also, it looks like it would be nothing to add new "instructions" just by plugging relatively simple boxes onto the bus.
There are a couple of old CPUs that did this (Burroughs maybe? I forget).
AH the ONEDERS! - didn't that band have to change their name to the wonders? silly movies.... GO ONEDERS! pronounced (oh-knee-ders)
The point was supposed to be speed. So, he gets 10 MHz? That's not very impressive...
Is this not a state machine design? A network switch ought to implement this sort of design. In fact, the design of TCP would also provide functionality for parallel processing, multiple cores, etc. That would make for a variable word size, too. The work would be in the implementation of the various functions such as add, subtract, etc.
Best regards.
Isn't it cheating to have a CPU with one instruction that relies on custom hardware to do the rest of the instructions? You're just re-defining the CPU and adding more hardware to 'simplify' the CPU!
HexaByte - he's a square and a half!
2009: one million gates, one instruction, RISC, gnarly to program = 10 MIPS.
1984: 200,000 gates, gobs of instructions, CISC, easy to program = 10 MIPS.
We should have more to show for the last twenty-five years in microprocessor design.
Am I part of the core demographic for Swedish Fish?
What what?
The language of naughty schoolboys was goto-only. However, it never delivered on its promise of naked chicks if you turned to page 69. Some of the programs written in said language were, however, quite humorous and complex. You could implement loops in that language of course, and perhaps even keep an idiot busy for hours. I'm not sure if it was Turing complete though.
For all intensive purposes, "whom" is no longer a word. That begs the question, "who cares"?
Which just goes to show how shockingly ignorant 'many of us' are.
Now if 'many of us' (brighter than the norm, or so the theory goes) can be so ignorant - why do we laugh at Joe Sixpack?
Press the key to continue.
But, err, there are no instructions for it to rule. Oh well.
... the OS only had one instruction - "PIP" ("Peripheral Interchange Program").
It seems to me like all they're trying to do is reduce risc.
All this talk about 13th Base makes me jealous, 'cause I've never even got to 2nd Base yet. I'll have to die first and go to heaven before I'll get to 13th Base with a chick.
I invented this and published it more than 30 years ago, during the early debate between CISC and RISC microprocessors. It was in the (now defunct) "Modern Data" magazine, in my column "Carol's Microcosm." It's an obvious solution for any computer programmer who understands hardware logic.
There's an old saying:
"Every program can be reduced by one instruction, and every program has at least one bug.
Therefore every programme can be reduced to one instruction, which is wrong."
Seems like we've just taken a big step towards proving that...
wonder how many addressing modes there are...
This is my sig.
1) "KILL ALL HUMANS!"
All Boolean logic functions can be implemented with just a NOR or NAND. Add a conditional branch and a move (ok- more than one instruction now) and you have a general purpose computer.
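As a quick illustration of the NAND claim, the following Python sketch derives NOT, AND, OR, and XOR from a single `nand` function and then builds a half adder from them. The function names are hypothetical, and bits are modelled as the integers 0 and 1.

```python
def nand(a, b):
    """The only primitive gate: output is 0 only when both inputs are 1."""
    return 1 - (a & b)


def not_(a):
    return nand(a, a)


def and_(a, b):
    return not_(nand(a, b))


def or_(a, b):
    # De Morgan: a OR b == NOT(NOT a AND NOT b)
    return nand(not_(a), not_(b))


def xor_(a, b):
    # Classic four-NAND construction of XOR.
    n = nand(a, b)
    return nand(nand(a, n), nand(b, n))


def half_adder(a, b):
    """Return (sum_bit, carry_bit) using only NAND-derived gates."""
    return xor_(a, b), and_(a, b)
```

Chaining half adders into full adders is how an arithmetic unit would grow out of this one gate.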
The only valid program is a single HALT instruction.
These posts express my own personal views, not those of my employer
My 32-bit one command machine just needs Start. It's that big button on the bottom left. Been around since '95. There's even a Stones song for it!
... is it pronounced "O-knee-der" or "O-ned-der"?
You know, every time it does that thing it does.
I can't find his writeup anymore, but isn't this essentially Dave Taylor's 'pinky processor'?
http://tech.slashdot.org/article.pl?sid=98/06/06/136239
I remember reading in a Patterson book that the ultimate instruction was "Subtract and Jump if zero". Everything could be implemented using that instruction alone.
I once did microcoding on an Array Processor back almost 30 years ago which worked with the UNIVAC 1100/80 mainframe... it was implemented by DataWest.
Using the 1100 Meta-Assembler we wrote 288bit long instructions that handled moves through the various bits of the machine with 72bit per slice (there were 4 slices).
And, yeah, the clock rate was 100ns/cycle, so there was an instruction fetch 10 times a microsecond. This limit was due to being built using TTL...
And to think that this machine had a theoretical peak throughput of 120MFlops... and, I recall, could sustain 80 MFlops.
So, yeah, this could happen.
Ah, and here's the programmer's manual: clicky
Erm, you know that Memory Mapped devices are in every day use right? That the x86 CPU exports things like the APIC through memory mapping even though it's built-in to the CPU?
If you have a video card, the card's control registers are memory mapped more likely than not.
The real problem with alternative CPU architectures is that the x86 arch has very good memory consistency: its out-of-order execution units block any operation that reads a result which hasn't been calculated yet, whereas CPUs like the Alpha do not. This is just great, in that you can read 2 numbers, add them, save the result to memory, then read the memory, and the read may actually return whatever was in RAM before the add, because the CPU decided the read was the easiest operation to do first.
...if the one instruction is NOP. He could easily crack the petanop barrier.
Of course, that's true about just about everything. Back in the 80's, I heard of this being referred to as a MISC processor (minimal instruction set computer).
Of course, it's cool that this guy actually BUILT one. :)
... to rule them all!
A cousin of mine (Howdy Rusty!) described this concept to me in the '70s while I was taking classes toward my CS degree.
A little background: I went to the good old University of Utah, which had a Burroughs 1700 with user writable microcode, and so a lot of projects centered around writing microcode and designing micro architectures. A friend was trying to code up a single instruction machine based on Curry Combinators. I thought he was nuts, but I liked the idea of a single instruction machine. So, I was talking to my cousin and he described an architecture that had one instruction that was a source and a destination address. Any address could be either memory or a register in a functional unit, an FU for short. No kidding, that is how he described it.
The only trouble was trying to figure out how to do a conditional branch.
A few years later while I was in gradual school I solved that problem and wrote a paper about it. Being a gradual student I could not publish without permission from my adviser. Well, he got a good laugh out of the idea and told me not to show it to anyone. So, of course I sent it to everyone I knew. They all had a good laugh too. Said it was the funniest thing I had ever written. You see, I was into writing humorous stories at the time and people thought this was another one. Oh well, I have a print out of the thing around here somewhere.
What I really liked about the architecture is that if you started modifying it to make it more economical, doing things like making the addresses have different lengths and adding a bit to tell you if the long address is the source or the destination, the move architecture starts looking more and more like a classic instruction set architecture. I thought that was very cool. When you look at micro coded architectures and think about a pure move based processor it really does look like all traditional architectures are attempts to make the one instruction machine make more economical use of instruction bits.
So, how did I solve the conditional branch problem? Pretty much the way this fellow did. Every FU may, or may not, cause condition flags to be set. I added registers where you could read and write the condition bits and read and write the program counter. I also added a mask register that was ANDed with the condition register so you could enable and disable conditions. Then I just made the current instruction conditional on the value of the flags register ANDed with the mask register. If the result was non-zero the current instruction was skipped. Of course, the machine had to clear the condition register after each instruction was executed. (Hmm, it would make more sense to only make moves to the program counter conditional, and it would make more sense to only clear the flags after a move to the instruction counter... Hey, I was a gradual student back then! :) That approach allowed you to select, say, the sign bit from one ALU, do a subtraction by moving values to two registers in the ALU, then jump if the sign bit is set. It also let you directly make any instruction conditional so you could implement something like the ABS() function without any jumps. Or, at least that was the idea.
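Here is one way to read the flags-and-mask scheme, as a small Python sketch of a jump-free ABS(). It is an interpretation, not the original design: the `NEG_IN`/`NEG_OUT` negation unit, the `TEST` register that sets a sign flag when a negative value is moved into it, the `SIGN` bit value, and the use of names instead of numeric addresses are all assumptions. What it keeps from the description is that every instruction is a move, an instruction is skipped when flags AND mask is non-zero, and the flags are cleared after every instruction.

```python
SIGN = 0x1  # hypothetical flag bit: "last value moved into TEST was negative"


def run(program, regs):
    """Execute (src, dst) moves; skip a move when (flags & MASK) is non-zero."""
    flags = 0
    for src, dst in program:
        skip = (flags & regs["MASK"]) != 0
        flags = 0  # flags are cleared after every instruction, skipped or not
        if skip:
            continue
        value = regs[src]
        regs[dst] = value
        # Side effects of the (hypothetical) functional units:
        if dst == "NEG_IN":
            regs["NEG_OUT"] = -value
        elif dst == "TEST" and value < 0:
            flags |= SIGN
    return regs


ABS_PROGRAM = [
    ("X", "NEG_IN"),        # negation unit now holds -X
    ("NEG_OUT", "RESULT"),  # RESULT = -X
    ("X", "TEST"),          # sets SIGN if X is negative
    ("X", "RESULT"),        # skipped when SIGN is set, so RESULT stays -X
]


def abs_via_moves(x):
    regs = {"X": x, "MASK": SIGN, "NEG_IN": 0, "NEG_OUT": 0,
            "TEST": 0, "RESULT": 0}
    return run(ABS_PROGRAM, regs)["RESULT"]
```

Setting `MASK` to 0 would disable the condition entirely, which is the enable/disable role the mask register plays in the description.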
I called my one instruction: The Conditional Move From Here To There And Clear Flags, or TCMFHTTACF instruction. The assembly for it was really dull, it just always had the same op code down the left hand edge of the screen... Ok, really, I just never listed anything but addresses when I wrote code for it.
Nice to see that someone actually built one of these. BTW, this kind of architecture makes it easy to add multiple execution units. With parallel execution and careful use of shared and private FUs and memories you can build a pretty damn powerful special purpose processor without a lot of hardware complexity.
This is just too damn cool... someone finally built it!
Stonewolf
Even a child can create a one-instruction computer, as long as the instruction is "nop."
x86 is with us because of backwards compatibility. even Intel were unable to shrug it off with Itanium and various other things.
x86 is still with us because is-gross turned out to be 20% is-gross and 80% with-gross. The 20% that actually is-gross has been a minor cross to bear, the other 80% was relegated to traps, microcode, and emulation. The most ridiculous CISC instruction from 1980 is a pimple on a bedbug in silicon area thirty years later. Moore's law: the amazing zit shrinking cream.
you almost need a different compiler for each generation of CPUs
If your compiler doesn't work well on a 486, it's badly broken. Since then, there have been two different approaches by Intel which annoy the compiler gods: the Pentium and Pentium IV which place a premium on low level instruction scheduling, and everything else, starting with the Pentium Pro and including the Core Duo, all non-deterministic data-flow architectures at heart.
The main differences in a good Pentium Pro compiler were a few hazard-aware instruction order tweaks, mostly focused on the complex/simple/simple instruction decode architecture. Hand tweaking for the Pentium Pro did not offer as much as with other architectures. It was hard to gain complete control for cycle precise scheduling, and the OOO logic did a good job of mitigating dependency chains on the fly: you neither had a large problem to solve, nor much control in solving it.
There's a rumour the trace cache is making a reappearance in Sandy Bridge, so perhaps the pendulum is swinging back to the Pentium/Pentium IV side of the fence.
A long time ago I read some long papers on TTA, around the time Intel went the wrong direction with Itanium (defining bundles as a unit of independent instructions, rather than bundles as units of dependent instructions).
What makes TTA interesting is having many buses, with as many buses utilized on each clock cycle as possible. This guy has not invented an instruction set. He has invented a microcode engine. In doing so, he's muddied the notion of processor state, so there's no abstraction for handling interrupts. The great thing on an FPGA is that you can program around the need for interrupts, if you can devote a small core to each concurrent task.
Real microcode instructions tend to have very long bit vectors, so that multiple buses can be coordinated on the same clock cycles. If you aren't trying to throw maximal resources at a single, dominant task, you can instead have many concurrent execution engines, each with a single function unit bus. This works for some applications.
My feeling about Itanium is that it should have allowed instruction clusters such as complex multiply in a single bundle.
r = ac - bd
i = ad + bc
This requires four inputs from the register file, two outputs to the register file, four multiplications, and two additions. You can find many examples in TAOCP V4F1 of small instruction clusters of this nature. A single eight byte bundle will be hard pressed to encode six arbitrary registers from a 256 register set, but I would argue that you don't need to. Compilers are extremely clever at register colouring, so a clever subset of full generality would prove more than adequate. Hint: invent the compiler and prove this, before committing the design to silicon.
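Spelled out in Python, the complex multiply cluster looks like this. The function name and the way the intermediate products are named are just for illustration; the point is the operation count: four inputs, four multiplications, one subtraction and one addition, and two outputs.

```python
def complex_multiply(a, b, c, d):
    """Multiply (a + b*i) by (c + d*i) and return (r, i)."""
    # Four multiplications on the four inputs...
    ac = a * c
    bd = b * d
    ad = a * d
    bc = b * c
    # ...then one subtraction and one addition produce the two outputs.
    r = ac - bd
    i = ad + bc
    return r, i


r, i = complex_multiply(1, 2, 3, 4)  # (1 + 2i) * (3 + 4i)
```

The four products never need to be written back to a shared register file, which is the saving the bundle argument relies on.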
From a TTA perspective, such a bundle achieves six operations at the expense of just four reads and two writes to the shared register file, with some intermediate results briefly shunted on local sidings. Managing the local sidings introduces some non-determinism from the perspective of the compiler, but nowhere near the scope of OOO shunting overhead in the Pentium Pro.
I think the Itanium design fell victim to ATM logic: determinism at the expense of higher aggregate throughput in the common case. That bet rarely pays off. They tricked themselves into believing they could bet against the grain by shuffling the downside of this fictio
Would anyone like any toast?
I'm confused; my computer doesn't have a key on it, only a wheel and a button!
I tried taking a key out of my pocket and pushing it against the wheel, but that didn't seem to do anything but scratch it! :-/
BTW, this kind of architecture makes it easy to add multiple execution units. With parallel execution and careful use of shared and private FUs and memories you can build a pretty damn powerful special purpose processor without a lot of hardware complexity.
All the execution units (aka co-processors in modern parlance) are still attached to a single bus, so theoretical max throughput is still one instruction per cycle. So this only makes sense if the CPs perform complex operations - like memory management, floating point, mul/div - or something of similar complexity. For the typical simple integer instruction that tends to dominate code it's no better than a microcoded processor - since now each normal instruction requires several cycles on the bus.
Compare this to a pipeline, where each step is what you'd consider a FU, but each one only needs to interface to the previous one, not counting the occasional clever bypass and special linkage. In a pipeline each FU can be fed an input in one cycle and provide an output in the next, something "FU"s (CPs actually) on a shared bus can't.
The other drawback is that anything that goes on the bus needs to implement a generic bus interface and its own internal sequencing logic. This doesn't come free. This is really just a CISC in silicon with exposed microcode, and it's pretty clear the author is thinking CISC all the way as can be witnessed by the stack operations. The typical RISC approach is to have a link register where the return address is placed on subroutine calls, so leaf functions don't have to push/pop it from the stack. Non-leaf functions start by saving the link register to the stack frame in the function prologue once space has been allocated for it.
Press the key to continue.
Ok, Steve Jobs has really gone too far with the minimalist controls this time.
don't you mean "Press 'A' key to continue"? ;)
Sometimes I wonder if I think too much.
- RISC architecture is gonna change everything.
- Yeah. RISC is good.
QED
http://www.accountkiller.com/removal-requested
msg db 'Hello, world!$'
start:
simon_says mov ah, 09h
simon_says lea dx, msg
simon_says int 21h
simon_says mov ax,4C00h
simon_says int 21h
end start
(source: wikipedia)
It seems to me that a transfer oriented architecture is conceptually very easy to parallelize.
Bruce Perens.
Actually the bus interface is a tristate buffer and a decoder. Not free, but cheap, especially on modern fabric. As for CISC thinking, the RISC text is emphasizing the one instruction, not that the actual "derived" ISA is RISC-like. The advantage to this is you can plug in different FUs (or CPs if you'd rather) without mucking with the processor per se.
That would be great for running Windows; all you need is one instruction:
HLT.
My programs have just one instruction -- ESM (Execute State Machine). The instruction is exactly the length of my program.
All this does is move the complexity of the instruction decoding from the processor to the address decoders.
Damn guys, mod this up.
I've been mucking about with the idea of building a cpu from logic gates for a while now, and I might have to use a "single instruction" architecture. The registers and inputs and outputs of small function units would all be addressable memory. Technically you could argue that some addresses have instruction meanings as a result, but what the hell. A program would be along the lines of:
Move from constant #000 (could be memory loc #000) to increment input 0 (say memory loc #002)
Move from memory loc #100 (a real memory location) to add unit input 0 (say memory loc #006)
Move from memory loc #101 to add unit input 1 (say memory loc #007)
Move from add unit output 0 (say memory loc #008) to memory loc #101
Move from increment output 0 (say memory loc #003) to increment input 0 (loc #002)
and then a conditional jump to the top on the increment output reaching a value, which I figure would have to be handled by a subtract and a "conditional unit", with the conditional unit returning one input or another depending on yet another input (fed from the subtract), which could then be moved into the PC. Yes, this technically means it does a non-conditional jump to one of two addresses depending on a condition, not a "conditional jump", but that's close enough.
It would be easy to build due to being so incredibly modular. You could even make it execute two "instructions" per clock or more to make some more traditional instructions take only a single clock cycle, instead of a cycle per input and output.
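The move program above, including the "conditional unit" that feeds the PC, can be simulated in a few lines of Python. This is only a sketch under assumptions: the post suggests locations #000, #002, #003, #006 to #008, #100 and #101, while the subtract unit, the select unit, the PC address, and the cells holding the loop limit and jump targets are invented here. The loop also restarts at the second move rather than the first, since the first one only initialises the counter.

```python
# All addresses are hypothetical; the post only suggests a few of them.
ZERO = 0x000
INC_IN, INC_OUT = 0x002, 0x003
ADD_IN0, ADD_IN1, ADD_OUT = 0x006, 0x007, 0x008
SUB_IN0, SUB_IN1, SUB_OUT = 0x00A, 0x00B, 0x00C
SEL_COND, SEL_IF_ZERO, SEL_IF_NONZERO, SEL_OUT = 0x010, 0x011, 0x012, 0x013
PC = 0x01F
STEP, TOTAL, LIMIT, TOP_ADDR, EXIT_ADDR = 0x100, 0x101, 0x102, 0x103, 0x104


def read(mem, addr):
    """Unit outputs are computed from unit inputs; everything else is RAM."""
    if addr == INC_OUT:
        return mem.get(INC_IN, 0) + 1
    if addr == ADD_OUT:
        return mem.get(ADD_IN0, 0) + mem.get(ADD_IN1, 0)
    if addr == SUB_OUT:
        return mem.get(SUB_IN0, 0) - mem.get(SUB_IN1, 0)
    if addr == SEL_OUT:
        # The "conditional unit": pick one of two inputs based on a third.
        key = SEL_IF_ZERO if mem.get(SEL_COND, 0) == 0 else SEL_IF_NONZERO
        return mem.get(key, 0)
    return mem.get(addr, 0)


def run(program, mem, max_steps=10_000):
    pc = 0
    for _ in range(max_steps):
        if not 0 <= pc < len(program):
            return mem  # running off the end halts
        src, dst = program[pc]
        value = read(mem, src)
        if dst == PC:
            pc = value  # a move into the PC is an unconditional jump
        else:
            mem[dst] = value
            pc += 1
    raise RuntimeError("step limit reached")


PROGRAM = [
    (ZERO, INC_IN),              # 0: counter = 0
    (STEP, ADD_IN0),             # 1: loop top
    (TOTAL, ADD_IN1),            # 2
    (ADD_OUT, TOTAL),            # 3: TOTAL += STEP
    (INC_OUT, INC_IN),           # 4: counter += 1
    (INC_IN, SUB_IN0),           # 5
    (LIMIT, SUB_IN1),            # 6
    (SUB_OUT, SEL_COND),         # 7: zero once counter == LIMIT
    (EXIT_ADDR, SEL_IF_ZERO),    # 8
    (TOP_ADDR, SEL_IF_NONZERO),  # 9
    (SEL_OUT, PC),               # 10: jump to the selected address
]

final = run(PROGRAM, {ZERO: 0, STEP: 6, TOTAL: 0, LIMIT: 7,
                      TOP_ADDR: 1, EXIT_ADDR: len(PROGRAM)})
```

With these values the loop body runs seven times and adds 6 each time, so the program is effectively a multiplication built from moves.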
I liked the idea and tried doing a design of my own. The thing I didn't like was that you now split up an operation into multiple instructions which couldn't operate concurrently, and I couldn't see how that could be sped up given instruction bus speed limits.
What I figured was to make the functional units more complex, so instead of having two inputs (left and right operand, implicit function), they'd also take an op code. This meant that I could reduce the number of addresses enough that a single move instruction could be packed into one byte. I don't recall for sure, but I think I used two bits to indicate the bus the move operated on, so you could get three moves happening at once (I think the last 2-bit pattern was reserved for special operations, but I don't recall what they were).
Branches were straightforward, in that the instruction read unit was just another functional unit, with left, right, and op input, you could just transfer the output from a logic/comparator unit to the op input of the instruction read unit to jump to the new (relative, I think) address or not.
Constants were defined by a special instruction unit operation which would accumulate 1, 2, or 4 subsequent bytes into the output register, ready to be moved elsewhere (as well as regular load/store from memory).
There was also a dedicated register file, where the op code was the register to read/write. Just in case the functional unit input/output registers weren't adequate.
I liked this idea because there'd be no speed penalty - in fact, a typical "regular" instruction would only be 3 bytes, so with the same input bottleneck it could even be faster.
It didn't get beyond a high level block diagram and instruction/unit descriptions. I'm sure I have a copy of it somewhere, but it got lost in a move (ironically).
Marklar: "You see, young marklar. Those marklars don't care about marklar marklar. They just want to take your marklar and marklar their own marklar. The only marklar for this is to marklar."
It must have been something you assimilated. . . .
BTW, this kind of architecture makes it easy to add multiple execution units. With parallel execution and careful use of shared and private FUs and memories you can build a pretty damn powerful special purpose processor without a lot of hardware complexity.
All the execution units (aka co-processors in modern parlance) are still attached to a single bus, so theoretical max throughput is still one instruction per cycle. So this only makes sense if the CPs perform complex operations - like memory management, floating point, mul/div - or something of similar complexity. For the typical simple integer instruction that tends to dominate code it's no better than a microcoded processor - since now each normal instruction requires several cycles on the bus.
I see how you come to that conclusion. But, it is false. No reason why the inputs and outputs can't be on separate buses and no reason why you can't have separate machines communicating over yet another bus or set of buses to build a pipeline.
One version I found in my notes uses a one instruction machine to implement the microcode for another machine. Like I said, my version was inspired by microcode so it *should* look like microcode.
Anyway, you are wrong, you're just assuming a limitation that isn't there.
Stonewolf.
I see how you come to that conclusion. But, it is false. No reason why the inputs and outputs can't be on separate buses and no reason why you can't have separate machines communicating over yet another bus or set of buses to build a pipeline.
Ah yes, that would be a pipelined design. I think that would work more efficiently than the original DDJ story's.
One version I found in my notes uses a one instruction machine to implement the microcode for another machine.
I think too much significance is placed on the "single instruction" aspect; the DDJ design is really "single instruction format". It clearly has multiple operations, and the destination address is more or less an opcode field. So it's more accurate to describe it as a single-instruction format machine. Each operation has its own dedicated destination register which can be used as a source for operations.
Of course it has multiple operations. But in fact it has a single instruction. The CPU's control unit treats every instruction the same. Now if you want to nit pick, yes the immediate load format makes it really a 2 or 3 instruction machine. But your argument is confusing operation with instruction.
Think of an analogy of a C program. If I write:
void foo(int cmd) {
    switch (cmd) {
    case 0:
        // do zero stuff
        break;
    . . .
Do I have one subroutine or many? I have one subroutine. The fact that it carries out multiple operations is not relevant to the count of the subroutines even though I could certainly write "n" subroutines foo0, foo1, foo2, etc.
http://www.youtube.com/watch?v=YYdpRV29_Y8 - Actual 2nd gen hardware
http://www.youtube.com/watch?v=KnQGcoe6oGg - Software simulator
Truly, what were you thinking, that people were going to understand the architecture and comment in a coherent manner? :)
I was wondering if this could be implemented with network switches and TCP packets.
Best regards.
The problem with social networking is that society, as an aggregate, sucks :-)
Bruce Perens.
Funny how things crop up when books have been written about the topic. There is a computer architecture book "Computer Architecture: A Minimalist Perspective" that examines computer architecture using the one instruction.
For those curious or interested, the authors' website is http://www.caamp.info. As Mr. Spock says, "Fascinating..." or perhaps "Pure energy" instead. It does take the intellectual exercise to a more in-depth look. It seems OISC and RISC are the "minimalist philosophy" of computer architecture, which also pervades other areas such as music, art, and engineering: the "less is more" outlook. Whether it's good or bad, order or chaos, franks or beans seems to remain a controversial topic for much heated debate.
Cheers, be well, and may your code compile...
Moses C. Kery