RISC vs. CISC in the post-RISC era

Posted by Hemos on Thursday October 21, 1999 @12:47AM from the chip-differential dept.

S. Casey writes "Ars Technica has a very cool article up that takes on the typical RISC versus CISC debate. The author argues that today's microprocessors aren't RISC or CISC, really, and covers the historical/technical reasons why these two distinctions aren't particularly useful anymore. It's pretty convincing (to me)." Essentially, the author argues that it is difficult, if not impossible, to have the normal debate because both chipsets have evolved features that used to be found in the other chipset.

3 of 119 comments

Comments by bhurt · 1999-10-20 22:54 · Score: 4

First of all, FP doesn't add all that many instructions to an architecture. The Alpha has about 6 FP instructions- load fp, store fp, add, subtract, multiply, divide. The PPC (ignoring SIMD) takes this up to about a dozen or so- two of which have as their sole purpose in life making sin() etc. fast to implement (three instructions, vs. a dozen or so on the Alpha).

Second, there are two _different_ optimization problems chip designers face, generally at different times. The less common one is the "clean slate design"- where the chip designers don't have to support anything, and can draw the boundaries wherever they make sense to be drawn. In essence this is what the RISC designers of the eighties did. In the other optimization problem, the chip designers are handed an architecture and a set of existing programs and told "make them go faster".

Superscalar out-of-order execution, branch prediction, more functional units, and speculative code execution are all optimizations you can apply to an existing architecture *without changing the (apparent) semantics of that architecture*- i.e. without breaking legacy code. New instructions allow "new" or recompiled code to gain a performance boost without dropping support for old code (SIMD and DSP-like instructions just happen to be all the rage these days). So of course they're applied to both legacy RISC and legacy CISC applications!

Of course these "patches" are not as effective as fundamentally rearchitecting the CPU. Of course they increase the complexity of the CPU in much greater proportion than they increase performance. This doesn't imply some "ideological impurity", however- this is the fundamental problem of supporting legacy code. This article's thesis boils down to "there are only legacy CPUs out there!". Which, for the moment, is true.

But let's consider for a moment what a rearchitected CPU for today would look like. What we'd like to do is to continue the trend RISC started- of shoving the complexity off the CPU and onto the compiler. It would be sort of accurate to claim that RISC's central idea was to shove the complexity of the translation to microcode onto the compiler.

Today's CPU complexity comes primarily from the patches applied to make the legacy code run faster- especially superscalar execution, branch prediction, and speculative execution- all of which require the CPU to deduce information out of sets of instructions. It'd be nice to have the compiler _tell_ the CPU the data ahead of time, so the CPU wouldn't have to spend precious clock time and transistor budget deducing. This, of course, implies a method for explicitly communicating this information in the instruction stream (the only channel of information between the compiler and the CPU)- older instruction sets (of all stripes) forced the CPU to deduce this information because there was no channel in the instruction stream for communicating it.

If this is beginning to sound like the Itanium, you're right. Whether this is the right way to go, only time will tell (and, on advice of time's lawyers, time has no statement to make at this point).

The Coming XISC Evo/Revolution by Effugas · 1999-10-20 23:36 · Score: 4

Before I say anything, I want to commend Hannibal on an absolutely excellent article that clarified issues I thought I understood and illuminated much of the technological history behind the technology we each use every day.

I am completely impressed.

That being said, I'd like to take a moment and theorize on the direction microprocessor design is likely to go. This is my theory; you're welcome to disagree, and in fact I eagerly await commentary from those far deeper in the industry than I. Insert Slashdot Self-Correcting Nature here.

Of all the chasms in the computer world, there are few as vast as the speed differential between general purpose processors programmed to execute a given task and hard-coded ASICs (Application Specific Integrated Circuits) designed to meet the functional needs of a given process. (OK, granted, Internet -> Local Network -> Hard Drive -> System Memory -> Processor Cache -> Processor Registers is pretty vast too, but cut me some slack here.)

Telephony is a joke without ASICs--I haven't found a voice over IP solution that operates in software well enough to even be used as a room to room intercom over a 100BaseT LAN--but it's actually reasonably lag-free with hardware encoding.

Similarly, huge banks of boxen rendering frames for movies became significantly less impressive to me when I realized how many banks of Pentium processors it would take to match, say, a single Voodoo 2. While 3D rendering has recently gotten shots in the arm on the general purpose x86 architecture via both MMX and KNI, the order of magnitude difference in speed makes CPU rendering of realtime 3D graphics almost useless.

(Then again, Sumea is probably the single coolest thing I've done with Java, short of Mindterm.)

As I observed in the Amiga newsgroup, shove a couple of custom ASICs in a box and you can run a highly competitive multitasking OS in 512K of RAM, with unmatched graphical support to boot.

But ASICs have their limitations--while they're fast at what they do, they're extremely inflexible. You can't merely program in a new transparency algorithm, nor implement Depth of Field in an architecture that totally lacks it. The inflexibility of ASICs dooms their long term viability.

CPUs are flexible but slow; ASICs are inflexible but fast. It's a dichotomy the industry is on the verge of smashing.

I dub the coming processor design specification (which, as the article correctly noted, is all RISC/CISC really are) XISC, for eXtensible Instruction Set Computing. XISC essentially specifies that the underlying computational structures--be they microcode or raw gate arrays--ought to be dynamically reconfigurable to meet the needs of the process.

Just as the lack of a quick bilinear filter function (SIMD stuff) on older Intel chips doomed them as far as efficient 3D in relation to customized ASICs, such a command inserted directly into the internal microcode of a processor has a theoretical chance of executing at extremely high speeds for a non-dedicated processor.

Transmeta, also known as the only reason many people willingly acknowledge the US Patent Office, appears to be spearheading the XISC drive. Their patents refer to technologies that automatically cache microcode translations, that provide backwards-flow in case of a broken emulation, and so on. They've often been "accused" of developing a chip that can emulate any chip--in the XISC context, a chip optimized to execute the instruction set most required by any given process.

If you accept that performance drops of orders of magnitude are suffered when a processor lacks the appropriate design for a given set of requests, it's quite obvious that intelligent designers seeking a quantum leap in system performance would try to allow processors to acquire any necessary designs to achieve much higher speeds.

Of course, most of my chip designer friends would be happy to remind me that much of the speed of ASICs comes from their hard coded nature--the literal gates correspond to whatever output is desired, no translation is necessary.

Of course, here's where FPGAs come in. Field Programmable Gate Arrays are chips whose internal gate structure can be rewritten on command, sometimes many thousands of times per second. They can't be clocked as fast as true ASICs, nor are the yields as high, but one quickly morphing chip can do the job of three or four in a digital camera. With at least one company (someone give me a name!) developing a language for programmatically defining instruction sets for an FPGA processor, the technology for XISC is obviously in development.

Ah, but all is not well. In fact, while on the topic of 3D chips, the Rendition Verite chipset had a programmable RISC core, and the chip ended up failing because it could not scale in speed like 3DFX's Voodoo could. Developers could write new 3D instructions, but didn't (in general) because it was just too hard. (Yes, Carmack did.)

That's why there's such a powerful force towards automation in this XISC evo/revolution, such as the FPGA language and Transmeta's automated Microcode translations that stay in memory so as to speed up future similar instruction requests. In an ideal world, a developer merely compiles a chunk of code that profiles as heavy usage directly into CPU microcode, or at least specifies in some way that a given routine ought to be run through the "special ops" part of the system.
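The idea of translations that "stay in memory" can be illustrated with a small memoizing cache. This is only a sketch of the general technique, not Transmeta's actual design, which was not public in any detail at this point. The names `TranslationCache` and `toy_translate` and the guest-to-native table are all invented for illustration.

```python
class TranslationCache:
    """Remembers the translation of each guest code block after first use."""

    def __init__(self, translate):
        self._translate = translate  # slow path: guest block -> native ops
        self._cache = {}
        self.hits = 0
        self.misses = 0

    def run(self, block):
        key = tuple(block)  # lists are not hashable, tuples are
        if key in self._cache:
            self.hits += 1
        else:
            self.misses += 1
            self._cache[key] = self._translate(block)
        return self._cache[key]


def toy_translate(block):
    # Hypothetical mapping from "guest" instructions to simpler native ops.
    table = {"push": ["sub sp, 4", "store"], "inc": ["add 1"]}
    native = []
    for op in block:
        native.extend(table.get(op, [op]))  # unknown ops pass through as-is
    return native


cache = TranslationCache(toy_translate)
cache.run(["push", "inc"])  # first request: translated, then stored
cache.run(["push", "inc"])  # second request: served from the cache
print(cache.hits, cache.misses)  # prints "1 1"
```

The expensive translation step runs once per distinct block, so code that profiles as heavy usage pays that cost only on its first execution.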

Whether the world will become ideal is an open question. That we will have instruction sets that morph is almost obvious; it's just a matter of when the gap between ASICs and CPUs will finally be bridged.

Yours Truly,

Dan Kaminsky
DoxPara Research
http://www.doxpara.com

The author of this article is clueless. by anonymous+loser · 1999-10-20 21:00 · Score: 4

His main evidence is a quote in which Ditzel says, "Superscalar and out-of-order execution are the biggest problem areas that have impeded performance [leaps]." Obviously the author has absolutely no knowledge of how processors work internally, or he wouldn't say that this is due to the complexity of the ISA (Instruction Set Architecture).

The complexity with superscalars is not in the ISA, but in the scheduling. At the most basic level, though, RISC instructions are used because it is (effectively) impossible to schedule CISC instructions for out-of-order execution.

The whole idea with RISC is to make instructions so basic that they can (almost) all be completed in a single processor cycle. In the article, he tries to refute this with a quote from Patterson, but the quote actually refutes the author's point, and the author is too blind to realize it. Twice in the quote Patterson refers to reducing the cycle time for each instruction, but the author says that's not Patterson's point.

Today's processors take the idea a step further, by trying to execute MORE than one instruction per cycle by providing multiple processing units (the things that do the actual addition, subtraction, or whatever) which can execute instructions in parallel. However, instructions still need to be scheduled so that they can execute in parallel while preserving dependencies. The hardware that accomplishes this scheduling is complex.
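To make "scheduled in parallel while preserving dependencies" concrete, here is a rough software model of the decision the scheduling hardware has to make. It assumes every instruction takes exactly one cycle and treats all three register hazards conservatively. The `Instr` type, the register names, and the `schedule` function are made up for this sketch and do not describe any real chip.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    name: str
    dest: str     # register written
    srcs: tuple   # registers read


def depends_on(later, earlier):
    """True if `later` must not issue in or before the cycle of `earlier`."""
    raw = earlier.dest in later.srcs   # read-after-write
    waw = earlier.dest == later.dest   # write-after-write
    war = later.dest in earlier.srcs   # write-after-read (conservative)
    return raw or waw or war


def schedule(program, units):
    """Give each instruction the earliest cycle with a free unit and no hazard."""
    cycle_of = []
    used = {}  # cycle -> number of processing units already taken
    for i, ins in enumerate(program):
        earliest = 0
        for j in range(i):
            if depends_on(ins, program[j]):
                earliest = max(earliest, cycle_of[j] + 1)
        cycle = earliest
        while used.get(cycle, 0) >= units:
            cycle += 1
        used[cycle] = used.get(cycle, 0) + 1
        cycle_of.append(cycle)
    return cycle_of


program = [
    Instr("add", "r1", ("r2", "r3")),
    Instr("mul", "r4", ("r1", "r5")),  # needs r1 from the add
    Instr("sub", "r6", ("r7", "r8")),  # independent of both
    Instr("add", "r9", ("r4", "r6")),  # needs the mul and the sub
]
print(schedule(program, units=2))  # prints [0, 1, 0, 2]
print(schedule(program, units=1))  # prints [0, 1, 2, 3]
```

With two units the independent `sub` issues alongside the first `add`, ahead of the `mul` that precedes it in program order. Real hardware has to make this comparison for every in-flight instruction on every cycle.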

IMHO VLIW is the way to go. With VLIW, you do the scheduling at compile time, and remove a lot of the complexity involved with hardware scheduling. Not only do you gain the possibility of higher parallelism through an increased number of processing units (you can use the silicon previously reserved for the scheduling hardware), but you can also gain a little more, since theoretically a compiler can spend more time looking for dependencies between instructions and come up with a better schedule.
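As a rough illustration of scheduling at compile time, the sketch below packs instructions into fixed-width bundles, padding unused slots with explicit no-ops so the hardware never has to check dependencies itself. It is deliberately simple: it keeps program order and only asks whether the next instruction conflicts with the bundle being filled, whereas a real VLIW compiler would also reorder code. The function name `pack_bundles`, the tuple layout, and the one-cycle latency are assumptions made for the example.

```python
def pack_bundles(program, width):
    """Pack (name, dest, srcs) tuples into bundles of `width` slots, in order."""
    bundles = [[]]
    for name, dest, srcs in program:
        current = bundles[-1]
        written = {d for _, d, _ in current}
        read = {s for _, _, ss in current for s in ss}
        # Conservative: any register overlap with the open bundle is a conflict.
        conflict = bool(written & set(srcs)) or dest in written or dest in read
        if conflict or len(current) == width:
            bundles.append([])  # close this bundle and start a new one
        bundles[-1].append((name, dest, srcs))
    # Emit instruction names only, filling empty slots with no-ops.
    return [[n for n, _, _ in b] + ["nop"] * (width - len(b)) for b in bundles]


program = [
    ("add", "r1", ("r2", "r3")),
    ("mul", "r4", ("r1", "r5")),  # reads r1, so it cannot share a bundle with add
    ("sub", "r6", ("r7", "r8")),  # independent, shares a bundle with mul
    ("add", "r9", ("r4", "r6")),  # reads r4 and r6, so it starts a new bundle
]
print(pack_bundles(program, width=2))
# prints [['add', 'nop'], ['mul', 'sub'], ['add', 'nop']]
```

The no-op slots are the price of moving the work out of the hardware: when the compiler cannot find independent instructions, the wasted issue slots end up visible in the code itself.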

Anyway, that's just my 2 cents.