Slashdot Mirror


Research Shows RISC vs. CISC Doesn't Matter

fsterman writes: The power advantages brought by the RISC instruction sets used in Power and ARM chips are often pitted against the x86's efficiencies of scale. It's difficult to assess how much the difference between instruction sets matters, because teasing out the theoretical efficiency of an ISA from the proficiency of a chip's design team, the technical expertise of its manufacturer, and support for architecture-specific optimizations in compilers is nearly impossible. However, new research examining the performance of a variety of ARM, MIPS, and x86 processors gives weight to Intel's conclusion: the benefits of a given ISA to the power envelope of a chip are minute.

36 of 161 comments

  1. isn't x86 RISC by now? by alen · · Score: 5, Informative

    I've read that the legacy x86 instructions were virtualized in the CPU a long time ago, and that modern Intel processors are effectively RISC cores that translate x86 inside the CPU.

    1. Re:isn't x86 RISC by now? by Z80a · · Score: 3, Interesting

      As far as I'm aware, since the Pentium Pro line the Intel CPUs have been RISCs with translation layers, and AMD has been on this boat since the original Athlon.

    2. Re:isn't x86 RISC by now? by Anonymous Coward · · Score: 2, Interesting

      x86 instructions are, in fact, decoded to micro-opcodes, so the distinction isn't as useful in this context.
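      A toy sketch of what "decoded to micro-opcodes" means, in Python. Everything here is an assumption for illustration: the tuple format, the micro-op names and the register names are invented, and real decoders are proprietary and far more involved. A memory-destination add such as add [rbx + rcx*4 + 8], rax is cracked into an address calculation, a load, an ALU add, and a store:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemOperand:
    """x86-style memory operand: base + index*scale + disp."""
    base: str
    index: str
    scale: int
    disp: int


def crack_add_mem_reg(dst: MemOperand, src: str) -> list:
    """Split a hypothetical 'add [mem], reg' into RISC-like micro-ops.

    Each micro-op is an invented tuple: (op, dest, *sources).
    """
    return [
        ("agu", "t0", dst.base, dst.index, dst.scale, dst.disp),  # t0 = effective address
        ("load", "t1", "t0"),                                      # t1 = mem[t0]
        ("add", "t1", "t1", src),                                  # t1 = t1 + src
        ("store", "t0", "t1"),                                     # mem[t0] = t1
    ]


def run(uops: list, regs: dict, mem: dict) -> None:
    """Execute micro-ops in order (no 64-bit wraparound, no flags)."""
    for op, dest, *srcs in uops:
        if op == "agu":
            base, index, scale, disp = srcs
            regs[dest] = regs[base] + regs[index] * scale + disp
        elif op == "load":
            regs[dest] = mem[regs[srcs[0]]]
        elif op == "add":
            regs[dest] = regs[srcs[0]] + regs[srcs[1]]
        elif op == "store":
            mem[regs[dest]] = regs[srcs[0]]  # for stores, dest names the address register
        else:
            raise ValueError(f"unknown micro-op {op!r}")


regs = {"rbx": 0x1000, "rcx": 2, "rax": 5}
mem = {0x1010: 10}
uops = crack_add_mem_reg(MemOperand("rbx", "rcx", 4, 8), "rax")
run(uops, regs, mem)
```

      On a load/store RISC, a compiler would emit roughly those same steps as separate instructions; here one x86 instruction expands into them inside the CPU.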

    3. Re:isn't x86 RISC by now? by cheesybagel · · Score: 2

      Actually, AMD did that way back in the K5 days. The K5 was a 29k RISC processor with an x86 frontend.

    4. Re:isn't x86 RISC by now? by bill_mcgonigle · · Score: 2

      I have to assume the wisc.edu folks know this and somebody gummed up the headlines along the way.

      --
      My God, it's Full of Source!
      OUTSIDE_IP=$(dig +short my.ip @outsideip.net)
    5. Re:isn't x86 RISC by now? by drinkypoo · · Score: 4, Interesting

      That is correct. Every time this comes up I like to spark a debate over what I perceive as the uselessness of referring to an "instruction set architecture" because that is a bullshit, meaningless term and has been ever since we started making CPUs whose external instructions are decomposed into RISC micro-ops. You could switch out the decoder, leave the internal core completely unchanged, and have a CPU which speaks a different instruction set. It is not an instruction set architecture. That's why the architectures themselves have names. For example, K5 and up can all run x86 code, but none of them actually have logic for each x86 instruction. All of them are internally RISCy. Are they x86-compatible? Obviously. Are they internally x86? No, nothing is any more.

      --
      "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
    6. Re:isn't x86 RISC by now? by Anonymous Coward · · Score: 4, Insightful

      Yes. As noted by the study (which, by the way, isn't very good): "When every transistor counts, then every instruction, clock cycle, memory access, and cache level must be carefully budgeted, and the simple design tenets of RISC become advantageous once again."
      Essentially meaning that "If you want as few transistors as possible, it doesn't help to have the CISC-to-RISC translation layer in x86."

      They also claim things like "The report notes that in certain, extremely specific cases where die sizes must be 1-2 mm² or power consumption is specced to sub-milliwatt levels, RISC microcontrollers can still have an advantage over their CISC brethren," which clearly indicates that their idea of "embedded" systems is limited to smartphones.
      Cases where you have a battery that can't be recharged on a daily basis are hardly extremely specific. Not that any CPU they tested is suitable for those applications anyway. They have essentially limited themselves to applications where "not as bad as a P4" is acceptable.

    7. Re:isn't x86 RISC by now? by RabidReindeer · · Score: 4, Interesting

      x86 instructions are, in fact, decoded to micro-opcodes, so the distinction isn't as useful in this context.

      They're not the only ones. The IBM mainframes have long been VMs implemented on top of various microcode platforms. In fact, one of the original uses of the 8-inch floppy disk was to hold the VM that would be loaded up during the Initial Microprogram Load (IMPL), before the IPL (boot) of the actual OS. So in a sense, the Project Hercules mainframe emulator is just repeating history.

      Nor were they unusual. In school I worked with a minicomputer which not only was a VM on top of microcode, but you could extend the VM by programming to the microcode yourself.

      The main differences between RISC and CISC, as I recall, were lots of registers and the simplicity of the instruction set. Both the Intel and zSeries CISC instruction sets have lots of registers, though. So the main difference between RISC and CISC would be that you could, in theory, optimize "between" the CISC instructions if you coded RISC instead.

      Presumably somebody tried this, but didn't get benefits worth shouting about.

      Incidentally, the CISC instruction set of the more recent IBM z machines includes entire C stdlib functions such as strcpy in a single machine-language instruction.

    8. Re:isn't x86 RISC by now? by cheesybagel · · Score: 5, Informative

      After AMD, along with other companies, lost the license to manufacture Intel i486 processors, they were forced to design their own chip from the ground up. So they basically took one of the 29k RISC processors and put an x86 frontend on it. Cyrix did more or less the same thing at the time, also coming up with their own design. Since the K5 had good performance per clock but could not clock very high and was expensive, AMD was stuck, so to get their next processor they bought a company called NexGen, which had designed the Nx586, an Intel-compatible processor. AMD then worked on the successor of the Nx586 as a single chip, which became the K6. The K7 Athlon was yet another design, made by a team headed by Dirk Meyer, who used to be a chip designer at Digital Equipment Corporation (DEC). He was one of the designers of the Alpha series of RISC CPUs, and the Athlon resembles an Alpha chip internally a lot because of that.

    9. Re: isn't x86 RISC by now? by the_humeister · · Score: 2

      Actually, NexGen was the first to do x86 -> RISC. Then Intel with the Pentium Pro. Then AMD with the K5. As far as I recall, Cyrix never did x86 -> RISC back then until they were acquired by VIA (i.e., Cyrix M-series chips executed x86 directly, but VIA EPIA and later chips translate).

    10. Re:isn't x86 RISC by now? by funwithBSD · · Score: 2

      When the DEC Alpha was killed, many of the engineers were picked up by AMD.

      --
      Never answer an anonymous letter. - Yogi Berra
    11. Re:isn't x86 RISC by now? by enriquevagu · · Score: 4, Insightful

      This is why we use the terms "Instruction Set Architecture" to define the interface to the (assembler) programmer, and "microarchitecture" to refer to the actual internal implementation. ISA is not bullshit, unless you confuse it with the internal microarchitecture.

    12. Re:isn't x86 RISC by now? by bws111 · · Score: 2

      The very first paragraph of IBMs z/Architecture Principles of Operation:

      The architecture of a system defines its attributes as seen by the programmer, that is, the conceptual structure and functional behavior of the machine, as distinct from the organization of the data flow, the logical design, the physical design, and the performance of any particular implementation. Several dissimilar machine implementations may conform to a single architecture. When the execution of a set of programs on different machine implementations produces the results that are defined by a single architecture, the implementations are considered to be compatible for those programs.

    13. Re:isn't x86 RISC by now? by Guy+Harris · · Score: 3, Informative

      They're not the only ones. The IBM mainframes have long been VMs implemented on top of various microcode platforms.

      But the microcode implemented part or all of an interpreter for the machine code; the instructions weren't translated into directly-executed microcode. (And the System/360 Model 75 did it all in hardware, with no microcode).

      And the "instruction set" for the microcode was often rather close to the hardware, with extremely little in the way of "instruction decoding" of microinstructions, although I think some lower-end machines might have had microinstructions that didn't look too different from a regular instruction set. (Some might have been IBM 801s.)

      So that's not exactly the same thing as what the Pentium Pro and successors, the Nx586, and the AMD K5 and successors, do.

      Current mainframe processors, however, as far as I know 1) execute most instructions directly in hardware, 2) do so by translating them into micro-ops the same way current x86 processors do, and 3) trap some instructions to "millicode", which is z/Architecture machine code with some processor-dependent special instructions and access to processor-dependent special registers (and, yes, I can hear the word PALcode being shouted in the background...). See, for example, "A high-frequency custom CMOS S/390 microprocessor" (paywalled, but the abstract is free at that link, and mentions millicode) and "IBM zEnterprise 196 microprocessor and cache subsystem" (non-paywalled copy; mentions microoperations). I'm not sure those processors have any of what would normally be thought of as "microcode".

      The midrange System/38 and older ("CISC") AS/400 machines also had an S/360-ish instruction set implemented in microcode. The compilers, however, generated code for an extremely CISCy processor - but that code wasn't interpreted, it was translated into the native instruction set by low-level OS code and executed.

      For legal reasons, the people who wrote the low-level OS code (compiled into the native instruction set) worked for a hardware manager and wrote what was called "vertical microcode" (the microcode that implemented the native instruction set was called "horizontal microcode"). That way, IBM wouldn't have to provide that code to competitors, the way they had to make the IBM mainframe OSes available to plug-compatible manufacturers, as it's not software, it's internal microcode. See "Inside the AS/400" by one of the architects of S/38 and AS/400.

      Current ("RISC") AS/400s^WeServer iSeries^W^WSystem i^WIBM Power Systems running IBM i are similar, but the internal machine language is PowerPC^WPower ISA (with some extensions such as tag bits and decimal-arithmetic assists, present, I think, in recent POWER microprocessors but not documented) rather than the old "IMPI" 360-ish instruction set.

      The main differences between RISC and CISC, as I recall, were lots of registers and the simplicity of the instruction set. Both the Intel and zSeries CISC instruction sets have lots of registers, though.

      Depends on which version of the instruction set and your definition of "lots".

      32-bit x86 had 8 registers (many x86 processors used register renaming, but they still had only 8 programmer-visible registers, and not all were as general as one might like), and they only went to 16 registers in x86-64. System/360 had 16 general-purpose registers (much more regular than x86, but that's not setting the bar all that high :-)), and that continues to z/Architecture, althoug

    14. Re: isn't x86 RISC by now? by LostMyBeaver · · Score: 2

      You make a lot of good points. But any time you create a processor with cache miss prediction as well as branch prediction as well as execution parallelization, the instruction set has almost no effect at all.

      You're suggesting that the instruction set interpretation is a separate unit from the out-of-order execution engine. In reality, the first stage of any highly optimized modern CPU able to minimize pipeline misses must actually "recompile the code" (an oversimplification) in hardware before it reaches the execution units, and any intelligent designer would simplify the operations into much wider RISC-style operations in the pipeline(s).

      A purely in-order processor would certainly suffer from using a variable length instruction set.

      What many people fail to realize is that ARM is very much a CISC ISA as well, but with a wider fixed-size word. So instead of incrementing instruction lengths in single-byte units, ARM has two-byte-wide units in Thumb mode and four-byte-wide increments in ARM mode. Either way, it's nothing like MIPS (which is a dinosaur), which is a pure RISC architecture.
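      A sketch of why variable-length encodings make wide, parallel decoding harder, assuming an invented length rule (the low two bits of an instruction's first byte give its length, 1 to 4 bytes) rather than any real x86 or Thumb-2 encoding. With fixed 4-byte instructions every boundary is known up front; with variable lengths each boundary depends on decoding the previous instruction:

```python
def fixed_boundaries(code: bytes, width: int = 4) -> list:
    """Start offsets for fixed-width instructions.

    No byte has to be inspected, so a decoder can pick up several
    instructions in parallel.
    """
    return list(range(0, len(code) - width + 1, width))


def toy_length(first_byte: int) -> int:
    """Hypothetical rule: the low two bits encode a length of 1-4 bytes."""
    return (first_byte & 0b11) + 1


def variable_boundaries(code: bytes) -> list:
    """Start offsets for variable-length instructions.

    Each start depends on the length of the previous instruction,
    so this walk is inherently sequential.
    """
    starts, pc = [], 0
    while pc < len(code):
        starts.append(pc)
        pc += toy_length(code[pc])
    return starts


fixed = fixed_boundaries(bytes(12))
variable = variable_boundaries(bytes([0b01, 0xAA, 0b11, 0xBB, 0xCC, 0xDD, 0b00]))
```

      Real x86 cores spend extra hardware (length predecoding, for example) to break this serial dependency, which is part of the decode cost being argued about here.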

      With the exception of a rare set of VLIW DSP cores (TI), it doesn't really matter. The major performance booster DSPs have is that DSP code generally consists of very small programs with extremely long pre-optimized pipelines, not intended to be run in a task-switching environment. Those processors don't need translation because they're completely pre-optimized. They also don't tend to have memory fragmentation and almost never have MMUs.

      General-purpose CPUs run code all ass-backwards because the code is unpredictable. For performance per watt, the ISA will almost always benefit most from having as many versions of instructions as possible.

      The trick will be how to detect when specific units aren't needed and power them down when possible. Compilers and ISA will have no impact on this... Unless instructions are added to power on and off units explicitly based on the code entering the pipeline.

    15. Re: isn't x86 RISC by now? by Darinbob · · Score: 2

      ARM is RISC through and through, though it complicates things somewhat with multiple simple instruction sets. The basic ARM ISA is 32-bit instructions only, very much RISC from every angle you look at it, every bit as pure as MIPS. The ARM Thumb ISA is 16-bit instructions only, and the machine translation from Thumb to ARM is very simple, just a fraction of the chip. Thumb2 gets slightly more complex, allowing both 16- and 32-bit instructions intermixed, but again it's not that complicated. It's just RISC-like all around. There are some exceptions with the PC-relative indexing perhaps, but all instructions are orthogonal, you can decode the machine code by hand, all instructions take the same amount of time to execute (not true for some multiplies/divides on some models, which was also true for classic RISC), and decoding is a tiny fraction of the chip being used.

      MIPS isn't a dinosaur, they still develop and advance it and it's still being sold new for use in new products.

  2. It's a question that WAS relevant by i+kan+reed · · Score: 3, Insightful

    Back when compilers weren't crazy optimized to their target instruction set, people coding things in assembler wanted CISC, and people using compilers wanted RISC.

    But nowadays almost no one still does the former, and compilers now make much better use of CISC chips.

    This is now a question for comp sci history, not engineers.

    1. Re:It's a question that WAS relevant by TWX · · Score: 5, Funny

      You mean, my Github project on Ruby on Rails with node.js plugins isn't optimized to use one versus the other?

      --
      Do not look into laser with remaining eye.
    2. Re:It's a question that WAS relevant by Nyall · · Score: 3, Interesting

      I think a large part of the confusion is that CISC often means accumulator architectures (x86, Z80, etc.), while RISC means general-purpose register architectures (PPC, SPARC, ARM, etc.). In between you have variable-width RISC like Thumb-2.

      As an occasional assembly programmer (PowerPC currently), I far prefer these RISC instructions. With x86 (12+ years ago) I would spend far more instructions juggling values into the appropriate registers, then doing the math, then juggling the results out so that more math could be done. With RISC, especially with 32 GPRs, that juggling is nearly eliminated, apart from the prologue/epilogue. I hear x86 kept taking on more instructions and that AMD64 made it a more GPR-like environment.
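      To put a number on the "juggling", here is a sketch with two invented toy machines (neither is real x86 or PowerPC): an accumulator machine whose only register is ACC, and a three-operand register machine. Both compute (a + b) * (c + d); the register version assumes the operands are already in registers, as they would be after the prologue:

```python
import operator


def run_accumulator(program: list, mem: dict) -> int:
    """Toy accumulator machine: every op reads or writes the single register ACC."""
    acc = 0
    for op, addr in program:
        if op == "load":
            acc = mem[addr]
        elif op == "store":
            mem[addr] = acc  # spill: the "juggling" described above
        elif op == "add":
            acc += mem[addr]
        elif op == "mul":
            acc *= mem[addr]
        else:
            raise ValueError(f"unknown op {op!r}")
    return acc


def run_three_operand(program: list, regs: dict) -> dict:
    """Toy register machine: op dest, src1, src2 over a register file."""
    alu = {"add": operator.add, "mul": operator.mul}
    for op, dest, src1, src2 in program:
        regs[dest] = alu[op](regs[src1], regs[src2])
    return regs


# (a + b) * (c + d) with a=2, b=3, c=4, d=5
acc_prog = [("load", "a"), ("add", "b"), ("store", "t"),
            ("load", "c"), ("add", "d"), ("mul", "t")]
reg_prog = [("add", "r1", "a", "b"), ("add", "r2", "c", "d"),
            ("mul", "r3", "r1", "r2")]

acc_result = run_accumulator(acc_prog, {"a": 2, "b": 3, "c": 4, "d": 5})
reg_result = run_three_operand(reg_prog, {"a": 2, "b": 3, "c": 4, "d": 5})["r3"]
```

      Six instructions, one of them a spill to memory, against three. Real x86 is not a pure accumulator machine, so treat the counts as a caricature of the point rather than a measurement.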

      -Samuel

      --
      http://en.wikipedia.org/wiki/Jury_nullification
    3. Re:It's a question that WAS relevant by mlts · · Score: 2

      Even though Itanium is all but dead, I did like the fact that you had 128 GP registers to play with. One could do all the loads in one pass, do the calculations, then toss the results back into RAM. The amd64 architecture is a step in the right direction, and I'd say that even though it was considered a stopgap measure at the time, it seems to have been well thought out.

    4. Re:It's a question that WAS relevant by i+kan+reed · · Score: 2

      Oh no. A technology exists.

      Let me rephrase that. I cannot comprehend your objections.

  3. It's a general purpose vs dedicated thing by Anonymous Coward · · Score: 2, Interesting

    The CPU ISA isn't the important aspect. Reduced power consumption mostly stems from not needing a high end CPU because the expensive tasks are handled by dedicated hardware. What counts as top of the line ARM hardware can barely touch the processing power of a desktop CPU, but it doesn't need to be faster because all the bulk processing is handled by graphics cores and DSPs. Intel has for a long time tried to stave off the barrage of special purpose hardware. The attempts to make use of ever more general purpose CPU power sometimes bordered on sad clown territory (Remember Intel's attempt to make raytracing games look like something worth pursuing? Guess why: Raytracing is notoriously difficult to implement on graphics hardware due to the almost random data accesses.)

  4. not now, but it certainly did in the past. by nimbius · · Score: 2

    As a greybeard, I remember when choosing Intel over Sun meant the project wasn't completed on time, and your electrical/mechanical engineering group lived in the breakroom while their jobs chugged along. Intel was a toy train compared to the power you'd get with RISC. However, I can somewhat confidently say the RISC vs. CISC battle is moot these days, because x86 has largely caught up to Power, SPARC, and others. A competent argument could be made, however, that if it weren't for AMD, most servers would probably still be running some flavour of RISC. The foolhardy nature of Sun and SGI can also be argued as a cause of their demise, but I'll not flame. Intel wouldn't have bothered to get off their duff without a poke in the ribs from AMD; they had partnerships with RISC manufacturers anyhow, and their own RISC-ish processor called Itanium. Outside of performance, though, there is another reason people stick with Power and others, just as they have in the past: lock-in.

    You see, applications like Oracle Business Objects and JD Edwards come with a quid pro quo of exacting standards to which most businesses must adhere: namely, IBM or Sun/Oracle hardware. You may only need accounting and payroll, but you'll have to clear a corner of the room for the circus to set up their hardware and make sure everything is "just so." Their hope is that their quiet mandate becomes your quiet mandate, and before you know it other systems that interact with JDE are now required to be Power-based because "that's what runs JDE." The only way out of this is to realize that any business that doesn't explicitly do payroll or metrics for profit doesn't need the kind of horsepower decreed by things like SAP.

    --
    Good people go to bed earlier.
  5. Re:so why is intel's 14nm haswell still at 3.5 wat by timeOday · · Score: 4, Insightful

    Here is your answer: the A20 is freakishly slow compared to anything Intel would put their name on.

    Granted, you can build a tablet to do specific tasks (like decoding video codecs) around a really slow processor and some special-purpose DSPs. But perhaps the companies in that business aren't making enough profit to interest Intel.

  6. efficiency matters by bzipitidoo · · Score: 2

    This study looks seriously flawed. They just throw up their hands at a direct comparison of architectures: they use extremely complicated systems and sort of do their best to beat down and control all the factors those systems introduce. One of the basic principles of a scientific study is that confounding variables are controlled. It's very hard to say how much the instruction set architecture matters when you can't tell what pipelining, out-of-order execution, branch prediction, speculative execution, caching, shadowing (of registers), and so on are doing to speed things up. An external factor that could influence the outcome is temperature. Maybe one computer was in a hotter corner of the test lab than the other, and had to spend extra power just overcoming the higher resistance that higher temperatures cause.

    It might have been better to approach this from an angle of simulation. Simulate a more idealized computer system, one without so many factors to control.

    --
    Intellectual Property is a monopolistic, selfish, and defective concept. It is "tyranny over the mind of man"
  7. The article is bad - mfg technology dominates by hkultala · · Score: 2

    They are seriously comparing a 90 nm process with Intel's much better 32 nm and 45 nm processes.

    They have just taken some random cores made on random (and incomparable) manufacturing technologies, thrown a couple of benchmarks at them, and tried to declare universal results based on these.

    A few facts about the benchmark setup and the cores:

    1) They use an ancient version of GCC. ARM suffers from this much more than x86.
    2) Bobcat is a relatively balanced core with no bad bottlenecks. Its mfg tech is cheap, not high performance, but relatively small/new.
    3) Cortex-A8 and A9 are really starved by bad cache design. The newer A7 and A12 would be similar in area and power consumption but much better in performance and performance/power. They are also manufactured on old, cheap mfg processes, which hurts them. Use modern manufacturing tech and the results are much better.
    4) Their Loongson is made on ANCIENT technology. With modern mfg tech it would be many times better on performance/power.
    5) The Cortex-A15, even though made on a 32 nm process, is on a cheap process, not much better than Intel's 45 nm process and much worse than Intel's 32 nm. It's also known to be a "power hog" design. Qualcomm's Krait has a similar performance level, but with much lower power.

  8. Re: so why is intel's 14nm haswell still at 3.5 wa by the_humeister · · Score: 2

    No relation to energy used. It's in the article: Haswell will get its work done faster and use about the same energy as the slower chips that take longer. What matters is architecture, not ISA (Atom is lower power than Haswell at the same process node).
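    Energy is average power multiplied by time, which is why a faster chip can draw more power and still use about the same energy. A minimal sketch with made-up numbers (not figures from the study or the article):

```python
def energy_joules(avg_power_watts: float, runtime_seconds: float) -> float:
    """Energy used by a task: average power multiplied by run time."""
    return avg_power_watts * runtime_seconds


# Hypothetical chips; the numbers are invented for illustration only.
fast_chip = energy_joules(avg_power_watts=10.0, runtime_seconds=2.0)  # finishes quickly
slow_chip = energy_joules(avg_power_watts=2.0, runtime_seconds=10.0)  # sips power for longer
```

    This ignores what each chip draws while idle after it finishes, which also matters in practice.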

  9. CISC - reduced memory access ... by perpenso · · Score: 3, Interesting

    x86 instructions are, in fact, decoded to micro-opcodes, so the distinction isn't as useful in this context.

    Actually, it is. Modern performance tuning has a lot to do with cache misses and such. CISC can allow for more instructions per cache hit. The strategy of a hybrid design, with a CISC external architecture and a RISC internal architecture, definitely has some advantages.
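    A sketch of the "more instructions per cache line" argument, assuming a 64-byte cache line (a common size) and an invented mix of short variable-length encodings. Real x86 instructions range from 1 to 15 bytes, and real code can straddle line boundaries, which this ignores:

```python
def instructions_per_line(lengths: list, line_bytes: int = 64) -> int:
    """How many whole instructions, taken in order, fit in one cache line."""
    used = count = 0
    for n in lengths:
        if used + n > line_bytes:
            break
        used += n
        count += 1
    return count


fixed = instructions_per_line([4] * 32)                 # fixed 4-byte encoding
variable = instructions_per_line([3, 2, 5, 1, 3] * 10)  # invented short variable-length mix
```

    This only helps if the CISC encoding is actually shorter on average for the code in question; long x86 instructions push the other way.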

    That said, the point of RISC was not solely execution speed. It was also simplicity of design, a simplicity that allowed organizations with less money and resources than Intel to design very capable CPUs.

  10. This is a myth that is not true by hkultala · · Score: 5, Informative

    That is correct. Every time this comes up I like to spark a debate over what I perceive as the uselessness of referring to an "instruction set architecture" because that is a bullshit, meaningless term and has been ever since we started making CPUs whose external instructions are decomposed into RISC micro-ops. You could switch out the decoder, leave the internal core completely unchanged, and have a CPU which speaks a different instruction set. It is not an instruction set architecture. That's why the architectures themselves have names. For example, K5 and up can all run x86 code, but none of them actually have logic for each x86 instruction. All of them are internally RISCy. Are they x86-compatible? Obviously. Are they internally x86? No, nothing is any more.

    This same myth keeps being repeated by people who don't really understand the details on how processors internally work.

    You cannot just change the decoder; the instruction set affects the internals a lot:

    1) Condition handling is totally different on different instruction sets. This affects the backend a lot. x86 has a flags register, many other architectures have predicate registers, some predicate registers with different conditions.

    2) The number of general-purpose and floating-point registers differs a lot. The register renamer makes this a smaller difference, but then there is the fact that most RISCs use the same registers for both FPU and integer, while x86 has separate registers for each. And this totally separates them: the internal buses between the register files and functional units in the processor are built very differently.

    3) Memory addressing modes are very different. X86 still does relatively complex address calculations on single micro-operation, so it has more complex address calculation units.

    4) Whether there are operations with more than 2 inputs, or more than 1 output has quite big impact on what kind of internal buses are needed, how many register read and write ports are needed.

    5) There are a LOT more complex instructions in the x86 ISA which are not split into micro-ops but handled via microcode. The microcode interpreter is totally missing on pure RISCs (but exists on some not-so-pure RISCs like POWER/PowerPC).

    6) The instruction set dictates the memory alignment rules. Architectures with stricter alignment rules can have simpler load-store units.

    7) The instruction set dictates the multicore memory ordering rules. This may affect the load-store units, caches and buses.

    8) Some instructions have different bitnesses in different architectures. For example, x86 has N x N -> 2N-wide multiply operations, which most RISCs don't have. So x86 needs a bigger/different multiplier than most RISCs.

    9) x87 FPU values are 80 bits wide (truncated to 64 bits when storing/loading). Practically all other CPUs have a maximum FPU value width of 64 bits (though some versions of POWER also support 128-bit FP numbers).
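    The 80-bit x87 format carries a 64-bit significand, against 53 bits for an IEEE 754 double, so the same intermediate value can round differently. A minimal sketch that models only the significand width, using round-to-nearest-even on positive integers (exponent range, subnormals and the x87 precision-control setting are ignored):

```python
def round_to_significand(n: int, p: int) -> int:
    """Round positive integer n to the nearest value with a p-bit significand.

    Ties go to the value with an even significand, as in IEEE 754's
    default rounding mode.
    """
    if n <= 0:
        raise ValueError("sketch handles positive integers only")
    excess = n.bit_length() - p
    if excess <= 0:
        return n  # already exactly representable
    q, r = divmod(n, 1 << excess)
    half = 1 << (excess - 1)
    if r > half or (r == half and q & 1):
        q += 1
    return q << excess


x = 2**53 + 1
as_double = round_to_significand(x, 53)    # 53-bit significand (IEEE double)
as_extended = round_to_significand(x, 64)  # 64-bit significand (x87 extended)
```

    Here the double loses the trailing 1 while the extended format keeps it, so a chip that computes intermediates in one width gives different answers from one that uses the other.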

    1. Re:This is a myth that is not true by hkultala · · Score: 4, Informative

      Some of what you said is legitimate. Most of it is irrelevant, since it does not speak to the postulate. You're speaking of issues which will affect performance. So what? You'd have a less-performant processor in some cases, and it would be faster in others.

      No.

      1) If the condition codes work totally differently, they don't work.

      2) The data paths needed for separate and combined FP and integer regs are so different that it makes absolutely NO sense to have them combined in a chip that runs the x86 ISA, even though it's possible.

      3) If you don't have those x86-compatible address calculation units, you have to break most memory ops into more micro-ops OR even run them with microcode. Both are slow. And if you have a RISC chip, you want to have only the address calculation units you need for your simple base+offset addressing.

      4) In the basic RISC pipeline there are two operands and one output per instruction. There are no data paths for two results, so you cannot execute operations with multiple outputs, such as the x86 multiply which produces 2 values (low and high part of the result), unless you do something VERY SLOW.

      6) If your RISC instruction set says you have only aligned memory operations, you design your LSU to support only those, as it makes the LSUs much smaller, simpler and faster. But you need unaligned accesses for x86.

      9) If your FPU calculates with different bit width, it calculates wrongly.
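      A sketch of the two-output multiply in question: with a 64-bit operand, x86 MUL writes the 128-bit product to RDX:RAX. Python integers are unbounded, so 64-bit registers are modelled with a mask. Many RISC ISAs instead offer a separate single-result "multiply high" instruction (for example AArch64 UMULH or RISC-V MULHU), modelled here by mul_hi; the function names are made up for the sketch:

```python
MASK64 = (1 << 64) - 1


def mul_wide(a: int, b: int) -> tuple:
    """x86-style 64x64 -> 128-bit unsigned multiply: one op, two results (high, low)."""
    product = (a & MASK64) * (b & MASK64)
    return product >> 64, product & MASK64


def mul_lo(a: int, b: int) -> int:
    """Single-result op: low 64 bits of the product."""
    return ((a & MASK64) * (b & MASK64)) & MASK64


def mul_hi(a: int, b: int) -> int:
    """Single-result op: high 64 bits, like a separate 'multiply high' instruction."""
    return ((a & MASK64) * (b & MASK64)) >> 64


a, b = 2**64 - 1, 3
hi, lo = mul_wide(a, b)
```

      With only single-result ops, getting both halves takes two instructions (mul_lo and mul_hi) instead of one, which is the data-path trade-off described in point 4.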

      And

  11. Re:I think it's safe to say... by Type44Q · · Score: 2

    That's easy: maintain compatibility with fucktons of legacy code, arguably more of which exists for x86 than for every other architecture combined...

  12. Microcode switching by DrYak · · Score: 2

    This same myth keeps being repeated by people who don't really understand the details on how processors internally work.

    Actually, YOU are wrong.

    You cannot just change the decoder; the instruction set affects the internals a lot:

    All the reasons you list could be "fixed in software". The fact that silicon designed by Intel handles opcodes in a way slightly better optimized toward being fed from an x86-compatible frontend is just a specific optimisation. Doing the same stuff with another RISCy back-end, i.e. interpreting the same ISA fed to the front-end, will simply require each x86 instruction to be executed as a different set of micro-instructions (some that are handled as a single ALU opcode on Intel's silicon might require a few more instructions, but that's about the difference).

    You could switch the frontend and speak a completely different instruction set. It's simply that if the two ISAs are radically different, the result wouldn't be as efficient as a chip designed with that ISA in mind. (You would need a much bigger and less efficient microcode, because of all the reasons you list. They won't STOP Intel from making a chip that speaks something else. Intel would simply produce a chip where the front-end is much clunkier and less efficient, wastes 3x more micro-ops per instruction, and spends much time waiting for some bus to become free or copying values around, etc.)

    And to go back to the parent...

    You could switch out the decoder, leave the internal core completely unchanged, and have a CPU which speaks a different instruction set. It is not an instruction set architecture. That's why the architectures themselves have names.

    Not only is this possible, but this was INDEED done.

    There was an entire company called "Transmeta" whose business was centered around exactly that:
    Their chip, the "Crusoe", was compatible with x86.
    - But their chip was actually a VLIW chip, absolutely as remote from a pure x86 core as possible.
    - The frontend was entirely software.

    The advantage touted by Transmeta was that, although their chip was a bit slower and less efficient, it consumed a tiny fraction of the power and was field-upgradeable (in theory, just issue a firmware upgrade to support newer instructions). Transmeta had demos of Crusoe playing back MPEG video on a few watts, whereas the Pentium 3 (the then lower-power Intel chip) would consume far more.

    Sadly, it all happened in an era where pure raw performance was king, and where using a small nuclear plant to power a Pentium IV (the then high-performance flagship) and needing a small lake nearby for cooling was considered perfectly acceptable. So Crusoe didn't see that much success.

    Still, Crusoe was successfully used as a test bed for a few experimental CPUs, to test their ISA before actual silicon was available (if I remember correctly, Crusoe was used to test running x86_64 code before actual Athlon 64s were available to developers), and there were a few experimental proof-of-concepts running the PowerPC ISA.

    In a modern way, this isn't that dissimilar from how Radeon GPUs handle compiled shaders, except that the front-end is now a piece of software which runs inside the OpenGL driver on the main CPU: intermediate instructions are compiled to either VLIW or GCN opcodes, which are 2 entirely different back-ends.
    (Except that, due to the highly repetitive nature of a shader, instead of decoding instructions on the fly as they come, you optimise them once into opcodes, store them in a cache and you're good.)
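    Both Crusoe's software front end and the shader compilers mentioned here rely on the same trick: translate a block once, cache the result, and reuse it. A minimal sketch of such a translation cache, assuming block start addresses as keys and a hypothetical translate callback; it is not Transmeta's Code Morphing Software or any real driver's code:

```python
from typing import Callable, Dict, List


class TranslationCache:
    """Translate guest code blocks on first use, then reuse the cached result."""

    def __init__(self, translate: Callable[[int], List[str]]):
        self._translate = translate              # hypothetical guest-to-host translator
        self._cache: Dict[int, List[str]] = {}
        self.translations = 0                    # how often we paid the translation cost

    def lookup(self, block_addr: int) -> List[str]:
        if block_addr not in self._cache:
            self._cache[block_addr] = self._translate(block_addr)
            self.translations += 1
        return self._cache[block_addr]


def fake_translate(addr: int) -> List[str]:
    """Stand-in for real translation work: emit two made-up host ops."""
    return [f"host_op_a@{addr:#x}", f"host_op_b@{addr:#x}"]


tc = TranslationCache(fake_translate)
for addr in [0x100, 0x200, 0x100, 0x100, 0x200]:  # a loop revisiting two blocks
    tc.lookup(addr)
```

    Five block executions cost only two translations; the more repetitive the code (a loop, or a shader run per pixel), the better the cost amortizes.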

    Again, in a similar way, ARM can switch between two different instruction sets (normal ARM and Thumb mode): two different sets, one back-end.

    --
    "Sufficiently advanced satire is indistinguishable from reality." - [Tips: 1DrYakQDKCQ6y52z6QbnkxHXAocMZJE61o ]
    1. Re:Microcode switching by hkultala · · Score: 2

      This same myth keeps being repeated by people who don't really understand the details of how processors work internally.

      Actually, YOU are wrong.

      You cannot just change the decoder; the instruction set affects the internals a lot:

      All the reasons you list could be "fixed in software".

      No, they cannot. Or the software will be terribly slow, like a 2-10x slowdown.

      The fact that silicon designed by Intel handles opcodes in a way slightly better optimised for being fed from an x86-compatible front-end is just a specific optimisation.

      Opcodes are irrelevant. They are easy to translate. What matters are the differences in the semantics of the instructions.
      x86 instructions update flags. This adds dependencies between instructions. Most RISC processors do not have flags at all.
      This is about the semantics of instructions, and they differ between ISAs.

      Simply doing the same stuff with another RISCy back-end, i.e. interpreting the same ISA fed to the front-end, will simply require each x86 instruction to be executed as a different set of micro-instructions. (Some that are handled as a single ALU opcode on Intel's silicon might require a few more instructions, but that's about the only difference.)

      The back-end micro-instructions in x86 CPUs are different from the instructions in RISC CPUs. They differ in the small details I tried to explain.

      You could switch the front-end and speak a completely different instruction set. It's simply that if the two ISAs are radically different, the result wouldn't be as efficient as a chip designed with that ISA in mind. (You would need a much bigger and less efficient microcode, because of all the reasons you list.) They won't STOP Intel from making a chip that speaks something else.

      Intel did this: they added an x86 decoder to their first Itanium chips. And they did not only add the front-end; they also added some small pieces to the back-end so that it could handle those strange x86 semantic cases nicely.
      But the performance was still so terrible that nobody ever used it to run x86 code. Then they created a software translator that translated x86 code into Itanium code, and that was faster, though still too slow.

      Not only is this possible, it was INDEED done.

      There was an entire company called "Transmeta" whose business was centered around exactly that:
      Their chip, the "Crusoe", was compatible with x86.
      - But their chip was actually a VLIW chip, with the front-end being 100% pure software. Absolutely as remote from a pure x86 core as possible.

      The back-end of Crusoe was designed completely with x86 in mind; all the execution units contained the small quirks in a manner which made it easy to emulate x86 with it. The back-end of Crusoe contains things like:

      * 80-bit FPU,
      * x86-compatible virtual memory page table format (one very important thing I forgot from my original list a couple of posts ago; memory accesses get VERY SLOW if you have to emulate virtual memory),
      * support for partial register writes (to emulate 8- and 16-bit subregisters like al, ah, ax).

      All these were made to make binary translation from x86 easy and reasonably fast.
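      To make the partial-register point concrete, a minimal Python sketch, assuming a hypothetical back-end that can only write whole 32-bit registers. `write_subreg` and `read_subreg` are made-up helper names, and only 32-bit EAX is modelled (the 64-bit RAX rules are ignored). Every write to `al`, `ah` or `ax` turns into a read-modify-write of the full register, which is the extra work that hardware support for partial writes avoids:

```python
MASK32 = 0xFFFF_FFFF

# (bit offset, width) of each x86 subregister inside EAX
SUBREGS = {"al": (0, 8), "ah": (8, 8), "ax": (0, 16), "eax": (0, 32)}


def write_subreg(full: int, name: str, value: int) -> int:
    """Merge a partial write into the full 32-bit register value.

    x86 keeps the untouched bits when writing al/ah/ax, so a back-end
    without partial-register support must mask out the old field and
    OR in the new value on every such write.
    """
    shift, width = SUBREGS[name]
    field = ((1 << width) - 1) << shift
    return (full & ~field & MASK32) | ((value << shift) & field)


def read_subreg(full: int, name: str) -> int:
    """Extract a subregister from the full 32-bit register value."""
    shift, width = SUBREGS[name]
    return (full >> shift) & ((1 << width) - 1)


eax = 0x12345678
eax = write_subreg(eax, "al", 0xAB)   # EAX is now 0x123456AB
eax = write_subreg(eax, "ah", 0xCD)   # EAX is now 0x1234CDAB
print(hex(read_subreg(eax, "ax")))    # 0xcdab
```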

  13. Original sources by enriquevagu · · Score: 2

    It is really surprising that neither the linked ExtremeTech article nor the Slashdot summary cites the original source. This research was presented at HPCA'13 in a paper titled "Power Struggles: Revisiting the RISC vs. CISC Debate on Contemporary ARM and x86 Architectures", by Emily Blem et al., from the University of Wisconsin's Vertical Research group, led by Dr. Karu Sankaralingam. You can find the original conference paper on their website.

    The ExtremeTech article indicates that there are new results with some additional architectures (MIPS Loongson and AMD processors were not included in the original HPCA paper), so I assume that they have published an extended journal version of this work, which is not yet listed on their website. Please add a comment if you have a link to the new work.

    I have no relation to them, but I knew the original HPCA work.

  14. Re:Final nail in the Itanium coffin by WuphonsReach · · Score: 2

    All of which paints a bleak picture for Itanium. There is no compelling reason to keep Itanium alive other than existing contractual agreements with HP. SGI was the only other major Itanium holdout, and they basically dumped it long ago. And Itaniums are basically just glorified space heaters in terms of power usage.

    Itanium was dead on arrival.

    It ran existing x86 code much slower. So if you wanted to move up to 64bit (and use Itanium to get there), you had to pay a lot more for your processors, just to run your existing workload.

    Okay, you say, but everyone was supposed to stop running x86 and start running Itanium binaries! Please put down the pipe and come back to reality. No company is going to repurchase all of their software to run on a new platform, just because Intel says this is the way forward.

    Maybe, maybe! If all of the business software was open-source and easily ported to a different CPU architecture it might have worked. But only if you'd gain a 3x-5x improvement in wall clock performance by porting from x86 to Itanium instruction sets. (An advantage that never materialized.)

    And once AMD started shipping AMD64 and Opterons that could run your existing x86 workload, on a 64-bit CPU, at slightly faster speeds than your old kit for the same price, that buried any chance of Itanium ever succeeding in the market. Any forward-looking IT person, when it came time to upgrade old kit, chose AMD64, because while they might be running 32-bit OS/progs today, the 64-bit train was rumbling down the tracks. So picking a chip that could do both, and do both well, was the best move.

    --
    Wolde you bothe eate your cake, and have your cake?
  15. Again, what's the problem ? by DrYak · · Score: 3, Interesting

    All the reasons you list could be "fixed in software".

    The quotes around "software" mean that I am referring to the firmware/microcode as a piece of software designed to run on top of the actual execution units of a CPU.

    No, they cannot. Or the software will be terribly slow, like a 2-10x slowdown.

    Slow: yes, indeed. But not impossible to do.

    What matters are the differences in the semantics of the instructions.
    x86 instructions update flags. This adds dependencies between instructions. Most RISC processors do not have flags at all.
    This is about the semantics of instructions, and they differ between ISAs.

    Yeah, I know pretty well that RISCs don't (all) have flags.
    Now, again, how is that preventing the microcode swap that drinkypoo refers to (and that was actually done on Transmeta's Crusoe)?
    You'll just end up with bigger, clunkier firmware that, for a given front-end instruction from the same ISA, translates into a big bunch of back-end micro-ops.
    Yup. A RISC's ALU won't update flags. But what's preventing the firmware from dispatching *SEVERAL* micro-ops? First to do the base operation, and then additional instructions to update some register emulating the flags?
    Yes, it's slower. But no, that doesn't make a microcode-based change of supported ISA impossible, only less efficient.
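    A minimal Python sketch of what "several micro-ops" means for a single 32-bit x86 `ADD`, assuming a flagless back-end ALU and unsigned 32-bit inputs. `add32_with_flags` is a hypothetical name, and only CF, ZF, SF and OF are modelled (PF and AF are left out for brevity):

```python
from typing import Dict, Tuple

MASK32 = 0xFFFF_FFFF


def add32_with_flags(a: int, b: int) -> Tuple[int, Dict[str, int]]:
    """Emulate a 32-bit x86 ADD on a back-end whose ALU sets no flags.

    Inputs are assumed to be unsigned 32-bit values (0 .. 2**32 - 1).
    The first two steps are the "real" operation; everything after
    them is extra work needed only to reproduce x86 flag semantics.
    """
    raw = a + b                    # uop 1: the base operation
    result = raw & MASK32          # uop 2: truncate to 32 bits
    flags = {
        "CF": int(raw > MASK32),   # unsigned carry out of bit 31
        "ZF": int(result == 0),    # result is zero
        "SF": (result >> 31) & 1,  # sign bit of the result
        # signed overflow: the result's sign differs from both inputs' signs
        "OF": (((a ^ result) & (b ^ result)) >> 31) & 1,
    }
    return result, flags


result, flags = add32_with_flags(0x7FFFFFFF, 1)
print(hex(result), flags)
```

    Here `add32_with_flags(0x7FFFFFFF, 1)` returns `0x80000000` with SF and OF set and CF clear: one add, plus a handful of extra operations just to reproduce the flags. That extra work is the "slower, but doable" cost being argued about.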

    The back-end micro-instructions in x86 CPUs are different from the instructions in RISC CPUs. They differ in the small details I tried to explain.

    Yes, and please explain how that makes it *definitely impossible* to run x86 instructions, and not merely *somewhat slower*?

    Intel did this: they added an x86 decoder to their first Itanium chips. {...} But the performance was still so terrible that nobody ever used it to run x86 code. Then they created a software translator that translated x86 code into Itanium code, and that was faster, though still too slow.

    Slow, but still doable and done.

    Now, keep in mind that:
    - Itanium is a VLIW processor. That's an entirely different beast, with an entirely different approach to optimisation, and back during Itanium development the logic was "the compiler will handle the optimising". But back then such a magical compiler didn't exist, and anyway it wouldn't have had the necessary information at compile time (some types of optimisation require information only available at run time, hence doable in microcode but not in a compiler).
    Given the compilers available back then, VLIW sucks for almost anything except highly repetitive tasks. Thus it was somewhat popular for cluster nodes running massively parallel algorithms (and at some point VLIW was also popular in Radeon GFX cards). But VLIW sucks for pretty much anything else.
    (Remember that, for example, GCC has only recently gained auto-vectorisation and well-performing Profile-Guided Optimisation.)
    So "supporting an alternate x86 instruction set on Itanium was slow" has as much to do with "supporting an instruction set on a back-end that's not tailored for the front-end is slow" as it has to do with "Itanic sucks for pretty much everything which isn't a highly optimised kernel function in HPC".

    But still, it proves that running a different ISA on a completely alien back-end is doable.
    The weirdness of the back-end won't prevent it, only slow it down.

    Luckily, by the time Transmeta Crusoe arrived:
    - knowledge of how to handle VLIW had advanced a bit; Crusoe had a back-end better tuned to run a CISC ISA.

    Then by the time Radeon arrived:
    - compilers had gotten even better; GPUs are used for the same (only) class of task at which VLIW excels.

    The back-end of Crusoe was designed completely with x86 in mind; all the execution units contained the small quirks in a manner which made it easy to emulate x86 with it. The back-end of Crusoe contains things like {...} All these were made to make binary translation from x86 eas
