Slashdot Mirror


Research Shows RISC vs. CISC Doesn't Matter

fsterman writes: The power advantages brought by the RISC instruction sets used in Power and ARM chips are often pitted against the X86's efficiencies of scale. It's difficult to assess how much the difference between instruction sets matters, because teasing out the theoretical efficiency of an ISA from the proficiency of a chip's design team, the technical expertise of its manufacturer, and support for architecture-specific optimizations in compilers is nearly impossible. However, new research examining the performance of a variety of ARM, MIPS, and X86 processors gives weight to Intel's conclusion: the benefits of a given ISA to the power envelope of a chip are minute.

161 comments

  1. isn't x86 RISC by now? by alen · · Score: 5, Informative

    i've read the legacy x86 instructions were virtualized in the CPU a long time ago and modern intel processors are effectively RISC that translate to x86 in the CPU

    1. Re:isn't x86 RISC by now? by Z80a · · Score: 3, Interesting

      As far as I'm aware, since the Pentium Pro line, Intel CPUs have been RISCs with translation layers, and AMD has been on this boat since the original Athlon.

    2. Re:isn't x86 RISC by now? by Anonymous Coward · · Score: 2, Interesting

      x86 instructions, are in fact, decoded to micro opcodes, so the distinction isn't as useful in this context.
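
      To make the "decoded to micro opcodes" idea concrete, here is a minimal sketch of a decoder that splits one memory-operand instruction into simpler load/operate/store steps. Everything in it is an assumption for illustration: the `MicroOp` type, the `decode` function, the `tmp0` temporary and the textual instruction format are hypothetical, and real micro-op encodings are proprietary and look nothing like this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MicroOp:
    """One simple internal operation (hypothetical format, not a real encoding)."""
    op: str      # e.g. "load", "add", "store"
    dst: str     # destination register or memory address name
    src: tuple   # source operands


def decode(instruction: str) -> list[MicroOp]:
    """Split a toy 'op dst, src' instruction into micro-ops.

    A register-to-register form maps to a single micro-op. A form whose
    destination is a memory operand like [rbx] becomes load / operate / store.
    """
    mnemonic, operands = instruction.split(maxsplit=1)
    dst, src = [o.strip() for o in operands.split(",")]
    if dst.startswith("[") and dst.endswith("]"):
        addr = dst[1:-1]
        return [
            MicroOp("load", "tmp0", (addr,)),          # read memory into a temporary
            MicroOp(mnemonic, "tmp0", ("tmp0", src)),  # do the arithmetic on registers
            MicroOp("store", addr, ("tmp0",)),         # write the result back
        ]
    return [MicroOp(mnemonic, dst, (dst, src))]


if __name__ == "__main__":
    for uop in decode("add [rbx], rax"):
        print(uop)
```

      The point of the sketch is only that one externally visible instruction can fan out into several internal steps, while a simple register-to-register instruction stays a single step.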

    3. Re:isn't x86 RISC by now? by wiredlogic · · Score: 1

      This. The x86 ISA is roughly analogous to ARM Thumb compressed instructions. It is just a front end to a register rich RISC core.

      --
      I am becoming gerund, destroyer of verbs.
    4. Re:isn't x86 RISC by now? by cheesybagel · · Score: 2

      Actually AMD did that way back in the K5 time. The K5 was a 29k RISC processor with a x86 frontend.

    5. Re:isn't x86 RISC by now? by Z80a · · Score: 1

      Interesting.
      I was thinking it was the Athlon that was the 29k.

    6. Re:isn't x86 RISC by now? by bill_mcgonigle · · Score: 2

      I have to assume the wisc.edu folks know this and somebody gummed up the headlines along the way.

      --
      My God, it's Full of Source!
      OUTSIDE_IP=$(dig +short my.ip @outsideip.net)
    7. Re:isn't x86 RISC by now? by drinkypoo · · Score: 4, Interesting

      That is correct. Every time this comes up I like to spark a debate over what I perceive as the uselessness of referring to an "instruction set architecture" because that is a bullshit, meaningless term and has been ever since we started making CPUs whose external instructions are decomposed into RISC micro-ops. You could switch out the decoder, leave the internal core completely unchanged, and have a CPU which speaks a different instruction set. It is not an instruction set architecture. That's why the architectures themselves have names. For example, K5 and up can all run x86 code, but none of them actually have logic for each x86 instruction. All of them are internally RISCy. Are they x86-compatible? Obviously. Are they internally x86? No, nothing is any more.

      --
      "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
    8. Re:isn't x86 RISC by now? by Anonymous Coward · · Score: 4, Insightful

      Yes. As noted by the study (That by the way isn't very good.) "When every transistor counts, then every instruction, clock cycle, memory access, and cache level must be carefully budgeted, and the simple design tenets of RISC become advantageous once again."
      Essentially meaning that "If you want as few transistors as possible it doesn't help to have the CISC to RISC translation layer in x86"

      They also claim things like "The report notes that in certain, extremely specific cases where die sizes must be 1-2 mm² or power consumption is specced to sub-milliwatt levels, RISC microcontrollers can still have an advantage over their CISC brethren." which clearly indicates that their idea of "embedded" systems is limited to smartphones.
      Cases where you have a battery that can't be recharged on a daily basis are hardly extremely specific. Not that any CPU they tested is suitable for those applications anyway. They have essentially limited themselves to applications where "not as bad as P4" is acceptable.

    9. Re:isn't x86 RISC by now? by RabidReindeer · · Score: 4, Interesting

      x86 instructions, are in fact, decoded to micro opcodes, so the distinction isn't as useful in this context.

      They're not the only ones. The IBM mainframes have long been VMs implemented on top of various microcode platforms. In fact, one of the original uses of the 8-inch floppy disk was to hold the VM that would be loaded up during the Initial Microprogram Load (IMPL), before the IPL (boot) of the actual OS. So in a sense, the Project Hercules mainframe emulator is just repeating history.

      Nor were they unusual. In school I worked with a minicomputer which not only was a VM on top of microcode, but you could extend the VM by programming to the microcode yourself.

      The main differences between RISC and CISC, as I recall were lots of registers and the simplicity of the instruction set. Both the Intel and zSeries CISC instruction sets have lots of registers, though. So the main difference between RISC and CISC would be that you could - in theory - optimize "between" the CISC instructions if you coded RISC instead.

      Presumably somebody tried this, but didn't get benefits worth shouting about.

      Incidentally, the CISC instruction set of the more recent IBM z machines includes entire C stdlib functions such as strcpy in a single machine-language instruction.

    10. Re:isn't x86 RISC by now? by cheesybagel · · Score: 5, Informative

      After AMD lost the license to manufacture Intel i486 processors, together with other people, they were forced to design their own chip from the ground up. So they basically used one of the 29k RISC processors and put an x86 frontend on it. Cyrix did more or less the same thing at the time, also coming up with their own design. Since the K5 had good performance per clock but could not clock very high and was expensive, AMD was stuck, and to get their next processor they bought a company called NexGen, which designed the Intel-compatible Nx586 processor. AMD then worked on the successor of the Nx586 as a single chip, which was the K6. The K7 Athlon was yet another design, made by a team headed by Dirk Meyer, who used to be a chip designer at Digital Equipment Corporation, i.e. DEC. He was one of the designers of the Alpha series of RISC CPUs, and the Athlon resembles an Alpha chip internally a lot because of that.

    11. Re: isn't x86 RISC by now? by the_humeister · · Score: 2

      Actually, Nexgen was the first to do x86 -> RISC. Then Intel with Pentium Pro. Then AMD with K5. As far as I recall, Cyrix never did x86 -> RISC back then until they were acquired by VIA (ie, Cyrix M series chips executed x86 directly, but VIA Epia and later translate).

    12. Re: isn't x86 RISC by now? by cheesybagel · · Score: 1

      Right NexGen released their chip first but it is hard to compare because it was a dual-chip solution where the FPU came in a separate package while the K5 had its own FPU. NexGen only did a single-chip product later. K5 development process was highly protracted and difficult and the release was delayed many times. The Cyrix 5x86 was also similar in a lot of regards to the Pentium Pro. In fact I remember the Pentium Pro designer himself stating that they had a lot of interesting insights during Pentium Pro chip design after doing low-level reverse engineering of a Cyrix processor.

    13. Re:isn't x86 RISC by now? by hkultala · · Score: 1

      After AMD lost the license to manufacture Intel i486 processors, together with other people, they were forced to design their own chip from the ground up. So they basically used one of the 29k RISC processors and put an x86 frontend on it.

      This was their plan, but it ended up being much harder than they originally thought, and the K5 came out much later, much different and much slower than planned. There are quite a lot of things that have to be done differently (some of them are explained in another post of mine).

    14. Re:isn't x86 RISC by now? by funwithBSD · · Score: 2

      When the DEC Alpha was killed, many of the engineers were picked up by AMD.

      --
      Never answer an anonymous letter. - Yogi Berra
    15. Re:isn't x86 RISC by now? by enriquevagu · · Score: 4, Insightful

      This is why we use the terms "Instruction Set Architecture" to define the interface to the (assembler) programmer, and "microarchitecture" to refer to the actual internal implementation. ISA is not bullshit, unless you confuse it with the internal microarchitecture.
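
      A rough software analogy for that split, offered as a sketch only: the ISA plays the role of an interface, and each microarchitecture is one implementation of it. The toy instruction set (`li`, `add`), the class names and the `t0` temporary below are all made up for illustration and correspond to no real processor.

```python
from abc import ABC, abstractmethod


class ToyISA(ABC):
    """The programmer-visible contract: run a program, return register values."""

    @abstractmethod
    def run(self, program: list[tuple]) -> dict[str, int]:
        ...


class DirectCore(ToyISA):
    """Hypothetical implementation that executes each instruction as written."""

    def run(self, program):
        regs = {}
        for op, dst, a, b in program:
            if op == "li":        # load immediate
                regs[dst] = a
            elif op == "add":     # dst = a + b
                regs[dst] = regs[a] + regs[b]
        return regs


class MicroOpCore(ToyISA):
    """Hypothetical implementation that first translates to internal steps."""

    def run(self, program):
        uops = []
        for op, dst, a, b in program:
            if op == "li":
                uops.append(("imm", dst, a))
            elif op == "add":
                uops.append(("copy", "t0", a))    # t0 is an internal temporary
                uops.append(("acc", "t0", b))
                uops.append(("copy", dst, "t0"))
        state = {}
        for kind, dst, src in uops:
            if kind == "imm":
                state[dst] = src
            elif kind == "copy":
                state[dst] = state[src]
            elif kind == "acc":
                state[dst] += state[src]
        state.pop("t0", None)   # internal state is not architecturally visible
        return state


if __name__ == "__main__":
    program = [("li", "r1", 2, None), ("li", "r2", 3, None), ("add", "r3", "r1", "r2")]
    print(DirectCore().run(program) == MicroOpCore().run(program))
```

      Both classes give the same programmer-visible result for the same program, even though what happens inside them differs.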

    16. Re:isn't x86 RISC by now? by morgauxo · · Score: 1

      Does that mean that the Transmeta Crusoe wasn't anything special?

    17. Re:isn't x86 RISC by now? by drinkypoo · · Score: 0

      This is why we use the terms "Instruction Set Architecture" to define the interface to the (assembler) programmer,

      No, no we do not. That is called the instruction set. The programmer does not use the instruction set architecture, they simply issue instructions which the processor then executes as it sees fit, especially in [OoO] architectures with branch prediction. The architecture is the silicon, and the programmer isn't sitting on the die flipping switches.

      --
      "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
    18. Re: isn't x86 RISC by now? by Anonymous Coward · · Score: 0

      FWIW, I don't think the Cyrix 5x86 was very similar at all to the Pentium Pro.

      Although the Cyrix 5x86 and Pentium Pro both had a scheme which decoded x86 to micro-ops, the similarities pretty much end there. As I recall the 5x86 had a very basic micro-op fusion scheme (address gen) and was out-of-order only for loads/stores, where the Pentium Pro was a much more advanced (for its time) super-scalar out-of-order machine.

      As one of the 5x86 architects put it, they used superscalar architectural features in a scalar configuration...

      Of course P6 pipeline was something like 14 stages where the 5x86 was closer to 9 which means the broken branch prediction on the 5x86 didn't matter as much...
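
      The link between pipeline depth and the cost of bad branch prediction can be put into a first-order formula: cycles per instruction = base CPI + (fraction of branches) x (misprediction rate) x (penalty in cycles). The sketch below assumes, as a simplification, that the penalty is about as long as the pipeline. The branch fraction and misprediction rate are made-up numbers, not measurements of either chip.

```python
def effective_cpi(base_cpi: float, branch_fraction: float,
                  mispredict_rate: float, penalty_cycles: float) -> float:
    """First-order estimate of cycles per instruction including branch flushes."""
    return base_cpi + branch_fraction * mispredict_rate * penalty_cycles


if __name__ == "__main__":
    # Hypothetical workload: 20% branches, 10% of them mispredicted.
    for name, depth in (("14-stage pipeline", 14), ("9-stage pipeline", 9)):
        cpi = effective_cpi(1.0, 0.20, 0.10, depth)
        print(f"{name}: {cpi:.2f} cycles per instruction")
```

      With the same predictor quality, the deeper pipeline pays more cycles for every miss, which is why a weak predictor hurts a short pipeline less.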

    19. Re:isn't x86 RISC by now? by bws111 · · Score: 2

      The very first paragraph of IBM's z/Architecture Principles of Operation:

      The architecture of a system defines its attributes as seen by the programmer, that is, the conceptual structure and functional behavior of the machine, as distinct from the organization of the data flow, the logical design, the physical design, and the performance of any particular implementation. Several dissimilar machine implementations may conform to a single architecture. When the execution of a set of programs on different machine implementations produces the results that are defined by a single architecture, the implementations are considered to be compatible for those programs.

    20. Re:isn't x86 RISC by now? by Guy+Harris · · Score: 3, Informative

      They're not the only ones. The IBM mainframes have long been VMs implemented on top of various microcode platforms.

      But the microcode implemented part or all of an interpreter for the machine code; the instructions weren't translated into directly-executed microcode. (And the System/360 Model 75 did it all in hardware, with no microcode).

      And the "instruction set" for the microcode was often rather close to the hardware, with extremely little in the way of "instruction decoding" of microinstructions, although I think some lower-end machines might have had microinstructions that didn't look too different from a regular instruction set. (Some might have been IBM 801s.)

      So that's not exactly the same thing as what the Pentium Pro and successors, the Nx586, and the AMD K5 and successors, do.

      Current mainframe processors, however, as far as I know 1) execute most instructions directly in hardware, 2) do so by translating them into micro-ops the same way current x86 processors do, and 3) trap some instructions to "millicode", which is z/Architecture machine code with some processor-dependent special instructions and access to processor-dependent special registers (and, yes, I can hear the word PALcode being shouted in the background...). See, for example, "A high-frequency custom CMOS S/390 microprocessor" (paywalled, but the abstract is free at that link, and mentions millicode) and "IBM zEnterprise 196 microprocessor and cache subsystem" (non-paywalled copy; mentions microoperations). I'm not sure those processors have any of what would normally be thought of as "microcode".

      The midrange System/38 and older ("CISC") AS/400 machines also had an S/360-ish instruction set implemented in microcode. The compilers, however, generated code for an extremely CISCy processor - but that code wasn't interpreted, it was translated into the native instruction set by low-level OS code and executed.

      For legal reasons, the people who wrote the low-level OS code (compiled into the native instruction set) worked for a hardware manager and wrote what was called "vertical microcode" (the microcode that implemented the native instruction set was called "horizontal microcode"). That way, IBM wouldn't have to provide that code to competitors, the way they had to make the IBM mainframe OSes available to plug-compatible manufacturers, as it's not software, it's internal microcode. See "Inside the AS/400" by one of the architects of S/38 and AS/400.

      Current ("RISC") AS/400s^WeServer iSeries^W^WSystem i^WIBM Power Systems running IBM i are similar, but the internal machine language is PowerPC^WPower ISA (with some extensions such as tag bits and decimal-arithmetic assists, present, I think, in recent POWER microprocessors but not documented) rather than the old "IMPI" 360-ish instruction set.

      The main differences between RISC and CISC, as I recall were lots of registers and the simplicity of the instruction set. Both the Intel and zSeries CISC instruction sets have lots of registers, though.

      Depends on which version of the instruction set and your definition of "lots".

      32-bit x86 had 8 registers (many x86 processors used register renaming, but they still had only 8 programmer-visible registers, and not all were as general as one might like), and they only went to 16 registers in x86-64. System/360 had 16 general-purpose registers (much more regular than x86, but that's not setting the bar all that high :-)), and that continues to z/Architecture, althoug

    21. Re: isn't x86 RISC by now? by LostMyBeaver · · Score: 2

      You make a lot of good points. But any time you create a processor with cache miss prediction as well as branch prediction as well as execution parallelization, the instruction set has almost no effect at all.

      You're suggesting that the instruction set interpretation is a separate unit from the out-of-order execution code. In reality, the first stage of any highly optimized modern CPU able to minimize pipeline misses must actually "recompile the code" (an oversimplification) in hardware before the actual execution unit, and any intelligent designer would simplify the operations into much wider RISC-style operations in the pipeline(s).

      A purely in-order processor would certainly suffer from using a variable length instruction set.

      What many people fail to realize is that ARM is very much a CISC ISA as well, but with a wider fixed-size word. So instead of incrementing instruction lengths by single-byte units, ARM has two-byte-wide units in Thumb mode and four-byte-wide increments in ARM mode. Either way, it's nothing like MIPS (which is a dinosaur), which is a pure RISC architecture.

      With the exception of a rare set of VLIW DSP cores (TI), it doesn't really matter. The major performance booster DSPs have is that programs for a DSP are generally very small, with extremely long pre-optimized pipelines, and are not intended to be run in a task-switching environment. Those processors don't need translation because they're completely pre-optimized. They also don't tend to have memory fragmentation and almost never have MMUs.

      General purpose CPUs run code all ass-backwards because the code is unpredictable. The ISA will almost always benefit most from having as many versions of instructions as possible for performance per watt.

      The trick will be how to detect when specific units aren't needed and power them down when possible. Compilers and ISA will have no impact on this... Unless instructions are added to power on and off units explicitly based on the code entering the pipeline.

    22. Re:isn't x86 RISC by now? by Anonymous Coward · · Score: 0

      RISC processors had hundreds of registers to store the stack frames. There is some smart overlapping of stack frames so that functions could pass by reference straight through registers. When you look at the depth of the function call stacks in some GUI systems, those are needed. Then there would be separate addition/multiplication and vector operation units, so that separate instructions could be processed independently. Modern CPU's are also deeply pipelined to around 14+ stages - every one of the classic four Fetch, Read, Execute, Write stages has been parallelized with pre-lookup, abort and bypass stages, so that means that around 100+ instructions can be in flight at any time. Then the results written back into registers have to be synchronized. So there are register scoreboards to keep track of dependencies. To keep track of all those and to guarantee program execution safety, extra instructions have been added to implement mutex's and thread barriers in hardware. Another difference was that RISC CPU's would implement complex instructions like floating-point division in microcode rather than in hardware logic.

      Even back in the 1990's, Intel CPU's had the REP prefix for their instructions, which said to repeat the specific instruction a certain number of times. So you could do things like string clears in this way.
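
      As a sketch of what such a repeated string instruction does, here is a small Python model of the semantics of `rep stosb` with the direction flag clear: store the byte in AL at the address in EDI, advance EDI, decrement ECX, and stop when ECX reaches zero. The function name and the use of a `bytearray` as "memory" are assumptions for illustration; segment registers and faults are ignored.

```python
def rep_stosb(memory: bytearray, edi: int, ecx: int, al: int) -> tuple[int, int]:
    """Model `rep stosb` (direction flag clear) on a flat bytearray.

    Returns the final (edi, ecx) register values; memory is modified in place.
    """
    while ecx != 0:
        memory[edi] = al & 0xFF   # store the low byte of the accumulator
        edi += 1                  # advance the destination pointer
        ecx -= 1                  # count down the repeat counter
    return edi, ecx


if __name__ == "__main__":
    buf = bytearray(b"hello world")
    print(rep_stosb(buf, edi=0, ecx=5, al=0), buf)
```

      One instruction with a prefix thus stands in for a whole loop, which is the kind of thing the CISC side of the argument usually points to.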

    23. Re:isn't x86 RISC by now? by Darinbob · · Score: 1

      RISC was not supposed to be a religion, which is what it seems to have turned into. No one should even be arguing the point today because modern chips are so different from when the term 'RISC' was new. The whole premise behind RISC is being used extensively in modern CISC machines. The problem with trying to keep a RISC vs CISC debate alive is that it harms the education of the students.

      RISC is primarily at its core about eliminating complex infrastructure where you can and reusing the resources for things that really can improve performance. Remember that when RISC was first being used, the primary CISC machine at the time was the VAX (x86 still being a toy). The VAX operated primarily with a micro-architecture with the instructions implemented in microcode. It had some amazingly complicated instructions, including one for helping compute polynomials. RISC researchers wanted to reuse those transistors for other purposes: more registers, more cache, pipelining, etc.

      In fact, RISC was already well underway before 'RISC' was coined as a term. Many of the super computers and super-minis of the day already used similar techniques. One of the tradeoffs to overcome was the ease of programming in assembler versus performance, another tradeoff was memory space versus performance. Much of the complexity in CISC machines was also to support the high level architecture: paging systems, memory protection, IO subsystems, which is one major reason why the VAX was designed the way it was. Compare to the microchip CPUs at the time; RAM was expensive so they also were primarily CISC in order to squeeze the most out of each instruction.

      At the time the machinery necessary to support the instruction decoding was a significant part of the typical CPU design, so shrinking that gave you a big win. Today this is no longer true, the decoding and micro-architecture on Intel chips is essentially a trivial fraction, they even have on-chip caches with more static RAM than an early 80s era CISC decoder.

      However it may not be true everywhere. We still have small embedded and low power chips where this stuff does matter. ARM7TDMI is a popular chip which relies on RISC in order to keep everything simple and small, no space for caches or instruction queues, so having a simple instruction decoder is a big win in keeping it simple and low power. Most of the small low power chips today for embedded use are clearly RISC derived for that reason, PIC, AVR, etc. The article here is talking about the big beefy desktop systems, which are just a fraction of CPUs being used.

    24. Re:isn't x86 RISC by now? by Guy+Harris · · Score: 1

      RISC processors had hundreds of registers to store the stack frames. There is some smart overlapping of stack frames so that functions could pass by reference straight through registers. When you look at the depth of the function call stacks in some GUI systems, those are needed.

      RISC processors with the letters "S", "P", "A", "R", and "C" in the instruction set name, in that order, did. The ones with the digits "8", "0", "9", "6", and "0" in the processor name also did, I think. The ones with "M", "I", "P", and "S" in the instruction set name, in that order, did not, nor did the ones with "A", "l", "p", "h", and "a" in the instruction set name, in that order, nor the ones with "A", "R", and "M" in the instruction set name, in that order, nor the ones with instruction sets having names matching the regular expression "P(ower|OWER)(PC| ISA)".

      And, given that most processors running GUI systems these days, and even most processors running GUI systems before x86/ARM ended up running most of the UI code people see, didn't have register windows, no, they're not needed. Yeah, SPARC workstations may have been popular, but I don't think register windows magically made GUIs work better on them. (And remember that register windows eventually spill, so once the stack depth gets beyond a certain point, I'm not sure they help; it's shallow call stacks, in which you can go up and down the call stack without spilling register windows, where they might help.)
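
      The spilling behaviour can be illustrated with a small simulation. This is a deliberately simplified model, not a description of any real SPARC implementation: it assumes one window per call frame, a fixed number of windows in the register file, one overflow trap (spill) when a call needs a window and none is free, and one underflow trap (fill) when a return reaches a frame whose window was spilled. The function name and the trace format are hypothetical.

```python
def count_window_traps(depth_trace: list[int], num_windows: int) -> tuple[int, int]:
    """Count (spills, fills) while the call depth moves through depth_trace.

    Execution starts at depth 0 with that frame's window resident. Each entry
    in depth_trace is the next call depth to reach, one call or return at a time.
    """
    depth = 0
    oldest_resident = 0   # shallowest frame whose window is still in registers
    spills = fills = 0
    for target in depth_trace:
        while depth < target:                       # a call
            depth += 1
            if depth - oldest_resident + 1 > num_windows:
                oldest_resident += 1                # evict the oldest window
                spills += 1
        while depth > target:                       # a return
            depth -= 1
            if depth < oldest_resident:
                oldest_resident = depth             # reload the caller's window
                fills += 1
    return spills, fills


if __name__ == "__main__":
    print(count_window_traps([2, 6, 2, 6, 0], num_windows=8))   # shallow, bouncing
    print(count_window_traps([20, 0], num_windows=8))           # one deep excursion
```

      In this model a call stack that bounces around within the window count never traps, while a single deep excursion pays a spill on the way down and a fill on the way back for every frame beyond the register file.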

      Then there would be separate addition/multiplication and vector operation units, so that separate instructions could be processed independently. Modern CPU's are also deeply pipelined to around 14+ stages - every one of the classic four Fetch, Read, Execute, Write stages has been parallelized with pre-lookup, abort and bypass stages, so that means that around 100+ instructions can be in flight at any time. Then the results written back into registers have to be synchronized. So there are register scoreboards to keep track of dependencies. To keep track of all those and to guarantee program execution safety, extra instructions have been added to implement mutex's and thread barriers in hardware.

      None of which has anything to do with RISC vs. CISC, and much of which wasn't the case when RISC processors first came out.

      Another difference was that RISC CPU's would implement complex instructions like floating-point division in microcode rather than in hardware logic.

      Actually, it's more likely to have been the other way around, unless by "microcode" you meant "software". These days, most processors whether RISC or CISC, probably do floating-point division in hardware.

    25. Re: isn't x86 RISC by now? by Darinbob · · Score: 2

      ARM is RISC through and through. Though it complicates it somewhat with multiple simple instruction sets. Basic ARM ISA is all 32-bit instructions only, very much RISC from every angle you look at it, every bit as pure as MIPS. ARM Thumb ISA is 16-bit instructions only, and the machine translation from Thumb to ARM is very simple, just a fraction of the chip. Thumb2 gets slightly more complex allowing both 16 and 32 bit instructions intermixed, but again it's not that complicated. It's just RISC like all around. Some exceptions with the PC-relative indexing perhaps, but all instructions are orthogonal, you can decode the machine code by hand, all instructions take the same amount of time to execute (not true on some multiplies/divides on some models, which was also true for classic RISC), decoding is a tiny fraction of the chip being used.
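
      As an illustration of "decode the machine code by hand", here is a sketch that pulls apart the fixed bit fields of a classic 32-bit ARM data-processing instruction: condition in bits 31-28, immediate flag in bit 25, opcode in bits 24-21, S flag in bit 20, Rn in bits 19-16, Rd in bits 15-12, and operand 2 in bits 11-0. The function name is made up, and it does not check that the word really belongs to the data-processing class.

```python
def decode_arm_data_processing(word: int) -> dict[str, int]:
    """Extract the fixed-position fields of an ARM data-processing instruction."""
    return {
        "cond":     (word >> 28) & 0xF,   # 0xE means "always"
        "imm":      (word >> 25) & 0x1,   # 1 if operand2 is an immediate
        "opcode":   (word >> 21) & 0xF,   # 0x4 is ADD
        "s":        (word >> 20) & 0x1,   # 1 if condition flags are updated
        "rn":       (word >> 16) & 0xF,   # first source register
        "rd":       (word >> 12) & 0xF,   # destination register
        "operand2": word & 0xFFF,         # register/shift or rotated immediate
    }


if __name__ == "__main__":
    # 0xE0810002 encodes ADD r0, r1, r2
    print(decode_arm_data_processing(0xE0810002))
```

      Every field sits at the same position in every instruction of this class, so a few shifts and masks are all the "decoder" needs.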

      MIPS isn't a dinosaur, they still develop and advance it and it's still being sold new for use in new products.

    26. Re:isn't x86 RISC by now? by Darinbob · · Score: 1

      I don't see it. The ARM Thumb instruction set is vastly more simple and regular than even the 286 instruction set. Thumb is already a reduced instruction set. There are no special purpose string instructions, it has general purpose registers that can be used as anything whereas 286 has only special purpose registers (AX is only accumulator, BX is the only base register, CX is the only register for counting instructions, etc). Yes, SP and PC are special purpose, but that's true of all the early RISC machines too. I don't see anything in Thumb I would call CISC-like.

    27. Re:isn't x86 RISC by now? by Anonymous Coward · · Score: 0

      Sorry, you obviously do not know anything about computer architecture.
      Read Hennessy and Patterson's books, or any other serious teaching material.
      Instruction Set Architectures do exist, the internal microcode and/or pipeline states are not comparable to instruction sets (the whole "there is a RISC inside" idea is bullshit), the operating units (ALU, branch, FPU...) are tailored for the target ISA as is the MMU.
      The idea that changing from a x86 to a PowerPC or an ARM is doable just by changing the decoder is laughable.

      Inside a x86, there are pieces of x86.

      (Yes, I have already designed a CPU, a 32bits RISC)

    28. Re:isn't x86 RISC by now? by Anonymous Coward · · Score: 0

      Truly DOUBLE (this link's a compliment to you both) & yours was a GREAT read (double the length of the one I gave the compliment too, but yours was DOUBLY as informative)...

      * You ought to be rated higher, but alas, with my posting as an "AC", I have no modpoints to give you... sorry!

      APK

      P.S.=> Per my subject-line -> http://hardware.slashdot.org/c...

      ... apk

    29. Re:isn't x86 RISC by now? by lars_stefan_axelsson · · Score: 1

      And, given that most processors running GUI systems these days, and even most processors running GUI systems before x86/ARM ended up running most of the UI code people see, didn't have register windows, no, they're not needed. Yeah, SPARC workstations may have been popular, but I don't think register windows magically made GUIs work better on them. (And remember that register windows eventually spill, so once the stack depth gets beyond a certain point, I'm not sure they help; it's shallow call stacks, in which you can go up and down the call stack without spilling register windows, where they might help.)

      I remember reading research back in the day, that showed that register windows were orthogonal to any RISC/CISC considerations, i.e. they were about as easy/costly to implement in either architecture and they gave the same boost/or not, in either case. As you point out, in practice it turned out to not be really worth the trouble, and they died out rather quickly.

      --
      Stefan Axelsson
    30. Re:isn't x86 RISC by now? by lars_stefan_axelsson · · Score: 1

      CISC ISAs may have individual "complex" instructions, such as procedure call instructions, string manipulation instructions, decimal arithmetic instructions, and various instructions and instruction set features to "close the semantic gap" between high-level languages and machine code, add extra forms of data protection, etc. - although the original procedure-call instructions in S/360 were pretty simple, BAL/BALR just putting the PC of the next instruction into a register and jumping to the target instruction, just as most RISC procedure-call instructions do. A lot of the really CISCy instruction sets may have been reactions to systems like S/360, viewing its instruction set as being far from CISCy enough, but that trend has largely died out.

      I know you say "current", but one of the original ideas behind RISC was also to make each instruction "short", i.e. make each instruction take one cycle, and reduce cycle times as much as possible so that you could have really deep pipelines (MIPS), or increase clock speed. Now, while most "RISCs" today sort-of follow this idea, by virtue of the ISA having been made with that in mind in the old days, i.e. load-store etc., they're typically not as strict about it (if they in fact ever were). I guess the CISC situation is even more complicated, as they're "internally" RISC, and you can kind-a-sort-a treat them that way by staying away from the "heavy" instructions. That is, if you can reason about what kind of time you're going to see from your micro-opped+out-of-order core anyway. The internals, and specifically the timing models, have gotten even more complex than they already were. I don't know what your take on that would be?

      --
      Stefan Axelsson
    31. Re: isn't x86 RISC by now? by WilCompute · · Score: 1

      However, there are projects underway to optimize general purpose instructions, i.e. the Mill CPU designs found at millcomputing.com. They are designing a cpu from the ground up that can process up to 33 instructions per cycle, trying to get the performance per watt of a DSP with the flexibility of a General Purpose CPU.

      --
      NDxTreme Content on the Edge.
    32. Re: isn't x86 RISC by now? by unixisc · · Score: 1

      Nonetheless, they've lost some major markets - game consoles first to PPC, and later Intel, and then, the embedded market to ARM

    33. Re:isn't x86 RISC by now? by unixisc · · Score: 1

      The AMD-64 architecture - is that also register limited? Or did AMD toss something like 32-64 program accessible registers @ the problem? And if they did, would Intel have limited theirs?

    34. Re:isn't x86 RISC by now? by Guy+Harris · · Score: 1

      The AMD-64 architecture - is that also register limited?

      With 16 GPRs, it has fewer registers than all the major RISC architectures other than 32-bit ARM, just as the 16-GPR System/3x0 (including its 64-bit z/Architecture version) does. It's less register-limited than x86, but that's not setting the bar very high. (Note that IBM recently added instructions to z/Architecture that do arithmetic on the upper 32 bits of the GPRs; that suggests that there's some register pressure with only 16 GPRs, although if they still have to make use of base registers, even with PC-relative branches, that might add some additional pressure that x86-64 doesn't have.)

      Or did AMD toss something like 32-64 program accessible registers @ the problem?

      No, they didn't; x86-64 has, as noted, only 16.

      And if they did, would Intel have limited theirs?

      Limited their what?

    35. Re:isn't x86 RISC by now? by niftymitch · · Score: 1

      i've read the legacy x86 instructions were virtualized in the CPU a long time ago and modern intel processors are effectively RISC that translate to x86 in the CPU

      Well the folk at Transmeta Corporation made it obvious that the external
      ISA was no longer a necessary constraint on the way a modern processor
      works. The explosion of the fast transistor count made it possible to craft
      an instruction issue logic chain that was very rich in the clock times of modern
      days.

      Strictly modern processors are more VLIW than RISC and trigger arrays
      of resources selected by the expanded long instruction words.

      --
      Truth is stranger than fiction, but it is because Fiction is obliged to stick to possibilities; Truth isn't. Mark Twain.
  2. It's a question that WAS relevant by i+kan+reed · · Score: 3, Insightful

    Back when compilers weren't crazy optimized to their target instruction set, people coding things in assembler wanted CISC, and people using compilers wanted RISC.

    But nowadays almost no one still does the former, and the latter uses CISC chips a lot better.

    This is now a question for comp sci history, not engineers.

    1. Re:It's a question that WAS relevant by TWX · · Score: 5, Funny

      You mean, my Github project on Ruby on Rails with node.js plugins isn't optimized to use one versus the other?

      --
      Do not look into laser with remaining eye.
    2. Re:It's a question that WAS relevant by Rich0 · · Score: 1

      I actually wonder how relevant CISC even is to people doing assembly programming these days. There is no reason you can't pull an LLVM-like move and target something other than the native instruction set when you are programming.

      That is basically all the x86 does anyway - convert CISC instructions into microcode. There is no reason that an assembler couldn't do the same thing, further blurring the lines between assembly and compiled code. If the whole point of writing assembly is to optimize your code, and a RISC processor could run your code faster after low-level-compilation than a CISC processor could run it natively, then RISC is what you really want anyway.

    3. Re:It's a question that WAS relevant by i+kan+reed · · Score: 1

      Usually, if you're coding in assembly, it's because you're trying to bring some very basic core functionality into a system, be it an OS component, a driver, or a compiler, and usually that means that you're engaged in enough system-specific behaviors that virtualization does you no good.

      Java and .NET benefit from a virtual machine language precisely because they're high level languages, and it's easier to compile assembly to assembly in multiple ways than to compile high level languages to multiple assembly languages.

    4. Re:It's a question that WAS relevant by ebno-10db · · Score: 1

      easier to compile assembly to assembly in multiple ways than to compile high level languages to multiple assembly languages.

      In other words, they don't want to be bothered writing real compilers.

    5. Re:It's a question that WAS relevant by Nyall · · Score: 3, Interesting

      I think a large part of the confusion is that CISC often means accumulator architectures (x86, z80, etc.) vs. RISC, which means general-purpose register architectures (ppc, sparc, arm, etc.). In between you have variable-width RISC like thumb2.

      As an occasional assembly programmer (PowerPC currently) I far prefer these RISC instructions. With x86 (12+ years ago) I would spend far more instructions juggling values into the appropriate registers, then doing the math, then juggling the results out so that more math could be done. With RISC, especially with 32 GPRs, that juggling is nearly confined to the prologue/epilogue. I hear x86 kept taking on more instructions and that AMD64 made it a more GPR-like environment.

      -Samuel

      --
      http://en.wikipedia.org/wiki/Jury_nullification
    6. Re:It's a question that WAS relevant by mlts · · Score: 1

      With Moore's law flattening out, the pendulum might end up swinging back that way.

      Right now, for a lot of tasks, we have CPU to burn, so the ISA doesn't really matter as much as it did during the 680x0 era.

      But who knows... Rock's law may put the kibosh on Moore's law eventually, so we might end up seeing speed improvements ending up being either better cooling (so clock speeds can be cranked up), or adding more and more special purpose cores [1]. At this point, it might be that having code optimized by a compiler for a certain ISA may be the way of developing again.

      [1]: High-power CPUs, low-energy CPUs, GPUs, FPUs, FPGAs, and even going from there, CPUs intended for I/O (MIPS.) It might be that we might have a custom core just to run the OS's kernel, another to run security sensitive code, and still others for applications.

    7. Re:It's a question that WAS relevant by TangoMargarine · · Score: 1

      virtual machine language

      Don't mind me; I'm just twitching over here and fighting down the urge to vomit.

      --
      Unity? Screw that: XFCE. Slashdot Beta? Screw that: SoylentNews. Australis? Screw that: Pale Moon. UX developers DIAF
    8. Re:It's a question that WAS relevant by mlts · · Score: 2

      Even though Itanium is all but dead, I did like the fact that you had 128 GP registers to play with. One could do all the loads in one pass, do the calculations, then toss the results back into RAM. The amd64 architecture is a step in the right direction, and I'd say that even though it was considered a stopgap measure at the time, it seems to have been well thought out.

    9. Re:It's a question that WAS relevant by shoor · · Score: 1

      Back in the 1970s I worked at a computer manufacturer, writing code for their product's instruction set in assembler. The computers were designed and built around AMD2901 bit slices. The hardware guys implemented the instruction sets using microcode and, as the computers got bigger and more complicated, some of the instructions got so elaborate that programmers found ways to do an operation faster using a few simpler instructions instead of one complicated one.

      Nowadays, with the kind of speedups from using cache memory, branch prediction, and so on, I reckon it could be a whole different ballgame. I suspect though, that proving correctness might become the most important criterion, and simpler would make proving correctness easier.

      --
      In theory, theory and practice are the same; in practice they're different. (Yogi Berra & A. Einstein)
    10. Re:It's a question that WAS relevant by i+kan+reed · · Score: 2

      Oh no. A technology exists.

      Let me rephrase that. I cannot comprehend your objections.

    11. Re:It's a question that WAS relevant by Zero__Kelvin · · Score: 1

      No. In other words, assembly language programmers don't use compilers; they use assemblers.

      --
      Guns don't kill people; Physics kills people! - John Lithgow as Dick Solomon on Third Rock From The Sun
    12. Re:It's a question that WAS relevant by Eravnrekaree · · Score: 1

      A fully CISC architecture would allow both operands of an instruction to be pointers to memory; the CPU would have the circuitry to load the values into registers in the background. That would eliminate manual register loading. You would also have 3-operand versions of the instructions. x86 is not the most CISC architecture out there, nor does it take the concept as far as it could. This would be a very programmer-friendly environment. AMD expanded the number of registers; they could add even more. I don't know what the reasons are for not having at least 32 GPRs.

    13. Re:It's a question that WAS relevant by TangoMargarine · · Score: 1

      "Virtual" anything means there's at least one layer of abstraction between the thing and anything the layperson would consider remotely close to the hardware. "Machine" would imply something that is inversely quite close to the hardware. To my ears, it sounds like saying "pure hybrid"...you can't be both at the same time.

      Maybe I'm mixing up (virtual (machine language)) and ((virtual machine) language). From the perspective of the Java/.NET compiler it conceptually resembles machine language but it sure doesn't from the perspective of the actual hardware. I can see how Java being in a VM to begin with presents a similar model to running assembly on the actual machine but comparing the two in terms of efficiency and overhead is laughable. I was signalling my cognitive dissonance of conflating Java and assembly so directly.

      --
      Unity? Screw that: XFCE. Slashdot Beta? Screw that: SoylentNews. Australis? Screw that: Pale Moon. UX developers DIAF
    14. Re:It's a question that WAS relevant by Anonymous Coward · · Score: 0

      Yes, it was great, but then you have the cost of 128 GP registers to pay for (both in money and power).

      Modern x86-like processors also have lots of registers (more like 64), although not directly addressable. They're used internally to improve parallelism by removing write dependencies. Something like:

      mov eax, ebx
      add eax, 3
      mov ecx, eax
      add eax, 7

      Would prevent parallelization completely if it weren't for register renaming (which is how processors tap into the extra registers). So, an i7 would, for instance:

      mov eax, ebx
      add eax, 3
      add eax, 7 into eax2 + mov ecx, eax

      Increasing performance by about a third, since four sequential steps become three (at least, in this little piece of useless code).

      Don't check my assembly too much, I never remember the exact syntax, but the idea is that x86 processors also have lots of GP registers. It's just that not all of them are directly accessible, and x86-64 has a bunch more directly accessible ones as well (16, instead of 8). The extra registers also help a lot for Hyper-Threading, btw.
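
      A rough way to see the effect described above is to schedule the four-instruction snippet with and without renaming. This is a toy model, not how any real core works: it assumes every instruction takes one cycle, that there are unlimited execution units, and that without renaming any reuse of a register name must wait for all earlier readers and writers of that name. The function name `schedule_length` and the `(dest, sources)` encoding are made up for illustration.

```python
def schedule_length(program, rename):
    """Cycles needed for `program` under a toy unit-latency model.

    program: list of (dest, sources) tuples in program order.
    rename:  if True, only true (read-after-write) dependencies count.
    """
    cycle = []  # cycle in which each instruction executes
    for i, (dest, srcs) in enumerate(program):
        ready = 0
        # RAW: wait for the most recent earlier writer of each source.
        for src in srcs:
            for j in range(i - 1, -1, -1):
                if program[j][0] == src:
                    ready = max(ready, cycle[j])
                    break
        if not rename:
            # WAR/WAW: without renaming, reusing a register name must
            # wait for every earlier reader or writer of that name.
            for j in range(i):
                prev_dest, prev_srcs = program[j]
                if prev_dest == dest or dest in prev_srcs:
                    ready = max(ready, cycle[j])
        cycle.append(ready + 1)
    return max(cycle)


# The snippet from the comment, written as (dest, sources).
PROGRAM = [
    ("eax", ["ebx"]),  # mov eax, ebx
    ("eax", ["eax"]),  # add eax, 3
    ("ecx", ["eax"]),  # mov ecx, eax
    ("eax", ["eax"]),  # add eax, 7
]

if __name__ == "__main__":
    print("without renaming:", schedule_length(PROGRAM, rename=False))
    print("with renaming:   ", schedule_length(PROGRAM, rename=True))
```

      Under these assumptions the snippet takes 4 cycles without renaming and 3 with it, because `add eax, 7` no longer has to wait for `mov ecx, eax` to read the old `eax`.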

    15. Re:It's a question that WAS relevant by Anonymous Coward · · Score: 0

      AC posting again, a P.S.:

      The benefit of having virtual registers is that you're not locked in by your ISA in the number of registers you have to provide. They're free to increase or decrease, and they have done it as the need for more renaming arises to improve IPC counts. So having lots of GPR is not necessarily good for the ISA.

    16. Re:It's a question that WAS relevant by psmears · · Score: 1

      I can see how Java being in a VM to begin with presents a similar model to running assembly on the actual machine but comparing the two in terms of efficiency and overhead is laughable. I was signalling my cognitive dissonance of conflating Java and assembly so directly.

      You are aware that there are CPUs capable of executing Java bytecode directly? I.e. that use Java bytecode as (one of) their native assembly instruction set(s)?

    17. Re:It's a question that WAS relevant by petermgreen · · Score: 1

      The downside of having few registers in the ISA is it means the compiler may have to choose instruction ordering based on register availability or worse still "spill" registers to memory to fit the code to the available registers.
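
      The spill pressure described here can be estimated by counting how many values are live at once in a piece of straight-line code. The sketch below is only illustrative: the helper names `peak_live_values` and `must_spill` are invented, it assumes all inputs start in registers, and it ignores everything a real register allocator does (reordering, rematerialization, memory operands).

```python
def peak_live_values(program, live_out):
    """Largest number of simultaneously live values in straight-line code.

    program:  list of (dest, sources) tuples in program order.
    live_out: names still needed after the last instruction.
    """
    live = set(live_out)
    peak = len(live)
    # Walk backwards: a value is live from its definition to its last use.
    for dest, srcs in reversed(program):
        live.discard(dest)   # defined here, so not live before this point
        live.update(srcs)    # sources must be live before this point
        peak = max(peak, len(live))
    return peak


def must_spill(program, live_out, num_regs):
    """True if the peak live set cannot fit in `num_regs` registers."""
    return peak_live_values(program, live_out) > num_regs


# a*b + c*d + e*f, with all six inputs assumed to start in registers.
SUM_OF_PRODUCTS = [
    ("t1", ["a", "b"]),
    ("t2", ["c", "d"]),
    ("t3", ["e", "f"]),
    ("t4", ["t1", "t2"]),
    ("t5", ["t4", "t3"]),
]

if __name__ == "__main__":
    print(peak_live_values(SUM_OF_PRODUCTS, ["t5"]))
    print(must_spill(SUM_OF_PRODUCTS, ["t5"], num_regs=4))
    print(must_spill(SUM_OF_PRODUCTS, ["t5"], num_regs=16))
```

      With six values live at the start, this example would not fit in four registers but fits comfortably in sixteen.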

      --
      note: i'm known as plugwash most places but i screwd up registering that here somehow in the past and now can't register
    18. Re:It's a question that WAS relevant by TangoMargarine · · Score: 1

      Yeah, and historically there were Lisp machines. The PC-compatible/x86 was implied.

      --
      Unity? Screw that: XFCE. Slashdot Beta? Screw that: SoylentNews. Australis? Screw that: Pale Moon. UX developers DIAF
    19. Re:It's a question that WAS relevant by lars_stefan_axelsson · · Score: 1

      The downside of having few registers in the ISA is it means the compiler may have to choose instruction ordering based on register availability or worse still "spill" registers to memory to fit the code to the available registers.

      Yes, but the score boarding takes care of those spills as well. The processor won't actually perform them. But, whether they're visible or not, the compiler still has to optimise as if they're there in order to have a chance to wring out the maximum performance, so whether they're visible or not turns out to not mean that much in practice, rather, keeping them invisible isn't that much of a gain, as the compiler will have to assume that they're backed by invisible ones anyway and you'll take a substantial performance hit if they were ever to go away. (Which they won't as they take up next to no real estate today anyway.)

      --
      Stefan Axelsson
    20. Re:It's a question that WAS relevant by Anonymous Coward · · Score: 0

      Counting x86 as Accumulator architecture is somewhat misleading.
      It descended from such and thus had warts, but basic arith works on all registers.
      Long Mul & Div are special, but then that goes for pretty much all platforms.

      The real problem are the "dedicated" registers, particularly CL/CX, but also SI/DI

    21. Re:It's a question that WAS relevant by Anonymous Coward · · Score: 0

      Programmer friendly, yes

      But once virtual memory, caches and Out-of-Order come along, multiple memory arguments become a death trap.

      It's no accident that the CISCs that relied on it (PDP-11, VAX, M68k and NSC 32k) are all dead.

      The number of special cases that must be handled for cache misses, page faults and TLB misses grows beyond measure.

      There is some doubt whether NSC ever managed to produce truly working silicon for the NSC 32k series.

    22. Re:It's a question that WAS relevant by petermgreen · · Score: 1

      A spill is a write to memory, there is nothing special about the instructions used that indicates to the processor it is only temporary. That means at the very least the processor needs to check the cache policy of the target location before it eliminates it.

      AIUI much of the performance gains from going to x86-64 were attributed to the extra registers AMD added. These gains were even significant enough that someone put the effort into designing an ABI that uses 32-bit pointers but runs the CPU in 64-bit mode.
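
      One reason such an ABI is attractive is that pointer-heavy data stays small while the code still gets the extra registers. The sketch below only illustrates the size difference: the node layout (two pointers plus a 32-bit integer) is a made-up example, and the `<` format prefix deliberately ignores the alignment padding a real compiler would add.

```python
import struct


def node_size(pointer_format):
    """Packed size of a hypothetical node: two pointers plus an int32.

    pointer_format: "I" for 32-bit pointers, "Q" for 64-bit pointers.
    The "<" prefix selects standard sizes with no padding.
    """
    return struct.calcsize("<" + pointer_format * 2 + "i")


SIZE_32BIT_POINTERS = node_size("I")  # 4 + 4 + 4 bytes
SIZE_64BIT_POINTERS = node_size("Q")  # 8 + 8 + 4 bytes

if __name__ == "__main__":
    print(SIZE_32BIT_POINTERS, SIZE_64BIT_POINTERS)
```

      Smaller nodes mean more of them fit in each cache line, which is the kind of saving a 32-bit-pointer ABI is after.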

      --
      note: i'm known as plugwash most places but i screwd up registering that here somehow in the past and now can't register
  3. Not what my last GF said to me by Anonymous Coward · · Score: 0, Funny

    She said, boy, you need to put some CISC into your RISC or I am gone, she said.

    1. Re: Not what my last GF said to me by Anonymous Coward · · Score: 0

      That's just because I put some RISC into her CISC earlier that day.

  4. Its as if... by bazmail · · Score: 1

    ...a million smug old school douche-nozzle Mac users cried out and suddenly fell silent.

    1. Re:Its as if... by TWX · · Score: 1

      Mac users ... fell silent.

      I think that the movie you paid homage to was more realistic unfortunately...

      --
      Do not look into laser with remaining eye.
    2. Re: Its as if... by Anonymous Coward · · Score: 0

      They are still clinging to their Altivec units.

  5. so why is intel's 14nm haswell still at 3.5 watts? by lkcl · · Score: 0, Troll

    ok, so the effect of RISC vs CISC has absolutely *no* relation to power, right? so why in god's green earth is, for example, the allwinner a20 1.2ghz processor - which is still in 40nm btw - maxing out at 2.5 watts and delivering great 1080p video, reasonable 3D graphics and so on - yet intel is having to go to 14nm and, even at 14nm they STILL can't release a processor that, if you run it in a very limited configuration, is STILL listed as 3.5 watts??

    there's a quad-core rockchip 28nm SoC. maximum (actual) top power consumption: below 3.0 watts. intel's haswell tablet SoC is 22nm: it's 4.5 watts "Scenario" Design Power i.e. if you only run certain apps in certain ways it *might* keep below 4.5 watts.

    i really _really_ want to know why it is that intel cannot deliver an SoC that has an absolute peak limit of 2.5 watts.

  6. Re:Asses? by TWX · · Score: 1

    This is a discussion forum on the Internet. Definitely the former.

    --
    Do not look into laser with remaining eye.
  7. It's a general purpose vs dedicated thing by Anonymous Coward · · Score: 2, Interesting

    The CPU ISA isn't the important aspect. Reduced power consumption mostly stems from not needing a high end CPU because the expensive tasks are handled by dedicated hardware. What counts as top of the line ARM hardware can barely touch the processing power of a desktop CPU, but it doesn't need to be faster because all the bulk processing is handled by graphics cores and DSPs. Intel has for a long time tried to stave off the barrage of special purpose hardware. The attempts to make use of ever more general purpose CPU power sometimes bordered on sad clown territory (Remember Intel's attempt to make raytracing games look like something worth pursuing? Guess why: Raytracing is notoriously difficult to implement on graphics hardware due to the almost random data accesses.)

    1. Re:It's a general purpose vs dedicated thing by im_thatoneguy · · Score: 1

      And ironically specialized hardware is better than the CPU at raytracing and Intel might lose that battle as well after being its lone champion for so long.

      http://techreport.com/news/261...

  8. Re:so why is intel's 14nm haswell still at 3.5 wat by Anonymous Coward · · Score: 1

    You seem to be conveniently ignoring Intel's Atom and Quark lines. They're all x86 and none of them has a TDP larger than 3w.

  9. Complexity by Anonymous Coward · · Score: 0

    TFA measures how much energy is used for each processor to complete certain tasks, but it ignores how complex each processor is. Unsurprisingly, the processors with more transistors do better. Manufacturing cost is also ignored.

  10. I think it's safe to say... by Type44Q · · Score: 0

    I think it's safe to say that if a modern RISC processor were manufactured using Intel's latest process tech and given an x86-comparable number of transistors (how many are in Haswell? Or Vishera?) and put to use running a "bare metal" app that was optimized to take advantage of the architecture, it would blow the doors off the equivalent x86 solution.

    However, in the real world, x86's flexibility and versatility (including the plethora of developer tools available) win the day.

    1. Re:I think it's safe to say... by Anonymous Coward · · Score: 0

      It's just inertia, there is a huge legacy base of x86 software that no one wants to rewrite. It's nothing to do with x86's flexibility or versatility, unless you mean Intel's engineering teams' ability to keep slapping on extensions to their 1970's era ISA.

    2. Re:I think it's safe to say... by armanox · · Score: 1

      Honest question:

      What can you do on x86, that you can't do on POWER or MIPS?

      --
      I'm starting to think GNU is the problem with "GNU/Linux" these days.
    3. Re:I think it's safe to say... by Type44Q · · Score: 2

      That's easy: maintain compatibility with fucktons of legacy code; arguably more of which exists for x86 than every other architecture combined...

    4. Re:I think it's safe to say... by Anonymous Coward · · Score: 0

      It's safe to say that you are deeply in denial and ignoring the facts being presented in this article and throughout the media for many years now. You've got some kind of David and Goliath narrative running in your head and you let it smother any objective information that conflicts with your desired outcome. I've often rooted for the little guy, and loved it when I could get a better or more niche product than what Intel offered in large volume, but I had to be honest and recognize when Intel matched them and made them irrelevant.

      For many years, I have been using a wide range of hardware, both micro-benchmarking and observing them in general use with Linux and my own development efforts. In the PC, workstation and server space with Linux and many flavors of Unix, I've used Intel and AMD for nearly every generation from the 286 to present, including Xeon/Opteron variants; DEC Alpha; IBM POWER; SPARC and UltraSPARC; MIPS from R2000 to R4400; and even Motorola 68k. In the embedded and SoC space I've used MIPS, ARM, and Atom. I've used variants where nearly the same generation processor core was available in different system configurations with different on-die caches, different system memory buses, and different system I/O bridges.

      What I've seen is that the most observable differences in performance and power usage were due to total-system integrator choices, driven more by economics and product marketing plans than pure technology availability. Very expensive, high margin products could offer better performance by combining the latest in every tech category, but their benefits were often short-lived as the following mid-market products would overtake them in a very small number of years. Similarly, lower power systems could be produced by optimizing the system choices in the other direction, reducing performance.

      Individual products have stood out for a moment in time or deviated in performance/watt compared to their peers, but the general trend has been the entire industry converging as economies of scale brought the most exotic tech to the mainstream. This includes cache memory hierarchies; wider word sizes and memory addresses; on-chip memory controllers; MMU and IOMMU; SIMD instruction set extensions; multi-core SMP packaging; the abandonment of parallel for high-speed serial I/O as exemplified in SATA and PCIe; 2D and 3D accelerators and GPUs; and the advancement of all of these through semiconductor process improvements and Moore's Law effects.

      As a bleeding-edge Linux adopter, I've also seen how much software/OS support is required to get the most out of each platform's capabilities. I've experienced before/during/after as new support was added for every power-saving function from the first kernel use of idle instructions; APM; ACPI; CPU frequency scaling; sleep states for peripheral controllers, disks, and monitors; display back-light control; and aggressive sleeping in Android. Similarly for performance features I experienced as new support was added for DMA; 2D accelerators; SMP; 3D accelerators and GPUs; SIMD instructions; 64-bit; NIC offload; NUMA; hardware virtualization; and again CPU frequency scaling.

      None of these major influences in total system performance or power usage would have changed much in either direction if a RISC design was substituted for CISC or vice versa. I can say that confidently because I witnessed how these changes improved systems with the same CPU ISA and core design/process. At the same time, differences between brands at a similar price point were often no larger than these bumps within each product line. Trying to comparison shop for large purchases was often an exercise in frustration as by the time your quotes and procurement went through to delivery, newly released products had once again flipped all your value matrices and price points upside-down.

    5. Re:I think it's safe to say... by armanox · · Score: 1

      There's plenty of legacy code for SPARC, MIPS, and POWER. And a lot of code can be recompiled for a different platform without much trouble (and there's plenty that can't).

      --
      I'm starting to think GNU is the problem with "GNU/Linux" these days.
  11. not now, but it certainly did in the past. by nimbius · · Score: 2

    as a greybeard I remember when choosing Intel over Sun meant the project wasn't completed on time, and your electrical/mechanical engineering group lived in the breakroom while their jobs chugged along. Intel was a toy train compared to the power you'd get with RISC. however I can somewhat confidently say the RISC CISC battle is moot these days because x86 has largely caught up to power, sparc, and others. a competent argument could be made however that if it weren't for AMD, most servers would probably still be running some flavour of RISC. The foolhardy nature of SUN and SGI can also be argued as a cause of their demise, but I'll not flame. Intel wouldn't have bothered to get off their duff without a poke in the ribs from AMD; they had partnerships with RISC manufacturers anyhow and their own RISC-ish processor called itanium. outside of performance though there is another reason people stick with Power and others just as they have in the past. Lock-in.

    you see, applications like Oracle Business Objects and JD Edwards come with a quid-pro-quo of exacting standards to which most businesses must adhere. Namely, IBM or Sun/Oracle hardware. You may only need accounting and payroll, but you'll have to clear a corner of the room for the circus to set up their hardware and make sure everything is "just so." Their hope is that their quiet mandate becomes your quiet mandate, and before you know it other systems that interact with JDE are now required to be Power-based because "that's what runs JDE." The only way out of this is to realize that any business that doesn't explicitly do payroll or metrics for profit, doesn't need the kind of horsepower decreed by things like SAP.

    --
    Good people go to bed earlier.
    1. Re: not now, but it certainly did in the past. by the_humeister · · Score: 1

      Intel has had several RISC chips (eg i960), but Itanium is VLIW (ie not RISC or CISC).

      Certainly it can be argued that it's AMD's fault for the current dominance of x86, but it's also true that none of the other architectures were cheap enough for the general populace to adopt, hence the abundance of ARM nowadays, with POWER and SPARC currently on life support.

    2. Re:not now, but it certainly did in the past. by Anonymous Coward · · Score: 0

      In the light desktop workstation performance race Intel started to make their mark when they released the original Pentium against the then available RISC competitors and the i486s. I fondly remember how supreme the HP RISC's floating point performance was then, compared to the others at that specific point in time.

  12. Re:so why is intel's 14nm haswell still at 3.5 wat by timeOday · · Score: 4, Insightful

    Here is your answer, the A20 is freakishly slow compared to anything Intel would put their name on.

    Granted, you can build a tablet to do specific tasks (like decoding video codecs) around a really slow processor and some special-purpose DSPs. But perhaps the companies in that business aren't making enough profit to interest Intel.

  13. Re:so why is intel's 14nm haswell still at 3.5 wat by Anonymous Coward · · Score: 0

    FYI

    http://en.wikipedia.org/wiki/List_of_CPU_power_dissipation_figures#Intel_Atom

    Below 2.5 watts for roughly 25% of the steppings. And since there's not really a significant advantage to the SoC configuration you describe above (CPU + 3D graphics), unless you're trying to squeeze your system into a thimble, if any of those Atom CPUs don't come with 3D graphics, throw a cheap lower-power 3D graphics chip into the mix and you're still probably under 2.5 watts.

  14. so why is intel's 14nm haswell still at 3.5 watts? by Anonymous Coward · · Score: 1

    The examples you described (3D graphics and video decoding) are both handled by the GPU, which breaks the data sets down through SIMD. One algorithm single dataset optimization. That really has little to do with this discussion of CPUs.

    When comparing the CPU cores between the Allwinner a20 and a broadwell atom, the broadwell atom is more performant by a wide margin.

  15. Re:so why is intel's 14nm haswell still at 3.5 wat by Bill,+Shooter+of+Bul · · Score: 1

    Well, the power consumption of various processor architectures is a *bit* more complicated than RISC vs CISC, which is the point of this story.

    --
    Well.. maybe. Or Maybe not. But Definitely not sort of.
  16. Re:so why is intel's 14nm haswell still at 3.5 wat by Anonymous Coward · · Score: 0

    I have not RTFA but my guess is that Intel is trying to compare apples to apples and that would mean (and again I am assuming) that at the same process (14nm, for instance) and with similar circuit design densities that the power envelope is more or less the same. I am still not convinced, but trying to figure out how they came to that conclusion given that Intel has not come that close to the power efficiencies found in modern RISC designs. I also don't get their scaling argument either as some of the largest (by number of cores) supercomputers in the world are (still) RISC based or are using GPUs that would be closer to RISC designs than CISC given their limited instruction set. Sooo, this sounds like Intel we-are-so-awesome-aren't-we BS.

  17. so why is intel's 14nm haswell still at 3.5 watts? by Anonymous Coward · · Score: 0

    If the ISA does not actually affect the performance of a modern processor, then why does the 64-bit ARM architecture outperform the 32-bit? Surely 32-bit ARM code is at least comparable to x86 in architectural elegance.

  18. Re:so why is intel's 14nm haswell still at 3.5 wat by Anonymous Coward · · Score: 0

    As ARM chips creep up in processing power and power consumption, there's no good reason to develop in the opposite direction. Intel wants to be where ARM is headed, not where it has been.

    1080p video and reasonable 3D graphics are nice to look at, but they're terrible CPU benchmarks, because the CPU only shuffles data into and out of special purpose hardware. You can do high definition video and 3D graphics with a shitty single core ARM processor, provided the dedicated coprocessors do all the work: You've probably heard of the most famous incarnation of that concept, the Raspberry Pi. The Broadcom chip at the core of the Raspberry Pi doesn't even boot using the CPU. The graphics core starts first and loads the firmware. Only then does the CPU start. That's where the ARM world is coming from: Powerful special purpose hardware together with a "microcontroller" of a CPU. Intel is coming from the opposite end of the spectrum, where general purpose processing is king, and special purpose hardware is seen as an optional add-on.

  19. There's a lot more going on... by Anonymous Coward · · Score: 1

    The benefit to CISC instructions is the ability to get the processor to do more work with fewer instructions.

    While it may seem like a compiler can negate that, for a non-trivial number of programs you can get much higher code density using CISC instructions, which in turn frees up memory cycles for data instead of code. One of the key arguments I heard back in the late 90s/early 00s, however, was that hybridized chips were where it was at. If you go and look at previously 'CISC' and 'RISC' chips and look at the instruction sets, the majority of RISC chips have added a number of CISC-like instructions for certain types of operations (notably floating point and byte array/string handling stuff) while CISC chips have basically just turned into a translation layer over a slightly more complicated RISC core that is only really optimized for say the top 50-100 operations, and everything else is 2+x slower than the old CISC implementations (instructions that used to take say 2 cycles due to dedicated circuitry may now take 4+ due to reuse of micro-ops).

    1. Re:There's a lot more going on... by Zero__Kelvin · · Score: 0

      No, the benefit of RISC is that you have many more on chip registers that can be used to store data, manipulate data, etc sans a cache hit. With CISC the minute you want to keep more than a handful of data elements in an array to operate on in parallel, you're taking a cache hit. Several actually.

      --
      Guns don't kill people; Physics kills people! - John Lithgow as Dick Solomon on Third Rock From The Sun
    2. Re:There's a lot more going on... by Rockoon · · Score: 1, Insightful

      No, the benefit of RISC is that you have many more on chip registers

      Nothing about RISC makes more registers inherent, and nothing about CISC makes less registers inherent. Now shut the fuck up and let the real nerds discuss.

      --
      "His name was James Damore."
    3. Re:There's a lot more going on... by Zero__Kelvin · · Score: 1

      That's absolutely correct, unless of course you count the fact that you can't create a CISC CPU with just as many registers that can be used to store data, manipulate data, etc sans a cache hit as a RISC CPU given the same die size, which is another way of saying you are an idiot who quoted half my sentence and then tried to make it look like I merely said "CISC can't have lots of registers!".

      --
      Guns don't kill people; Physics kills people! - John Lithgow as Dick Solomon on Third Rock From The Sun
    4. Re:There's a lot more going on... by Guy+Harris · · Score: 1

      That's absolutely correct, unless of course you count the fact that you can't create a CISC CPU with just as many registers that can be used to store data, manipulate data, etc sans a cache hit as a RISC CPU given the same die size.

      You can't? You can't trade off, say, transistors used for registers (especially given that the bigger processors do register renaming, so you have more hardware registers than the actual RISC/CISC instruction set provides) for transistors used for some other purpose?

    5. Re:There's a lot more going on... by Rockoon · · Score: 0

      unless of course you count the fact that you can't create a CISC CPU with just as many registers that can be used to store data, manipulate data, etc sans a cache hit as a RISC CPU given the same die size

      Yes you can.

      It appears that you think that the additional decoder depth (multiple u-ops per instruction decoded) that CISC requires isn't a tradeoff for the additional decoder width (more decoders to equal the same effective u-ops per cycle) that RISC requires for the same performance characteristics.

      You would be wrong. You are the classic example of the guy that learned one fucking little thing and then imagined an entire imaginary universe from it. Yes, Intel's CISC decoders are bigger, but Intel needs fewer of them than RISC does for the same u-ops per cycle fed into the pipeline, and Intel also doesn't need as much memory bandwidth feeding its decoders. Both of the things that RISC needs to match performance also cost the precious silicon, that one thing you knew about for CISC but amazingly were completely ignorant about for RISC. You knew one fucking thing. You imagined a universe. You blew it. You ignorant twat.

      Now, simply shut the fuck up and let the REAL nerds discuss.

      --
      "His name was James Damore."
    6. Re:There's a lot more going on... by QQBoss · · Score: 1

      That is more or less accurate. The goals of the original RISC were stated to be making a Reduced Instruction Set Computer, but what was in fact produced was a Reduced Instruction Set Complexity CPU. By restricting the touching of memory to only loads and stores, all other instructions that were able to be executed in one clock COULD be executed in one clock always. Whereas some CISC instructions involving arrays could kick off 10+ memory touches as a side effect, RISC instructions could never do that (sans via exceptions). So when all 10 of those memory touches weren't required, the RISC architecture could optimize away the unnecessary ones (which was a bitch in 1990, but common place by 2000 and exceedingly trivial by 2010, to put it roughly).

      I taught CISC architectures (68K mostly) and was a minor architect for PowerPC (I helped work on the early EABI- embedded application binary interface- architecture)

      But this leads to a problem: Cache. That CISC operation that made 10 memory touches took roughly 10-18 bytes of instruction storage (68K example), and 10 data cache accesses that would either hit or miss. But a 16 bit RISC would take 22 bytes (and didn't double the number of useful registers available) and a 32 bit RISC would take 44 bytes (but generally doubled the number of useful registers, reducing the need for so many loads and stores). Thank goodness you took fewer transistors to implement the instruction pipeline, because you need them all back to make the Icache bigger! The hope being that those 10 memory touches were rarely needed if you had more registers, so you could cut back on other loads somewhere (but we didn't get really good at doing that automatically until the late '90s, by which time we could show that the RISC penalty was effectively negated, specific numbers remain the property of my name-changed employer but were down to single digit percentage differences). Dcache would have the same hits and misses, unless you were also able to allocate saved transistors to some Dcache which might affect hit rates by some low single percentage points.
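      The byte counts above can be reproduced with a few lines of arithmetic. This is only a sketch: the comment gives the totals (22 and 44 bytes), and the figure of 11 RISC instructions (ten loads/stores plus one operation) is an assumption chosen to match them, not something the comment states.

```python
# Rough code-size arithmetic for the sequence described above.
# ASSUMPTION: the RISC version needs 11 instructions (10 loads/stores
# plus one operation); the comment only gives the byte totals.
RISC_INSTRUCTIONS = 11


def risc_code_bytes(instruction_count, bytes_per_instruction):
    """Fixed-width encoding: code size is simply count * width."""
    return instruction_count * bytes_per_instruction


# 68K range quoted in the comment for the single CISC operation.
cisc_bytes_low, cisc_bytes_high = 10, 18

risc16 = risc_code_bytes(RISC_INSTRUCTIONS, 2)  # 16-bit instruction words
risc32 = risc_code_bytes(RISC_INSTRUCTIONS, 4)  # 32-bit instruction words

print("16-bit RISC:", risc16, "bytes")
print("32-bit RISC:", risc32, "bytes")
print("32-bit RISC vs. CISC: %.1fx to %.1fx larger"
      % (risc32 / cisc_bytes_high, risc32 / cisc_bytes_low))
```

      The ratio in the last line is the extra Icache footprint the comment is talking about, before any savings from having more registers are counted.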

      But with complicated instructions come pipeline clocking challenges. Implementing the entire x86 pipeline in 5 stages would result in having a sub-200 MHz pipeline today- the P4 push to 4 GHz required up to 19 stages (and who knows how many designers) worst case, IIRC! Meanwhile, most RISC architectures zoom along happily with 5-7 stages and only manufacturing nodes or target design decisions keep them from clocking up to x86 frequencies.

      Hands down, it was never any 'benefits' of CISC (or, specifically, the x86 architecture) that allowed Intel to take the field, it was market forces and manufacturing might. A win is a win.

      BTW, to the AC GP, just because an instruction appears complex (most SIMD operations, MADDs, FPSQRTRES, etc...), they still count as RISC if they can be either executed in one clock or at least pipelined with nominally one result per clock if they don't impact the pipeline for all the other commonly executed instructions. After all, we can make a divide instruction execute in 1 clock, too, as long as you don't mind your add instructions taking 16x longer (though still one clock), but that is cheating.

    7. Re:There's a lot more going on... by QQBoss · · Score: 1

      To correct myself, based on something I read downstream (thanks, trparky), the P4 was 31 stages, not 19. That is really a number I shouldn't have misremembered.

    8. Re:There's a lot more going on... by Guy+Harris · · Score: 1

      Whereas some CISC instructions involving arrays could kick off 10+ memory touches as a side effect ... That CISC operation that made 10 memory touches took roughly 10-18 bytes of instruction storage (68K example)

      OK, that's probably using "memory indirect postindexed mode". Addressing modes that complex are something some CISC processors had, but not others; x86 is much less complex (scaling, but no memory-indirect or auto-increment/auto-decrement), and S/3x0 even less complex than that (no scaling, just double-indexing).

      How often was that addressing mode used, in practice? Was it used often enough that you saved enough code space that you could make the I cache smaller?

    9. Re:There's a lot more going on... by Zero__Kelvin · · Score: 1

      "Now, simply shut the fuck up and let the REAL nerds discuss."

      ROTFLMAO. That's pretty funny there #1252108 :-)

      What you can't seem to grasp is even a layman could figure out how ridiculous your claim is with absolutely no understanding of the differences between RISC and CISC.

      Rockgoon the PHB: How comes we cantz just adz us a bunch more registerz on the same die size with the same functionality?

      Skilled CPU Designer: But PHBoss, we already have the die saturated with as much functionality as we can! We'd love to add more registers, but we'd have to increase our available instruction set even more, which would require more decoder logic, and adding 20 more registers would require additional pipelines, and a host of other logic. We already are using all the capacity we have! That is one of the reasons we are always trying to move to smaller and smaller fab processes!

      Rockgoon the PHB: That's ridiculous! You guys just never thought of it! Admit it! You're lazy! There's plenty of room! Just get rid of some uneeeded circuits! You are the classic example of the guy that learned one fucking little thing and then imagined an entire imaginary universe from it!

      In other words, yes, you are half an idiot savant. The first half :-)

      --
      Guns don't kill people; Physics kills people! - John Lithgow as Dick Solomon on Third Rock From The Sun
    10. Re:There's a lot more going on... by Zero__Kelvin · · Score: 1

      No. That's correct. You can't add registers, keep the same functionality, and add all the circuitry to support said functionality by reducing functionality and taking away registers. Who would have thought?

      --
      Guns don't kill people; Physics kills people! - John Lithgow as Dick Solomon on Third Rock From The Sun
    11. Re:There's a lot more going on... by Guy+Harris · · Score: 1

      No. That's correct. You can't add registers, keep the same functionality, and add all the circuitry to support said functionality by reducing functionality and taking away registers. Who would have thought?

      That isn't answering the question I asked.

      The question I asked was "You can't trade off, say, transistors used for registers (especially given that the bigger processors do register renaming, so you have more hardware registers than the actual RISC/CISC instruction set provides) for transistors used for some other purpose?"

      I said nothing about keeping all the same functionality, if by "functionality" you mean, for example, "on-chip caches of the same size" and "same number of hardware registers including ones used for renaming of the architected registers" or "complexity of the branch prediction hardware" or .... Yes, there may be tradeoffs you have to make in how you use your transistors, but if the benefits of the additional registers outweigh whatever performance benefits you lose by reducing the size of other functional units by however much the additional registers require, that might be the right tradeoff to make.

    12. Re:There's a lot more going on... by Zero__Kelvin · · Score: 1

      "I said nothing about keeping all the same functionality, if by "functionality" you mean"

      The problem is that you were assuming you are the only one speaking in this thread. The discussion was about adding more registers in a CISC architecture, and so CISC functionality is the context. When you ask what "the same functionality means" that is absurd. You can't implement a subset of the functionality and still have the same functionality.

      I'll put this in simpler terms. Smart people design CPUs and they don't add a bunch of registers even though that would be useful. The reason they don't do it is because of the additional chip real estate it would cost in an already over-taxed landscape, not because they are lazy or haven't thought of the idea.

      --
      Guns don't kill people; Physics kills people! - John Lithgow as Dick Solomon on Third Rock From The Sun
    13. Re:There's a lot more going on... by Anonymous Coward · · Score: 0

      IIRC the worst case for VAX instructions was 66 misses (as 22 cache misses + 22 page faults + 22 TLB misses).

      That involved both the instruction, the indirect addresses and the final arguments being misaligned

      Yuck

    14. Re:There's a lot more going on... by Guy+Harris · · Score: 1

      The discussion was about adding more registers in a CISC architecture, and so CISC functionality is the context.

      "CISC functionality" is the ability to execute a given CISC instruction set with acceptable performance. Transistors can be used in several different ways to achieve that, and you can choose to use fewer transistors in one place on a chip in favor of more transistors in another place, and if that choice means you still get better overall performance executing the same instruction set, that choice is a good one.

      When you ask what "the same functionality means" that is absurd. You can't implement a subset of the functionality and still have the same functionality.

      Again, as long as the full instruction set can be executed (even if some of it is executed by trap code), you don't have a subset. You may happen to execute some functions slower, and other functions faster, but if the net result is faster execution of the code actually run on the machine, you have a better implementation.

      I'll put this in simpler terms. Smart people design CPUs and they don't add a bunch of registers even though that would be useful.

      Smart people add registers iff they're sufficiently useful that it's worth either increasing the die size or taking transistors away from other functions.

      The reason they don't do it is because of the additional chip real estate it would cost in an already over-taxed landscape, not because they are lazy or haven't thought of the idea.

      For existing architectures, the reason they don't do it is that it would require changes to the instruction format, which, for most instruction set architectures, would be a royal pain. For x86, they (AMD, to be specific) could and did add Yet Another Prefix to double the number of registers as the instruction set already had a tradition of adding prefixes. For ARM, they were already introducing a 64-bit variant of the instruction set, and didn't have to maintain binary compatibility. For, for example, System/3x0, you'd have to add prefixes to an instruction set lacking prefixes, or somehow use opcode bits to refer to additional registers. If somebody were to design a brand new CISC architecture (in an era where we're not designing many new instruction set architectures at all), they could design one with 32 GPRs.
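      For anyone who hasn't looked at how that prefix works: on x86-64 it is the REX prefix, whose R and B bits each supply a fourth bit for a 3-bit register field in the ModRM byte, which is how 8 GPRs became 16. A minimal decoding sketch follows, handling only the register-direct case; the function name is made up for illustration.

```python
def decode_rex_modrm(rex, modrm):
    """Return (reg, rm) register numbers for a register-direct ModRM byte.

    Hypothetical helper for illustration; it ignores SIB bytes, memory
    operands and the REX.W / REX.X bits.
    """
    if (rex & 0xF0) != 0x40:
        raise ValueError("not a REX prefix (expected 0100WRXB)")
    if (modrm >> 6) != 0b11:
        raise ValueError("only register-direct (mod=11) is handled here")

    rex_r = (rex >> 2) & 1  # extends the ModRM.reg field
    rex_b = rex & 1         # extends the ModRM.rm field

    reg = (rex_r << 3) | ((modrm >> 3) & 0b111)
    rm = (rex_b << 3) | (modrm & 0b111)
    return reg, rm


# Bytes 4D 89 C8: REX prefix 0x4D, opcode 0x89 (MOV r/m, r), ModRM 0xC8.
reg, rm = decode_rex_modrm(0x4D, 0xC8)
print("source register: r%d, destination register: r%d" % (reg, rm))
```

      Without the prefix the same two 3-bit fields can only name registers 0-7, which is why an instruction format with no spare bits and no prefix tradition (the System/3x0 case above) is so much harder to extend.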

    15. Re:There's a lot more going on... by Zero__Kelvin · · Score: 1

      ""CISC functionality" is the ability to execute a given CISC instruction set with acceptable performance."

      When you start off with a broken definition, everything you say about it becomes suspect.

      "you can choose to use fewer transistors in one place on a chip in favor of more transistors in another place,"

      That's a phenomenally stupid thing to say, and represents a complete lack of understanding of circuit design in general.

      "You may happen to execute some functions slower, and other functions faster, but if the net result is faster execution of the code actually run on the machine, you have a better implementation."

      This statement represents a complete lack of understanding of benchmarking.

      "For existing architectures, the reason they don't do it is that it would require changes to the instruction format, which, for most instruction set architectures, would be a royal pain."

      Again, that is just phenomenally stupid, and alludes to the idea that the designers are lazy, as well as ignoring the fact that nothing about chip design is anything other than "a royal pain."

      " For ARM, they were already introducing a 64-bit variant of the instruction set, and didn't have to maintain binary compatibility."

      Do you even think about what you write before you write it?

      Just accept that your attempts to sound like you have a clue failed miserably, and you have been called on it. You haven't made a coherent point in the entire post, and you need to let qualified designers make decisions rather than playing "armchair designer."

      --
      Guns don't kill people; Physics kills people! - John Lithgow as Dick Solomon on Third Rock From The Sun
  20. Re:so why is intel's 14nm haswell still at 3.5 wat by Anonymous Coward · · Score: 0

    Wait, what? These chips aren't at all comparable in performance. For instance, consider a 7-zip lzma benchmark from http://www.7-cpu.com/:
        A20: 2 cores: 1 ghz: 880 mips compressing, 1560 mips decompressing
        Haswell: 8 cores: 3.4 ghz: 20500 mips compressing, 21000 mips decompressing
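    One way to read those two lines is to divide out core count and clock. The sketch below does that with the numbers quoted above; normalizing per core per GHz is an assumption about what a fair comparison looks like, and it ignores SMT, memory systems and power entirely.

```python
def mips_per_core_ghz(mips, cores, ghz):
    """Benchmark MIPS divided by (cores * clock in GHz)."""
    return mips / (cores * ghz)


# Figures copied from the 7-zip LZMA numbers quoted in the comment.
chips = {
    "A20": {"cores": 2, "ghz": 1.0, "compress": 880, "decompress": 1560},
    "Haswell": {"cores": 8, "ghz": 3.4, "compress": 20500, "decompress": 21000},
}

normalized = {}
for name, c in chips.items():
    normalized[name] = {
        "compress": mips_per_core_ghz(c["compress"], c["cores"], c["ghz"]),
        "decompress": mips_per_core_ghz(c["decompress"], c["cores"], c["ghz"]),
    }
    print("%s: %.0f compress, %.0f decompress (MIPS per core per GHz)"
          % (name, normalized[name]["compress"], normalized[name]["decompress"]))
```

    Normalized this way, decompression comes out close for the two chips (roughly 780 vs. 772), while compression still favors Haswell (roughly 754 vs. 440). The raw totals are still wildly different, which is the point being made above.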

  21. efficiency matters by bzipitidoo · · Score: 2

    This study looks seriously flawed. They just throw up their hands at doing a direct comparison of architectures when they try to use extremely complicated systems and sort of do their best to beat down and control all the factors that introduces. One of the basic principles of a scientific study is that independent variables are controlled. It's very hard to say how much the instruction set architecture matters when you can't tell what pipelining, out of order execution, branch prediction, speculative execution, caching, shadowing (of registers), and so on are doing to speed things up. An external factor that could influence the outcome is temperature. Maybe one computer was in a hotter corner of the test lab than the other, and had to spend extra power just overcoming the higher resistance that higher temperatures cause.

    It might have been better to approach this from an angle of simulation. Simulate a more idealized computer system, one without so many factors to control.

    --
    Intellectual Property is a monopolistic, selfish, and defective concept. It is "tyranny over the mind of man"
  22. Comment removed by account_deleted · · Score: 1

    Comment removed based on user account deletion

  23. Don't be silly by tobiasly · · Score: 1

    RISC architecture is going to change everything.

    1. Re:Don't be silly by mean+pun · · Score: 1

      RISC architecture is going to change everything.

      Agreed, as soon as they can do submicron technology. By the way, for some strange reason I feel like I've been sleeping a decade.

    2. Re:Don't be silly by Java+Pimp · · Score: 1

      RISC is good!

      --
      Ascalante: Your bride is over 3,000 years old.
      Kull: She told me she was 19!
    3. Re:Don't be silly by jellomizer · · Score: 1

      They did... 20 years ago... CISC had changed its ways to be more RISCy

      --
      If something is so important that you feel the need to post it on the internet... It probably isn't that important.
    4. Re:Don't be silly by Anonymous Coward · · Score: 0

      It's okay, I get the reference.

  24. The article is bad - mfg technology dominates by hkultala · · Score: 2

    They are seriously comparing some 90nm process with much better intel 32nm and 45 nm processes.

    They have just taken some random cores made on random (and incomparable) manufacturing technologies, thrown a couple of benchmarks at them, and tried to declare universal results based on these.

    A few facts about the benchmark setup and the cores:

    1) They use an ancient version of GCC. ARM suffers from this much more than x86.
    2) Bobcat is a relatively balanced core, with no bad bottlenecks. Its mfg tech is cheap, not high performance, but relatively small/new.
    3) Cortex A8 and A9 are really starved by bad cache design. Newer A7 and A12 would be similar in area and power consumption but much better in performance and performance/power. They are also manufactured on old cheap mfg processes, which hurts them. Use modern manufacturing tech and the results are much better.
    4) Their Loongson is made on ANCIENT technology. With modern mfg tech it would be many times better on performance/power.
    5) The Cortex A15, even though made on a 32nm process, uses a cheap process, not much better than Intel's 45nm process and much worse than Intel's 32nm. Also, it's known to be a "power hog" design. Qualcomm's Krait has a similar performance level, but with much lower power.

    1. Re:The article is bad - mfg technology dominates by WorBlux · · Score: 1

      Yes, the Ingenic MIPS are a good example of modern MIPS design, with quad core at 1GHz coming in at a one watt TDP. The Loongson 3B probably could have beat the i7 on some workloads if it had come out on schedule and on a modern process. Now that Intel has OpenCL support it's less likely, as half the cores in the Loongson 3 series are/were supposed to be dedicated vector processing units. Another interesting comparison would be the MIPS Tilera design vs. Intel's x86 Knights/table Many Integrated Cores (x86-MIC) design. I would think it's easier to fit lots of MIPS cores on a chip, but then again the x86-MIC leaves a lot of the fancier parts out of the cores.

    2. Re:The article is bad - mfg technology dominates by edxwelch · · Score: 1

      From the original paper www.cs.wisc.edu/vertical/papers/2013/hpca13-isa-power-struggles.pdf (which ExtremeTech does not link to):

      Technology scaling and projections:
      Since the i7 processor is 32nm and the Cortex-A8 is 65nm, we use technology node characteristics from the 2007 ITRS tables to normalize to the 45nm technology node in two results where we factor out technology; we do not account for device type (LOP, HP, LSTP). For our 45nm projections, the A8's power is scaled by 0.8× and the i7's power by 1.3×. In some results, we scale frequency to 1 GHz, accounting for DVFS impact on voltage using the mappings disclosed for Intel SCC [5]. When frequency scaling, we assume that 20% of the i7's power is static and does not scale with frequency; all other cores are assumed to have negligible static power. When frequency scaling, A8's power is scaled by 1.2×, Atom's power by 0.8×, and i7's power by 0.6×. We acknowledge that this scaling introduces some error to our technology-scaled power comparison, but feel it is a reasonable strategy and doesn't affect our primary findings (see Table 4).
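      To see what those factors do when applied, here is a small sketch. The factors are the ones quoted from the paper; treating the technology and frequency factors as simply multiplicative, leaving unlisted cores unscaled, and the 1.0 W placeholder input are all assumptions made for illustration and are not taken from the paper.

```python
# Scaling factors quoted in the paper excerpt above.
TECH_SCALE_TO_45NM = {"A8": 0.8, "i7": 1.3}
FREQ_SCALE_TO_1GHZ = {"A8": 1.2, "Atom": 0.8, "i7": 0.6}


def normalize_power(core, measured_watts, scale_tech=True, scale_freq=True):
    """Apply the quoted factors to a measured power figure.

    ASSUMPTIONS: the two factors combine by multiplication, and any core
    without a listed factor is left unscaled (factor 1.0).
    """
    factor = 1.0
    if scale_tech:
        factor *= TECH_SCALE_TO_45NM.get(core, 1.0)
    if scale_freq:
        factor *= FREQ_SCALE_TO_1GHZ.get(core, 1.0)
    return measured_watts * factor


# Placeholder input of 1.0 W, so the output is just the combined factor.
for core in ("A8", "Atom", "i7"):
    print("%s: combined factor %.2f" % (core, normalize_power(core, 1.0)))
```

      Plugging in real measurements would need the paper's own tables; nothing here reproduces its results.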

    3. Re:The article is bad - mfg technology dominates by Anonymous Coward · · Score: 0

      Please compare apples with apples.
      Intels smaller sizes = better efficiency - compensate to same micron size and then compare.

      Assembler programmers want registers, lots of 'em. Registers are power hungry, relative to decoding, and CISC chews power when employing register renames. Intel and Zilog were successful because they gave 'em lots of registers. RISC will beat the pants off CISC if the logic is clean, all things being equal.

      Throw in DSP like assistants - and it gets muddy, but there will be a sweet spot that moves micron fab size and register speed.
      If you want to save power/efficiency, use less registers or make them shorter ; or smaller(Intel).

  25. Re: so why is intel's 14nm haswell still at 3.5 wa by the_humeister · · Score: 2

    No relation to energy used. It's in the article: Haswell will get its work done faster and use about the same energy as the slower chips that take longer. What matters is architecture, not ISA (Atom is lower power than Haswell at the same process node).
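    The distinction here is power versus energy: energy is power multiplied by time, so a chip that draws more watts but finishes sooner can use the same number of joules. The numbers below are made up purely to show the arithmetic and are not measurements of any real chip.

```python
def energy_joules(average_watts, seconds):
    """Energy for a task = average power * time to completion."""
    return average_watts * seconds


# HYPOTHETICAL figures, chosen only to illustrate the point.
fast_chip = energy_joules(average_watts=10.0, seconds=1.0)
slow_chip = energy_joules(average_watts=2.0, seconds=5.0)

print("fast chip: %.1f J, slow chip: %.1f J" % (fast_chip, slow_chip))
```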

  26. Final nail in the Itanium coffin by Christian+Smith · · Score: 1, Interesting

    20 years ago, RISC vs CISC absolutely mattered. The x86 decoding was a major bottleneck and transistor budget overhead.

    As the years have gone by, the x86 decode overhead has been dwarfed by the overhead of other units like functional units, reorder buffers, branch prediction, caches etc. The years have been kind to x86, making the x86 overhead appear like noise in performance. Just an extra stage in an already long pipeline.

    All of which paints a bleak picture for Itanium. There is no compelling reason to keep Itanium alive other than existing contractual agreements with HP. SGI was the only other major Itanium holdout, and they basically dumped it long ago. And Itaniums are basically just glorified space heaters in terms of power usage.

    1. Re:Final nail in the Itanium coffin by trparky · · Score: 1

      You can't add too many stages to the pipeline or you end up with the Intel NetBurst Pentium 4 Prescott mess again. It had an horrendously long 31-stage pipeline. Can you say... Branch Prediction Failure? The damn thing produced more branch prediction failures than actual work. That's essentially why the NetBurst was completely scrapped and why they went back to the drawing board.
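      The usual back-of-the-envelope model for why depth hurts is that each mispredicted branch costs roughly a pipeline's worth of cycles. The sketch below uses that simplification; the branch fraction and misprediction rate are hypothetical, and equating the penalty with the full pipeline depth is an assumption, not a measured Prescott figure.

```python
def effective_cpi(base_cpi, branch_fraction, mispredict_rate, penalty_cycles):
    """Average cycles per instruction including branch misprediction stalls."""
    return base_cpi + branch_fraction * mispredict_rate * penalty_cycles


# HYPOTHETICAL workload: 20% branches, 5% of them mispredicted.
BRANCH_FRACTION = 0.20
MISPREDICT_RATE = 0.05

# ASSUMPTION: misprediction penalty is about equal to pipeline depth.
for depth in (5, 31):
    cpi = effective_cpi(1.0, BRANCH_FRACTION, MISPREDICT_RATE, depth)
    print("%2d-stage pipeline: effective CPI %.2f" % (depth, cpi))
```

      With the same predictor accuracy, the deeper pipeline pays about six times as many stall cycles per misprediction, so the clock-speed gain has to outrun that.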

    2. Re:Final nail in the Itanium coffin by phantomfive · · Score: 1

      As the years have gone by, the x86 decode overhead has been dwarfed by the overhead of other units like functional units, reorder buffers, branch prediction, caches etc. The years have been kind to x86, making the x86 overhead appear like noise in performance. Just an extra stage in an already long pipeline.

      And all that long pipeline takes power to run (recently this argument comes up in discussions of mobile devices more than in the server room, because battery life is hugely important, and ARM in the serverroom is still a joke). ARM chips sometimes don't even have cache, let alone reorder buffers, and branch prediction. When you remove all that stuff, the ISA becomes more important in terms of power consumption.

      Of course, as someone else pointed out, they were comparing 90nm chips to 32nm and 45nm. Why that is a problem will be left as an exercise for the reader.

      --
      "First they came for the slanderers and i said nothing."
    3. Re:Final nail in the Itanium coffin by Anonymous Coward · · Score: 0

      RISC has another, more serious problem.

      It relies on the myth that you can either:
      Depend on programmers to write theoretically awesome code
      or
      One can write a magic compiler that can do the above for you automatically

      We all know what the answers to these questions are. Code is, by and large, crap. Modern processors spend most of their time effectively de-crapping code and predicting what crap will come down the wire next (to avoid really slow main memory fetching). Actual execution is almost secondary.

      When it comes down to it, in general purpose computing you'll do a lot better betting on your hardware engineers than you will your programmers.

      GENERALLY SPEAKING: The guys that make the processors are brilliant. The average programmer is a bullshit artist in comparison. (Market forces cause this)

      The best computing platform will be the one that can run bad code the best. Nobody has time to write clean, perfect, mathematically sound, arch specific code for every application.

    4. Re:Final nail in the Itanium coffin by WuphonsReach · · Score: 2

      All of which paints a bleak picture for Itanium. There is no compelling reason to keep Itanium alive other than existing contractual agreements with HP. SGI was the only other major Itanium holdout, and they basically dumped it long ago. And Itaniums are basically just glorified space heaters in terms of power usage.

      Itanium was dead on arrival.

      It ran existing x86 code much slower. So if you wanted to move up to 64bit (and use Itanium to get there), you had to pay a lot more for your processors, just to run your existing workload.

      Okay, you say, but everyone was supposed to stop running x86 and start running Itanium binaries! Please put down the pipe and come back to reality. No company is going to repurchase all of their software to run on a new platform, just because Intel says this is the way forward.

      Maybe, maybe! If all of the business software was open-source and easily ported to a different CPU architecture it might have worked. But only if you'd gain a 3x-5x improvement in wall clock performance by porting from x86 to Itanium instruction sets. (An advantage that never materialized.)

      And once AMD started shipping AMD64 and Opterons that could run your existing x86 workload, on a 64bit CPU, at slightly faster speeds than your old kit for the same price - that buried any chance of Itanium ever succeeding in the market. Any forward looking IT person, when it came time to upgrade old kit, chose AMD64 - because while they might be running 32bit OS/progs today, the 64bit train was rumbling down the tracks. So picking a chip that could do both, and do both well, was the best move.

      --
      Wolde you bothe eate your cake, and have your cake?
    5. Re:Final nail in the Itanium coffin by Curate · · Score: 1

      All of which paints a bleak picture for Itanium.

      Wow, that is a rather bold prediction to be making in 2014. If Itanium does eventually start to falter in the marketplace, then you sir are a visionary.

    6. Re:Final nail in the Itanium coffin by unixisc · · Score: 1

      Itanium was first conceived as a VLIW CPU. As its development progressed, it was found that the real estate savings due to moving everything into the compiler were minimal, while in the meantime, the compiler was a bitch to write. Also, under the original VLIW vision, software would need to be recompiled every time for a new CPU. That could be a dream for the GNU world, which requires the availability of source code, but practically, a bitch for the real world.

      Today's Itanium, unlike Merced, is now more of a RISC CPU, w/ flags indicating which branches need to be taken, or w/ the same hardware that RISC has for register renaming. In short, Itanium III is really a RISC CPU, much like the i860 and i960 before it. Too bad that it's kept restricted to the ancient foundries, making it both expensive and a power hog.

      You know that the CPU is really bad when even Linux drops support for it, and within FreeBSD, the LLVM/Clang project removes its binaries from the package. Wonder whether NetBSD came far in supporting it?

  27. CISC - reduced memory access ... by perpenso · · Score: 3, Interesting

    x86 instructions, are in fact, decoded to micro opcodes, so the distinction isn't as useful in this context.

    Actually it is. Modern performance tuning has a lot to do with cache misses and such. CISC can allow for more instructions per cache hit. The strategy of a hybrid type design, CISC external architecture and RISC internal architecture definitely has some advantages.

    That said, the point of RISC was not solely execution speed. It was also simplicity of design. A simplicity that allowed organization with less money and resources than Intel to design very capable CPUs.

    1. Re:CISC - reduced memory access ... by Darinbob · · Score: 1

      RISC came out when Intel was only doing tiny microchips, the RISC market was not competing with it. One of the advantages of CISC at the time was indeed that it was easy to implement, because you just build the micro architecture most of the rest was microcoding the instructions and putting that into ROM. If you needed to add a couple new instructions for the next release to stay competitive, it could be done very quickly (and you could patch your computers on the fly to get the new instructions too).

      Yes, simplicity of design was important, but the simplicity was to free up chip resources to use elsewhere, not to make it easier for humans to design it.

    2. Re:CISC - reduced memory access ... by perpenso · · Score: 1

      RISC came out when Intel was only doing tiny microchips, the RISC market was not competing with it.

      The reference CISC platform, the "competition", at the time was the VAX. Same arguments, different target.

    3. Re:CISC - reduced memory access ... by lars_stefan_axelsson · · Score: 1

      Yes, simplicity of design was important, but the simplicity was to free up chip resources to use elsewhere, not to make it easier for humans to design it.

      Well, yes. I think we're forgetting one of the main drivers for RISC, and that was making the hardware more compatible with what the then current compilers could actually fruitfully use. Compilers couldn't (and typically didn't) actually use all the hideously complex instructions that did "a lot" and were instead hampered by lack of registers, lack of orthogonality etc. So there was a concerted effort to develop hardware that fit the compilers, instead of the other way around, which had been the dominating paradigm up to that point.

      Take for example the MIPS without interlocked pipe-line stages. That was difficult for a human assembly coder to keep track of, but easy for a compiler, and it made the hardware design simpler and faster, so that's the way they went. (In fact, the assembler put in no-ops for you to inject pipeline stalls in order for your code to make sense when you programmed it in assembly. That made the object dump show stuff you didn't put there which was a bit disconcerting... :-))
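      The no-op insertion described above is easy to picture as a tiny pass over the instruction list. This is a toy sketch, not how any real assembler is implemented: the tuple format is invented for the example, and it assumes a single load delay slot where the instruction right after a load must not read the loaded register.

```python
def insert_load_delay_nops(program):
    """Insert a nop after any load whose result is used by the next instruction.

    program is a list of (op, dest, sources) tuples; this representation
    is hypothetical and exists only for this illustration.
    """
    out = []
    for i, instr in enumerate(program):
        out.append(instr)
        op, dest, _sources = instr
        if op == "lw" and i + 1 < len(program):
            _next_op, _next_dest, next_sources = program[i + 1]
            if dest in next_sources:
                # The hardware has no interlock, so software must wait.
                out.append(("nop", None, ()))
    return out


program = [
    ("lw", "t0", ("a0",)),        # load into t0
    ("add", "t1", ("t0", "t2")),  # uses t0 immediately: needs a nop first
    ("lw", "t3", ("a1",)),        # load into t3
    ("sub", "t4", ("t5", "t6")),  # does not touch t3: no nop needed
]

for op, dest, sources in insert_load_delay_nops(program):
    print(op, dest or "", ", ".join(sources))
```

      A compiler would rather move an unrelated instruction into that slot than waste it on a nop, which is part of why this design suited compilers better than hand-written assembly.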

      --
      Stefan Axelsson
    4. Re:CISC - reduced memory access ... by Darinbob · · Score: 1

      Some of what you say is true. Register allocation was simple in RISC, but most of the competing CISC machines also had very orthogonal registers as well (PDP and VAX were the classic CISC machines, x86 isn't even in the picture yet). Also some CISC machines were adding instructions that compilers had trouble using, often because the compilers had to fit in a small amount of memory.

      However many RISC machines required much more advanced compilers if you wanted optimization. I think for basic compilation RISC is very straightforward, but to try to make that into actually efficient code, the compiler is much more difficult. Basic compilation for CISC was very easy at the time (many had orthogonal registers and convenient instructions). The difference is in how well the unoptimized code ran. Memory was very expensive at the time and not always slower than registers, so the push on CISC compilers was to squeeze the space and make instructions do more (not counting things like super computers which were really fore-runners of RISC).

      Much of the later complexity didn't exist in the late 70s. Many machines didn't have pipelining, there weren't instruction queues for loops, only one ALU, external memory wasn't always slower than CPU registers, etc. However by freeing up space on the CPU with RISC this opened up the way to add more advanced features in cheaper CPUs and thus the new need to have advanced compiler technology even on basic workstations and minicomputers. Also RAM got very cheap but also got a lot slower relative to the CPU which also drove more change in CPU and compiler goals.

    5. Re:CISC - reduced memory access ... by lars_stefan_axelsson · · Score: 1

      Much of the later complexity didn't exist in the late 70s.

      Yes, I should have said that I put RISC as beginning with Hennessy & Patterson's work, that became MIPS and SPARC respectively. So we're a bit later than that. And of course when I said "compiler" I meant "optimizing compiler". Basic compilation as you say, was not a problem on CISC, but everybody observed that the instruction set wasn't really used. I remember reading VAX code from the C-compiler (on BSD 4.2) when I was an undergrad and noting that the enter/leave instructions weren't used. My betters answered: "Of course it isn't, they put so much useless stuff in there that it's much too slow..." (Only they didn't use the word "stuff"...)

      But yes, the x86 is perhaps more "braindead" than "CISC" from that perspective, I was actually thinking VAX and its ilk as they were what "RISC" came to replace, since the x86 wasn't a serious contender for workstations/minicomputers when they entered the arena. It was strictly for "PCs", which were a decidedly lesser class of computer, for lesser things. If anything RISC replaced the MC68000 and similar in the workstation space. And even though that was CISC, it was of course a much nicer architecture than Intel's ever were, or became.

      --
      Stefan Axelsson
  28. REAL RISC? by gelfling · · Score: 1

    Because years and years ago it was obvious that what manufacturers were calling RISC wasn't really that. It was typically some middle ground between REAL RISC and something else. Back in the day x86 had about 374 instructions and a SUN or analogous IBM chip had about 150-175 instructions. But according to the actual science, a RISC chip should only have 30-45 instructions. So for the sake of flexibility manufacturers split the difference and built chips that were neither fish nor fowl. If someone were to actually build a real high-power chip that ran only 40 instructions, I wonder if the benchmarks would come out differently. Or maybe they wouldn't, because the benchmarks themselves make some attempt to model the complexity of real-world scenarios. And if that's the case then the REAL RISC chips would be stumbling, trying to execute most things in software rather than in the instruction set.

    1. Re:REAL RISC? by VTBlue · · Score: 1

      I'm waiting for a 'true Scotsman' comment :)

  29. This is a myth that is not true by hkultala · · Score: 5, Informative

    That is correct. Every time this comes up I like to spark a debate over what I perceive as the uselessness of referring to an "instruction set architecture" because that is a bullshit, meaningless term and has been ever since we started making CPUs whose external instructions are decomposed into RISC micro-ops. You could switch out the decoder, leave the internal core completely unchanged, and have a CPU which speaks a different instruction set. It is not an instruction set architecture. That's why the architectures themselves have names. For example, K5 and up can all run x86 code, but none of them actually have logic for each x86 instruction. All of them are internally RISCy. Are they x86-compatible? Obviously. Are they internally x86? No, nothing is any more.

    This same myth keeps being repeated by people who don't really understand the details on how processors internally work.

    You cannot just change the decoder, the instruction set affects the internals a lot:

    1) Condition handling is totally different on different instruction sets. This affects the backend a lot. X86 has a flags register, many other architectures have predicate registers, some have predicate registers with different conditions.

    2) There are totally different numbers of general purpose and floating point registers. The register renamer makes this a smaller difference, but then there is the fact that most RISC's use same registers for both FPU and integer, X86 has separate registers for both. And this totally separates them, the internal buses between the register files and function units in the processor are done very differently.

    3) Memory addressing modes are very different. X86 still does relatively complex address calculations in a single micro-operation, so it has more complex address calculation units.

    4) Whether there are operations with more than 2 inputs, or more than 1 output, has quite a big impact on what kind of internal buses are needed and how many register read and write ports are needed.

    5) There are a LOT of more complex instructions in the X86 ISA which are not split into micro-ops but handled via microcode. The microcode interpreter is totally missing on pure RISCs (but exists on some not-so-pure RISCs like POWER/PowerPC).

    6) Instruction set dictates the memory alignment rules. Architectures with stricter alignment rules can have simpler load-store units.

    7) Instruction set dictates the multicore memory ordering rules. This may affect the load-store units, caches and buses.

    8) Some instructions have different bitnesses in different architectures. For example x86 has N x N -> 2N wide multiply operations which most RISC's don't have. So x86 needs a bigger/different multiplier than most RISCs.

    9) X87 FPU values are 80 bits wide (truncated to 64 bits when storing/loading). Practically all the other CPUs have a maximum of 64-bit wide FPU values (though some versions of Power also support 128-bit FP numbers).
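
    As a minimal sketch of points 4 and 8, assuming 32-bit operands and using Python purely for illustration (the function names are made up for this example): a widening multiply has two results to write back, where a truncating multiply has one.

```python
MASK32 = 0xFFFFFFFF

def mul_truncating_32(a, b):
    """32 x 32 -> 32 multiply: a single result, the low half of the product."""
    return ((a & MASK32) * (b & MASK32)) & MASK32

def mul_widening_32(a, b):
    """Unsigned 32 x 32 -> 64 multiply, returned as (high, low) halves.

    This mirrors the x86 one-operand MUL, which writes its result to EDX:EAX.
    """
    product = (a & MASK32) * (b & MASK32)
    return (product >> 32) & MASK32, product & MASK32

high, low = mul_widening_32(0xFFFFFFFF, 0xFFFFFFFF)
print(hex(high), hex(low))
```

    The second function has to deliver two register-sized values from one operation, which is the extra write port / data path the list is talking about.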

    1. Re:This is a myth that is not true by drinkypoo · · Score: 0

      Some of what you said is legitimate. Most of it is irrelevant, since it does not speak to the postulate. You're speaking of issues which will affect performance. So what? You'd have a less-performant processor in some cases, and it would be faster in others.

      --
      "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
    2. Re:This is a myth that is not true by hkultala · · Score: 4, Informative

      Some of what you said is legitimate. Most of it is irrelevant, since it does not speak to the postulate. You're speaking of issues which will affect performance. So what? You'd have a less-performant processor in some cases, and it would be faster in others.

      No.

      1) If the condition codes work totally differently, they don't work.

      2) The data paths needed for separate and combined FP and integer regs are so different that it makes absolutely NO sense to have them together in a chip that runs the x86 ISA, even though it's possible.

      3) If you don't have those x86-compatible address calculation units, you have to break most memory ops into more micro-ops OR even run them with microcode. Both are slow. And if you have a RISC chip you want to have only the address calculation units you need for your simple base+offset addressing.

      4) In the basic RISC pipeline there are two operands and one output per instruction. There are no data paths for two results, so you cannot execute operations with multiple outputs, such as the x86 multiply which produces 2 values (low and high part of the result), unless you do something VERY SLOW.

      6) IF your RISC instruction set says you have aligned memory operations, you design your LSU to have only those, as it makes the LSU's much smaller, simpler and faster. But you need unaligned accesses for x86.

      9) If your FPU calculates with different bit width, it calculates wrongly.
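
      Point 3 can be sketched roughly as follows, assuming a hypothetical back-end whose loads accept only base + offset (the op names in the list are invented for illustration, not any real micro-op encoding):

```python
MASK32 = 0xFFFFFFFF
SHIFT_FOR_SCALE = {1: 0, 2: 1, 4: 2, 8: 3}

def x86_effective_address(base, index, scale, disp):
    """One step: base + index*scale + disp, as an x86-style address unit computes it."""
    return (base + index * scale + disp) & MASK32

def base_offset_sequence(base, index, scale, disp):
    """The same address on a machine whose loads only take base + offset.

    Returns the address plus the list of simple operations that were needed.
    """
    ops = []
    tmp = (index << SHIFT_FOR_SCALE[scale]) & MASK32
    ops.append("shift tmp, index")
    tmp = (tmp + base) & MASK32
    ops.append("add tmp, base")
    address = (tmp + disp) & MASK32  # the displacement is folded into the load itself
    ops.append("load [tmp + disp]")
    return address, ops

address, ops = base_offset_sequence(0x1000, 3, 4, 8)
print(hex(address), len(ops))
```

      Both functions produce the same address; the difference is that the second one needs two extra operations in front of the load.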

      And

    3. Re:This is a myth that is not true by Bengie · · Score: 1

      9) This is how x86 has been doing it for decades. Does not matter if it's wrong, it's how it's done.

    4. Re: This is a myth that is not true by Anonymous Coward · · Score: 0

      Put simply, the decoder is optimised for the front-end.

    5. Re:This is a myth that is not true by Guy+Harris · · Score: 1

      but then there is the fact that most RISC's use same registers for both FPU and integer

      With the minor exceptions of Alpha, PA-RISC 1.x and 2.0, POWER/PowerPC/Power ISA, MIPS, and SPARC.

  30. Re:so why is intel's 14nm haswell still at 3.5 wat by Anonymous Coward · · Score: 0

    Whaddya mean? The A20 beats the pants off of a Pentium 2....

  31. Intel x86 CISC is converted to RISC via Microcode by VTBlue · · Score: 1

    As mentioned over many years of slashdot posts, x86 as a hardware instruction no longer truly exists and represents a fraction of the overall die space. The real bread and butter of CPU architecture and trade secrets rests in the microcode that is unique in every generation or edition of a processor. Today all intel processors are practically RISC.

  32. Wrong conclusion by edxwelch · · Score: 0

    If you look at the graph "raw average energy normalised" you see that the ARM A9 core has the lowest energy score -> that clearly shows ARM being the most efficient and hence the conclusion is completely wrong.
    Still the test is very interesting. I would like to see it updated with latest CPUs

  33. Microcode switching by DrYak · · Score: 2

    This same myth keeps being repeated by people who don't really understand the details on how processors internally work.

    Actually, YOU are wrong.

    You cannot just change the decoder, the instruction set affects the internals a lot:

    All the reasons you list could all be "fixed in software". The fact that silicon designed by Intel handles opcode in a way a little bit better optimized toward being fed from a x86-compatible frontend is just specific optimisation. Simply doing the same stuff with another RISCy back-end, i.e. interpreting the same ISA fed to the front-end, will simply require each x86 instruction being executed as a different set of micro-instructions (some that are handled as a single ALU opcode on Intel's silicon might require a few more instructions, but that's about the difference).

    You could switch the frontend and speak a completely different instruction set. Simply if the two ISA are radically different, the result wouldn't be as efficient as a chip designed with that ISA in mind. (You would need a much bigger and less efficient microcode, because of all the reasons you list. They won't STOP intel from making a chip that speaks something else. Intel will simply produce a chip where the front-end is much more clunky, inefficient, waste 3x more opcode per instruction, and waste much time waiting that some bus gets free or copying values around, etc.).

      And to go back to the parent...

    You could switch out the decoder, leave the internal core completely unchanged, and have a CPU which speaks a different instruction set. It is not an instruction set architecture. That's why the architectures themselves have names.

    Not only is this possible, but this was INDEED done.

    There was an entire company called "Transmeta" whose business was centered around exactly that:
    Their chip, the "Crusoe" was compatible with x86.
    - But their chip was actually a VLIW chip, with the front-end being 100% pure software. Absolutely as remote from a pure x86 core as possible.
    - The frontend was entirely 100% pure software.

    The advantage touted by Transmeta was that, although their chip was a bit slower and less efficient, it consumed a tiny fraction of the power and was field-upgradeable (in theory, just issue a firmware upgrade to support newer instructions). Transmeta had demos of Crusoe playing back MPEG video on a few watts, whereas the Pentium 3 (the then lower-power Intel chip) would consume far more.

    Sadly, it all happened in an era where pure raw performance was king, and where using a small nuclear plant to power a Pentium IV (the then high-performance flagship) and needing a small lake nearby for cooling was considered perfectly acceptable. So Crusoe didn't see that much success.

    Still, Crusoe was successfully used as a test bed for a few experimental CPUs, to test their ISA before actual hardware was available. (If I remember correctly, Crusoe was used to test running x86_64 code before actual Athlon 64s were available for developers), and there were a few experimental proofs of concept running the PowerPC ISA.

    In a way, this isn't that dissimilar from how Radeon handles compiled shaders, except that the front-end is now a piece of software which runs inside OpenGL on the main CPU: intermediate instructions are compiled to either VLIW or GCN opcodes, which are 2 entirely different back-ends.
    (Except that, due to the highly repetitive nature of a shader, instead of decoding instructions on the fly as they come, you optimise it once into opcodes, store it into a cache and you're good).

    Again, in a similar way, ARM can switch between 2 different instruction sets (normal and Thumb mode): 2 different sets, one back-end.

    --
    "Sufficiently advanced satire is indistinguishable from reality." - [Tips: 1DrYakQDKCQ6y52z6QbnkxHXAocMZJE61o ]
    1. Re:Microcode switching by hkultala · · Score: 2

      This same myth keeps being repeated by people who don't really understand the details on how processors internally work.

      Actually, YOU are wrong.

      You cannot just change the decoder, the instruction set affects the internals a lot:

      All the reasons you list could all be "fixed in software".

      No, they cannot. OR the software will be terribly slow, like a 2-10 times slowdown.

      The fact that silicon designed by Intel handles opcode in a way a little bit better optimized toward being fed from a x86-compatible frontend is just specific optimisation.

      Opcodes are irrelevant. They are easy to translate. What matters are the differences in the semantics of the instructions.
      X86 instructions update flags. This adds dependencies between instructions. Most RISC processors do not have flags at all.
      This is semantics of instructions, and they differ between ISA's.

      Simply doing the same stuff with another RISCy back-end, i.e. interpreting the same ISA fed to the front-end, will simply require each x86 instruction being executed as a different set of micro-instructions (some that are handled as a single ALU opcode on Intel's silicon might require a few more instructions, but that's about the difference).

      The backend, the micro-instructions in x86 CPUs are different than the instructions in RISC CPU's. They differ in the small details I tried to explain.

      You could switch the frontend and speak a completely different instruction set. Simply if the two ISA are radically different, the result wouldn't be as efficient as a chip designed with that ISA in mind. (You would need a much bigger and less efficient microcode, because of all the reasons you list. They won't STOP intel from making a chip that speaks something else.

      Intel did this: they added an x86 decoder to their first Itanium chips. And they did not only add the frontend, they added some small pieces to their backend so that it could handle those strange x86 semantic cases nicely.
      But the performance was still so terrible that nobody ever used it to run x86 code, and then they created a software translator that translated x86 code into Itanium code, and that was faster, though still too slow.

      Not only is this possible, but this was INDEED done.

      There was an entire company called "Transmeta" whose business was centered around exactly that:
      Their chip, the "Crusoe" was compatible with x86.
      - But their chip was actually a VLIW chip, with the front-end being 100% pure software. Absolutely as remote from a pure x86 core as possible.

      The backend of Crusoe was designed completely with x86 in mind; all the execution units contained the small quirks in a manner which made it easy to emulate x86 with it. The backend of Crusoe contains things like:

      * 80-bit FPU,
      * x86-compatible virtual memory page table format(one very important thing I forgot from my original list couple of posts ago; Memory accesses get VERY SLOW if you have to emulate virtual memory)
      * support for partial register writes(to emulate 8- and 16-bit subregisters like al, ah,ax )

      All these were made to make binary translation from x86 easy and reasonably fast.
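
      To illustrate the last bullet, here is a minimal sketch of what a back-end without sub-register support has to do for each partial write. It assumes a 32-bit EAX modelled as a plain integer; the helper names are hypothetical.

```python
def write_al(eax, value):
    """Replace bits 0-7 (AL), keep the upper 24 bits of EAX."""
    return (eax & 0xFFFFFF00) | (value & 0xFF)

def write_ah(eax, value):
    """Replace bits 8-15 (AH), keep everything else."""
    return (eax & 0xFFFF00FF) | ((value & 0xFF) << 8)

def write_ax(eax, value):
    """Replace bits 0-15 (AX), keep the upper 16 bits."""
    return (eax & 0xFFFF0000) | (value & 0xFFFF)

eax = 0x12345678
print(hex(write_al(eax, 0xAB)), hex(write_ah(eax, 0xAB)), hex(write_ax(eax, 0xBEEF)))
```

      Each partial write becomes a mask, an optional shift and a merge with the old value, so the new register contents depend on the previous ones. Hardware support for partial writes removes that extra work from the translated code.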

    2. Re:Microcode switching by drinkypoo · · Score: 0

      All these were made to make binary translation from x86 easy and reasonably fast.

      And herein lies the proof that you know you are wrong, but are continuing to argue. Those things didn't make x86 translation possible, they made it easy and fast. Which is what I said previously. Thanks for the confirmation.

      --
      "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
    3. Re:Microcode switching by Anonymous Coward · · Score: 0

      I'd really like to know how you make an instruction decoder that provides the x86 strict memory ordering rules on an ARM backend, with buses that do not even support the protocols to support it properly.
      You're not seriously suggesting translating each memory access in a "acquire lock; do memory access; release lock" microcode sequence?
      Because just using only one core of your 8-core CPU would give better performance than that "solution".

    4. Re:Microcode switching by Anonymous Coward · · Score: 0

      so basically, you're arguing that one Turing-complete computer can emulate another, and he's arguing that different instruction sets need different hardware to perform well and it is possible to compare what kind of hardware is needed to get good performance out of a particular instruction set.

  34. Original sources by enriquevagu · · Score: 2

    It is really surprising that neither the linked Extremetech article nor the Slashdot summary cites the original source. This research was presented at HPCA'13 in a paper titled "Power Struggles: Revisiting the RISC vs. CISC Debate on Contemporary ARM and x86 Architectures", by Emily Blem et al., from the University of Wisconsin's Vertical Research group, led by Dr. Karu Sankaralingam. You can find the original conference paper on their website.

    The Extremetech article indicates that there are new results with some additional architectures (MIPS Loongson and AMD processors were not included in the original HPCA paper), so I assume that they have published an extended journal version of this work, which is not yet listed on their website. Please add a comment if you have a link to the new work.

    I do not have any relation with them, but I knew the original HPCA work.

  35. Research performed by Intel.... by Anonymous Coward · · Score: 0

    This is all brought to you by labs run by Intel. Why are we even talking about this? There's nothing here but marketing.

  36. Re:Nobody by __aaclcg7560 · · Score: 1

    A computer instructor in 1992 told my class that computers won't ever need more than 4GB (the 32-bit memory limit). IIRC, 8MB was the norm back then.

  37. The real difference is commodity vs custom by UnknowingFool · · Score: 1

    These days the instruction set matters less than the underlying chip architecture and customization. With this, ARM has an advantage in that their business model allows for a higher degree of customization. While some companies can work with Intel or AMD on their designs, for the most part, ARM allows them to change the design as much as they need depending on the licensing.

    --
    Well, there's spam egg sausage and spam, that's not got much spam in it.
  38. Soo... by oh_my_080980980 · · Score: 1

    Intel funded research to show their processors are not also-rans in the tablet market so manufacturers could feel comfortable using Intel chips in tablets?

  39. Again, what's the problem ? by DrYak · · Score: 3, Interesting

    All the reasons you list could all be "fixed in software".

    The quotes around "software" mean that I am referring to the firmware/microcode as a piece of software designed to run on top of the actual execution units of a CPU.

    No, they cannot. OR the software will be terribly slow, like a 2-10 times slowdown.

    Slow: yes, indeed. But not impossible to do.

    What matters are the differences in the semantics of the instructions.
    X86 instructions update flags. This adds dependencies between instructions. Most RISC processors do not have flags at all.
    This is semantics of instructions, and they differ between ISA's.

    Yeah, I know pretty well that RISCs don't (all) have flags.
    Now, again, how is that preventing the micro-code swap that drinkypoo refers to (and that was actually done on Transmeta's Crusoe)?
    You'll just end up with a bigger, clunkier firmware that, for a given front-end instruction from the same ISA, will translate into a big bunch of back-end micro-ops.
    Yup. A RISC's ALU won't update flags. But what's preventing the firmware from dispatching *SEVERAL* micro-ops? First to do the base operation and then additional instructions to update some register emulating flags?
    Yes, it's slower. But no, that doesn't make a micro-code based change of supported ISA impossible, only not as efficient.
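
    A rough sketch of that idea, assuming a 32-bit add and a flag-less ALU (the function names and the dict of flags are invented for illustration; a real micro-op sequence would look different):

```python
MASK32 = 0xFFFFFFFF

def add_no_flags(a, b):
    """What a flag-less ALU gives you: just the wrapped 32-bit sum."""
    return (a + b) & MASK32

def add_with_emulated_flags(a, b):
    """x86-style ADD rebuilt from the plain add plus extra operations."""
    a &= MASK32
    b &= MASK32
    result = add_no_flags(a, b)                       # the base operation
    carry = 1 if result < a else 0                    # extra: unsigned wrap-around check
    zero = 1 if result == 0 else 0                    # extra: compare with zero
    sign = result >> 31                               # extra: top bit of the result
    overflow = ((a ^ result) & (b ^ result)) >> 31    # extra: signed overflow check
    return result, {"CF": carry, "ZF": zero, "SF": sign, "OF": overflow}

print(add_with_emulated_flags(0x7FFFFFFF, 1))
```

    One front-end instruction turns into one add plus four follow-up operations here, which is the "slower, but not impossible" trade-off in a nutshell.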

    The backend, the micro-instructions in x86 CPUs are different than the instructions in RISC CPU's. They differ in the small details I tried to explain.

    Yes, and please explain how that makes it *definitely impossible* to run x86 instructions, and not merely *somewhat slower*?

    Intel did this: they added an x86 decoder to their first Itanium chips. {...} But the performance was still so terrible that nobody ever used it to run x86 code, and then they created a software translator that translated x86 code into Itanium code, and that was faster, though still too slow.

    Slow, but still doable and done.

    Now, keep in mind that:
    - Itanium is a VLIW processor. That's an entirely different beast, with an entirely different approach to optimisation, and back during Itanium development the logic was "The compiler will handle the optimising". But back then such a magical compiler didn't exist and anyway didn't have the necessary information at compile time (some types of optimisation require information only available at run time, hence doable in microcode, not in the compiler).
    Given the compilers available back then, VLIW sucks for almost anything except highly repeated tasks. Thus it was a bit popular for cluster nodes running massively parallel algorithms (and at some point in time VLIW was also popular in Radeon GFX cards). But VLIW sucks for pretty much anything else.
    (Remember that, for example, GCC has had auto-vectorisation and well-performing Profile-Guided Optimisation only since recently).
    So "supporting an alternate x86 instruction set on Itanium was slow" has as much to do with "supporting an instruction set on a back-end that's not tailored for the front-end is slow" as it has to do with "Itanic sucks for pretty much everything which isn't a highly optimized kernel-function in HPC".

    But still it proves that running a different ISA on a completely alien back-end is doable.
    The weirdness of the back-end won't prevent it, only slow it down.

    Luckily, by the time Transmeta Crusoe arrived:
    - knowledge of how to handle VLIW had advanced a bit; Crusoe had a back-end better tuned to run a CISC ISA

    Then by the time Radeon arrived:
    - compilers had gotten even better; GPUs are used for the same (only) class of task at which VLIW excels.

    The backend of Crusoe was designed completely x86 on mind, all the execution units contained the small quirks in a manner which made it easy to emulate x86 with it. The backend of Crusoe contains things like {...} All these were made to make binary translation from x86 eas

    --
    "Sufficiently advanced satire is indistinguishable from reality." - [Tips: 1DrYakQDKCQ6y52z6QbnkxHXAocMZJE61o ]
  40. CISC instruction sets are now abstractions by Theovon · · Score: 1

    And actually so is RISC to a degree on POWER processors.

    Back in the 80's going RISC was a big deal. It simplified decode logic (which was a more appreciable portion of the circuit area), reduced the number of cycles and logic area necessary to execute an instruction, and was more amenable (by design) to pipelining. But this was back in the days when CISC processors actually directly executed their ISAs.

    Today, CISC processors come with translation front-ends that convert their external ISA into a RISC-like internal representation. It's on-line dynamic binary translation. Now, instructions are broken down into simpler steps that are more amenable to pipelining and out-of-order scheduling. CISC processors don't execute CISC ISAs and therefore don't suffer from their drawbacks.
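
    A toy sketch of that translation step, assuming a made-up three-field instruction format and a made-up micro-op set (nothing here reflects any real processor's internal encoding): a read-modify-write instruction with a memory destination is split into load, ALU and store micro-ops, while a register-only instruction passes through as a single micro-op.

```python
def decode_to_micro_ops(op, dst, src):
    """Split one toy CISC-style instruction into RISC-like micro-ops."""
    if isinstance(dst, tuple) and dst[0] == "mem":
        addr_reg = dst[1]
        return [
            ("load", "tmp0", addr_reg),   # tmp0 <- memory[addr_reg]
            (op, "tmp0", src),            # tmp0 <- tmp0 op src
            ("store", addr_reg, "tmp0"),  # memory[addr_reg] <- tmp0
        ]
    return [(op, dst, src)]

def run(micro_ops, regs, memory):
    """Execute micro-ops against a register dict and a memory dict."""
    for kind, a, b in micro_ops:
        if kind == "load":
            regs[a] = memory[regs[b]]
        elif kind == "store":
            memory[regs[a]] = regs[b]
        elif kind == "add":
            regs[a] = (regs[a] + regs[b]) & 0xFFFFFFFF
        else:
            raise ValueError("unknown micro-op: " + kind)
    return regs, memory

regs = {"eax": 5, "ebx": 100, "tmp0": 0}
memory = {100: 7}
micro_ops = decode_to_micro_ops("add", ("mem", "ebx"), "eax")  # like: add [ebx], eax
run(micro_ops, regs, memory)
print(len(micro_ops), memory[100])
```

    Once the instruction is in this form, each micro-op touches either memory or the ALU but not both, which is what makes it easier to pipeline and schedule out of order.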

    It has occurred to me that this could be taken to its logical extreme. ISAs could be made entirely abstract and optimized to be used that way, along with optimizing them for reasonably efficient translation. You get the benefits of microops and the benefits of a CISC ISA (more compact code). Abstract ISAs make it easier to extend functionality in a backward-compatible way too. And unlike x86, we can shed some of the deadweight and also go to all 3-operand instructions, which have some benefits. Decoupling the ISA from the execution engine, we could get even more performance and energy efficiency than Intel does.

    With a processor like Haswell, the logic area dedicated to translation is very small, which is why it doesn't matter much. On the other hand, with something like Atom, it occupies a more substantial portion of the total, making the translation (basically, elaborate decode logic) a burden on die area and therefore power consumption.

    So it's not really appropriate to say it doesn't matter. It MOSTLY doesn't matter, because most of the drawbacks of CISC have been overcome. The fact that we're using an out-dated CISC ISA for x86, however, has drawbacks of having to support rare and excessively complex instructions, a plethora of addressing modes, and only having two operands per instruction.

  41. WoW - I haven't heard those terms in ages... apk by Anonymous Coward · · Score: 0

    Especially "IPL" (What my Ma to this very day uses for "BootStrapping" in fact, & she was a computer operator all thru the "dim days of yore" on IBM stuff, using PL1/2 as languages to code for it, punchcard systems & all too, for 22++ yrs. as a civil servant in that role (funniest part is, @ the DMV the other day when I was with her driving her there so she could take care of renewals etc., their systems went "down" & she was like "How long does it take them to do the SIMPLE THING & do an IPL?", lol)...

    * :)

    (You're "dating yourself" I *think*... OR, you're a mainframe/midrange man, right?)

    APK

    P.S.=> I am leaning towards You're an "oldster" though (& not a damn thing wrong with that @ all - you're more respectable in my book for that)... apk

  42. Re:so why is intel's 14nm haswell still at 3.5 wat by danbob999 · · Score: 1

    Unless you actually make use of 64-bit arithmetics, 32-bit CPUs will always be more power efficient than 64-bit. The ARMv8 has many other improvements over ARMv7 than just 32 more bits.

  43. Re:Research Shows RISC vs. CISC Doesn't Matter by Anonymous Coward · · Score: 0

    The reason why x86 still stacks up so well against ARM is that modern x86 employs pretty sophisticated run-time hardware optimization techniques, while ARM only has pretty basic and simple ones. ARM CPUs require much more sophisticated compiler optimizations than x86 CPUs, and even current-day optimizing compilers aren't good enough to catch up with what Intel CPUs do at runtime, especially since x86 backend optimization is usually more mature than ARM backend optimization. Granted, the more sophisticated x86 run-time hardware optimizations require both power and die surface.

  44. Re:Nobody by Anonymous Coward · · Score: 0

    A computer instructor in 1992 told my class that computers won't ever need more than 4GB (the 32-bit memory limit). IIRC, 8MB was the norm back then.

    ...and he was right..at least from his point of view back in 1992.

  45. What? by Anonymous Coward · · Score: 0

    Like the first guy that posted said: "i've read the legacy x86 instructions were virtualized in the CPU a long time ago and modern intel processors are effectively RISC that translate to x86 in the CPU"

    Intel CPU's haven't used CISC in a long time, like 486 era long time. They translate the old CISC instructions into something more like RISC. So... What is Intel's point with this?

    Back in the day, RISC actually made a huge impact and was able to show its might against CISC CPUs in a lot of areas, but that time has long since passed. Like, almost 20 years past. Pure RISC CPUs still save real estate, but that doesn't matter as much as it did back in the 80's and 90's when CPU die area was at a premium.

  46. Multiprocessing made the difference by unixisc · · Score: 1

    i've read the legacy x86 instructions were virtualized in the CPU a long time ago and modern intel processors are effectively RISC that translate to x86 in the CPU

    Actually, the biggest change in CPUs was not so much Intel adopting RISC techniques in post-Pentium CPUs, but rather multiprocessing, and therefore the Core platform taking hold.

    Remember, one of the things that RISC did better was the multiprocessing support for those who needed it. There were Pentium based multiprocessing systems too from companies like Sequent, but those at the time ran Unix, so the competition was really b/w the likes of Sequent, vs the Suns, HPs, SGIs, and so on. All low volume, and Intel enhancing multiprocessing capabilities of its CPUs would do nothing for its PC platform.

    What changed that was when Microsoft decided to merge the win32 code bases and offer XP as their merged OS for both desktops and servers; it opened the window of opportunity for Intel and AMD. Since NT, in addition to supporting RISC CPUs like Alpha or MIPS, also supported SMP, Intel could take advantage of that fact and throw more cores at a platform, and Windows, i.e. now NT, would be capable of handling it. That couldn't have worked w/ Windows 95-ME, but once NT took over the desktop, it could.

    Once this happened, the RISC vs CISC game was over. RISC previously had a performance advantage running its own native software over Pentiums running Wintel software. The struggle to beat Intel in running Wintel software was lost first by MIPS, and then by Alpha. Once Intel could throw more cores at the problem w/o costing more than a SPARC or a Power, it was over. Intel being several generations ahead of Cypress, Ross, Fujitsu and even IBM could easily toss in 4-8 cores and still be cheaper than a SPARC CPU, not to mention the off the shelf motherboards and other peripheral logic. Once that happened, it became more cost effective to use Xeons to run Linux or FBSD than it was to run Solaris or AIX or even HP/UX.

    Even in the case of the Itanium, discussed later in this thread, the initial Itaniums were just meant to be uniprocessor CPUs w/ several instructions concatenated together. Today, even Itaniums are multi-core - which solves the compatibility issue b/w generations, but then again throws into question why the Itanium would be needed in the first place, if one can just toss N number of, say, Atoms, and solve the problem.

    Intel's process and manufacturing advantages helped, no doubt, but the big difference was multiprocessing becoming mainstream on the desktop due to the NT architecture replacing the Windows 95 architecture in Microsoft's desktop OSes.

  47. Re:so why is intel's 14nm haswell still at 3.5 wat by lkcl · · Score: 1

    Here is your answer, the A20 is freakishly slow compared to anything Intel would put their name on.

    Granted, you can build a tablet to do specific tasks (like decoding video codecs) around a really slow processor and some special-purpose DSPs. But perhaps the companies in that business aren't making enough profit to interest Intel.

    interestingly that assumption - that allwinner is not making enough profit - is completely wrong. allwinner is now one of _the_ dominant tablet SoC manufacturers in the world. their first revision (the A10, which was a Cortex A8) actually caused a major recession in the electronics industry when it first came out, as it was only $7.50 compared to the nearest competitor at around $11 to $12. everyone *not* using the A10 at the time was left holding worthless components; contracts for supply were reneged on; the change was so quick that many factories and design houses simply went out of business.

    the volumes that allwinner is shipping are simply enormous, and between allwinner and rockchip, their nearest competitor, the tablet market is overwhelmingly dominated by processors of the type that you describe as "built to do specific tasks".

    those "specific tasks" include "running the android OS at a pace that's good enough for the overwhelming majority of end-users".

    in short, intel has a long *long* way to go before they can even remotely consider that they have a processor that can be taken seriously in this very large market, both in terms of price and also in terms of performance.

    what is particularly interesting about the comment that you make is that it would seem that intel really does, just as you do, believe that "a really slow processor and some special-purpose DSPs" simply is... not enough. and, contrary to that belief, it can be quite clearly seen by the total dominance of allwinner and rockchip that "a really slow processor and some special-purpose DSPs" really *is* enough.

    one of the reasons for that is because if you look at the market you find that you need:

    * audio and video CODEC processing. this can be handled by a special-purpose DSP. some of these are now handling 3D 4096-pixel-wide screens.

    * 3D graphics. these are handled by licensing a whole range of hard macros (special-purpose DSPs) that come with proprietary libraries implementing OpenGL ES 2.0. they're good enough, and some of them are getting _really_ good.

    * an (as you put it) "really slow processor" - although if you look at allwinner's latest processor the A80 it can hardly be called "slow", it's an 8 core monster - which covers the running of the general OS.

    overall these processors are graded according to price: $5 will get you something dreadful but "good enough", $20 will get you something that's complete overkill for a tablet.

    and you know what? the $7 1.2ghz dual-core ARM Cortex A7 Allwinner A20 is, when it's put with 2gb of RAM, actually extremely quick. i tested out 1gb of RAM running debian GNU/Linux: i fired up xrdp and i had *five* rdesktop sessions running OpenOffice and Firefox on it, onto my laptop. it didn't fall over, and it wasn't dreadfully slow.

    so i think you, just like intel, are completely and entirely missing the point. and in intel's case, that means entirely missing out on a *huge* market segment.

  48. Re:so why is intel's 14nm haswell still at 3.5 wat by lkcl · · Score: 1

    You seem to be conveniently ignoring Intel's Atom and Quark lines. They're all x86 and none of them has a TDP larger than 3w.

    i'm not. intel's quark line - the one i saw announced on here last year - tops out at 400mhz. it has... nothing in the way of interfaces that can be taken seriously. it doesn't even have RGB/TTL video out. however if you are right about the latest intel atom being 3w, then now i am interested! so i am very grateful for you pointing this out, i will go check.