Slashdot Mirror


Intel's RISC-y Business

Esther Schindler writes "With the Xeon 7600 line, Intel is finally using the 'R' word: RISC. With the new chips, Intel is targeting the mission-critical market dominated by Sun SPARC and IBM Power, a first. Can the Xeon E7 processor deliver Intel's final blow to the RISC market, which includes its own Itanium? 'With the launch of the E7 earlier this year, it seemed Intel was finally ready to make its final push, calling out RISC by name. "The days of IT organizations being forced to deploy expensive, closed RISC architectures for mission-critical applications are nearing an end," said Kirk Skaugen, vice president and general manager of Intel's Data Center Group, in a statement announcing the E7 line. Bold words.' Andy Patrizio interviews several experts; what do you think?"

225 comments

  1. finally??? by nurb432 · · Score: 2

    What the hell was the i960 then? Meatloaf?

    --
    ---- Booth was a patriot ----
    1. Re:finally??? by the+linux+geek · · Score: 3, Insightful

      A non-entity outside a few X terminals and RAID controllers.

    2. Re:finally??? by Anonymous Coward · · Score: 2, Funny

      What the hell was the i960 then? Meatloaf?

      Oh hell no. He'll do anything for love, but being an AMD aficionado, he won't do that.

    3. Re:finally??? by kungfuj35u5 · · Score: 1

      Heh slow raid controllers at that

    4. Re:finally??? by crankyspice · · Score: 4, Informative

      Intel has had several RISC chips on the market at various times; the i960, the i860, even ARM designs (XScale).

      TFA doesn't say Intel is going to be bringing out RISC technology, though, just that it's "taking aim" at markets that are still RISC strongholds:

      With the launch of the E7 earlier this year, it seemed Intel was finally ready to make its final push, calling out RISC by name. “The days of IT organizations being forced to deploy expensive, closed RISC architectures for mission-critical applications are nearing an end,” said Kirk Skaugen, vice president and general manager of Intel's Data Center Group, in a statement announcing the E7 line.

      Bold words. Can the E7 really dethrone UltraSparc/Power/PA-RISC and, of course, Intel's own Itanium processors? Intel thinks so.

      --
      geek. lawyer.
    5. Re:finally??? by Jah-Wren+Ryel · · Score: 1

      TFA doesn't say Intel is going to be bringing out RISC technology, though, just that it's "taking aim" at markets that are still RISC strongholds:

      Yeah, this is more about adding fault tolerance features than it is about anything that would qualify as RISC.

      --
      When information is power, privacy is freedom.
    6. Re:finally??? by NJRoadfan · · Score: 2

      It also powered the HP LaserJet 4 series of printers.

    7. Re:finally??? by Anonymous Coward · · Score: 0

      Meatloaf? No. Chopped Liver? Yes.

    8. Re:finally??? by Anonymous Coward · · Score: 0, Interesting

      ... closed RISC architectures ???

      What the effing bull crap is that? RISC is an open standard. I am reminded of the "dim wits at the top of the Corps" /. article from a few days ago.

    9. Re:finally??? by Anonymous Coward · · Score: 0

      Didn't spend much time using supercomputers in the 90s eh?

      Intel Paragon used the i960 in a massively parallel architecture.

      The coolest thing about it was the LED bar graphs on each processor node. Vertical was CPU load, horizontal was inter-node communication. You could instantly see crappy code. Better yet, you could make designs and such on the front panel by increasing/decreasing comm/processor. Good times.

    10. Re:finally??? by Anonymous Coward · · Score: 0

      What the hell are you talking about? RISC isn't a "standard" of any sort. It's more like an architectural style, used by many very different companies (MIPS, Sun, etc.)

    11. Re:finally??? by Jeremy+Erwin · · Score: 1

      I thought the Paragon used the i860-- a different, later chip.

  2. RISC was never an architecture by Osgeld · · Score: 0

    It is a design philosophy. You, the chip makers, are the ones who made it expensive and closed (even though it's not; our day-to-day CPUs have been "RISC" for a while now).

  3. RISC? by Anonymous Coward · · Score: 0

    I don't believe itanium is considered RISC, unless you're selectively redefining RISC to mean anything that's not x86/x86-64.

    1. Re:RISC? by the+linux+geek · · Score: 0

      How does Itanium not qualify? It's a load-store ISA with fixed-length instructions. Is that not the normal definition of RISC?

    2. Re:RISC? by Relic+of+the+Future · · Score: 1

      Because it's EPIC. I guess you could argue whether having multiple fixed-length instructions is "different enough" to justify calling it something different, but Intel's marketers (and at least some of their engineers) thought so.

      --
      Those who fail to understand communication protocols, are doomed to repeat them over port 80.
    3. Re:RISC? by Panaflex · · Score: 2

      It's definitely a RISC processor set... the problem with the Itanium was the EPIC instruction set. A complete waste of time, as the compiler is asked to generalize decisions about the thread and multi-core state of the machine during program compilation.

      I mean... who the hell thought that was a good idea? It makes for a nice benchmark, but a terrible architecture. Bring us back the Alpha chip... make it a 64 core monster.

      --
      I said no... but I missed and it came out yes.
    4. Re:RISC? by the+linux+geek · · Score: 1

      I'd say it qualifies as both RISC and EPIC/VLIW. It fits both categories. They aren't mutually exclusive.

    5. Re:RISC? by i.am.delf · · Score: 1

      My god could you imagine the heat dissipation of a 64 core alpha processor. I had a desktop with an EV7 in it. That thing was a space heater. I just looked it up. The spec was 125W for that thing.

    6. Re:RISC? by mevets · · Score: 1

      not to defend itanium, but by not foisting it on the compiler, you foist it onto an interpreter running on the CPU. Although the interpreter was wasteful enough, it had no opportunity to usefully work around the kind of dependence shown by:
          mov xyz, %eax
          add %eax, %ebx
          sub %ebx, %ecx
          or %ecx, %edx
      It could only insert bubbles until each op finished.
      That was the crazy solution to the CPU:Memory speed imbalance. Multi core has won the day, but modern high speed processing (e.g. GPUs) often use this architecture.

    7. Re:RISC? by Waffle+Iron · · Score: 1

      Why does it need bubbles? Can't an X86 keep its other ALUs busy simultaneously doing other instructions nearby that sequence using standard register renaming and opcode reordering techniques?

      At any rate, from what I've read it's the branch prediction that really bottlenecks performance with today's deep pipelines. The advanced runtime branch prediction in the latest CPUs (which can see and react to the actual data at hand) just plain outperforms static compile-time branch analysis.
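
The runtime-versus-static point can be illustrated with a toy model. This is only a sketch: it assumes a single 2-bit saturating counter (a textbook predictor, far simpler than what current CPUs use), and the branch history is made up for illustration, not measured.

```python
def static_always_taken(outcomes):
    """Count correct predictions for a fixed compile-time guess of 'taken'."""
    return sum(1 for taken in outcomes if taken)


def two_bit_counter(outcomes, state=2):
    """Count correct predictions for one 2-bit saturating counter.

    States 0-1 predict not-taken, states 2-3 predict taken.
    """
    correct = 0
    for taken in outcomes:
        predicted_taken = state >= 2
        if predicted_taken == taken:
            correct += 1
        # Move toward the observed outcome, saturating at 0 and 3.
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return correct


# Hypothetical history: the data keeps the branch not-taken for a long run,
# then taken for a long run (True = taken).
history = [False] * 20 + [True] * 20

static_hits = static_always_taken(history)
dynamic_hits = two_bit_counter(history)
print(f"static: {static_hits}/{len(history)}, dynamic: {dynamic_hits}/{len(history)}")
```

Any fixed guess gets half of this history right, while the counter only mispredicts around the point where the behaviour changes. Real workloads are messier, so treat the numbers as an illustration of the mechanism rather than a benchmark.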

    8. Re:RISC? by Anarke_Incarnate · · Score: 2

      125W is a gaming CPU nowadays.

    9. Re:RISC? by mevets · · Score: 1

      The mini example was a set of interlocked instructions, where the source operand of each is dependent upon the previous insn; thus everything is forced to be in-order. Compilers are smart enough not to do this, and the real difference in a 'wide' architecture is that it doesn't insert an interpreter (renaming, stalling, bubbling, etc..). The program ( compiler ) has to know that copying R1 to R2 has an N instruction latency before R2 is valid. If it tries to use R2 earlier, it gets junk.

      The x86 trend, since Prescott, has been shorter pipelines + more cores to break the bottleneck.

    10. Re:RISC? by 0123456 · · Score: 1

      125W is a gaming CPU nowadays.

      An i5-2500 at stock speeds takes about 60W at full load.

      But yeah, if you buy AMD then all bets are off.

    11. Re:RISC? by Chris+Burke · · Score: 1

      The mini example was a set of interlocked instructions, where the source operand of each is dependent upon the previous insn; thus everything is forced to be in-order. Compilers are smart enough not to do this, and the real difference in a 'wide' architecture is that it doesn't insert an interpreter (renaming, stalling, bubbling, etc..). The program ( compiler ) has to know that copying R1 to R2 has an N instruction latency before R2 is valid. If it tries to use R2 earlier, it gets junk.

      Yes, that sequence of instructions would have to be executed sequentially whether for EPIC, Power, or x86, and compilers for any architecture know that they need to expose the maximum amount of ILP to the processor.

      However only compilers for EPIC need to know the latency of every operation, the number of each type of functional unit, and any slot restrictions that may apply, so that the VLIW instructions can be assembled optimally. Because only by doing so can the ILP be exploited. Otherwise, like in the example given, bubbles will occur.

      With a re-namer and out-of-order scheduler, as much ILP as the compiler can expose in however many instructions will fit in the processor's window can be exploited, automatically scheduled according to the availability and latency of each functional unit on that particular machine.

      The upshot is that the EPIC compiler has to do a lot more work to reach the same level of realised ILP as a non-EPIC compiler. It also has to know much more about the intimate details of the specific CPU being targeted. Meaning the binary will be distinctly sub-optimal for any other CPUs -- as opposed to marginally sub-optimal in the case of non-EPIC compilers. For the example given, if there were earlier or subsequent instructions visible in the window that were independent, then there may not be any bubbles at all.

      Those things which were supposed to make the compiler-centric world of EPIC better than other compilers and OoO schedulers, like branch predication which was one of the major touted features of the ISA, ended up not being worth much. Intel's own research showed that this feature was a modest positive gain on finely hand-tuned code, neutral with a very good compiler, and negative with a 'typical' compiler.

      Having the compiler have to manually do the work of an OoO scheduler in order to avoid bubbles is not a feature. But I mean it almost sounds like you think stalls are only a consequence of the 'interpreter', and don't occur on an EPIC machine.

      --

      The enemies of Democracy are
    12. Re:RISC? by unixisc · · Score: 1

      EPIC is something b/w VLIW and RISC. RISC does all the dynamic analysis (branch predictions, speculative executions) in hardware, VLIW does it all in software (aided by the ultimate compiler), but EPIC is somewhere in between. In EPIC, a number of techniques are implemented @ chip level to get around the shortcomings of VLIW. Particularly, register-renaming and rotating register files are RISC features: in VLIW, a compiler would eliminate the need for register renaming.

      On paper, VLIW is mutually exclusive from RISC, given that the former does all possible optimizations @ compiler level, allowing (in theory at least) for the simplest possible architecture. In practice, the dynamic analysis hardware that RISC uses has been found to be only a small fraction of the chip area, thereby virtually eliminating the VLIW advantage.
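
For readers who have not met register renaming, here is a minimal sketch of the idea. It assumes a made-up three-operand instruction format and an unlimited supply of physical registers; the function name `rename` and the `p0`, `p1`, ... naming are hypothetical, not taken from any real design.

```python
from itertools import count


def rename(instructions):
    """Rename (dest, src1, src2) triples of architectural registers.

    Every write gets a fresh physical register, so only true
    read-after-write dependencies remain.
    """
    fresh = count()   # endless supply of physical register numbers
    mapping = {}      # architectural name -> current physical name

    def current(reg):
        # A register that has not been written yet gets an initial mapping.
        if reg not in mapping:
            mapping[reg] = f"p{next(fresh)}"
        return mapping[reg]

    renamed = []
    for dest, src1, src2 in instructions:
        s1, s2 = current(src1), current(src2)   # read sources first
        mapping[dest] = f"p{next(fresh)}"       # then allocate the new dest
        renamed.append((mapping[dest], s1, s2))
    return renamed


# Hypothetical program: r1 is reused as a destination by two unrelated computations.
program = [
    ("r1", "r2", "r3"),   # r1 = r2 op r3
    ("r4", "r1", "r2"),   # r4 = r1 op r2  (true dependency on the first r1)
    ("r1", "r5", "r6"),   # r1 = r5 op r6  (only reuses the name r1)
    ("r7", "r1", "r5"),   # r7 = r1 op r5  (true dependency on the second r1)
]
renamed_program = rename(program)
for line in renamed_program:
    print(line)
```

After renaming, the two writes to `r1` land in different physical registers, so the second pair of instructions no longer has to wait for the first pair just because they share a register name.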

    13. Re:RISC? by Anonymous Coward · · Score: 0

      not to defend itanium, but by not foisting it on the compiler, you foist it onto an interpreter running on the CPU. Although the interpreter was wasteful enough, it had no opportunity to usefully work around the kind of dependence shown by:

          mov xyz, %eax
          add %eax, %ebx
          sub %ebx, %ecx
          or %ecx, %edx

      It could only insert bubbles until each op finished.

      That was the crazy solution to the CPU:Memory speed imbalance. Multi core has won the day, but modern high speed processing (e.g. GPUs) often use this architecture.

      That sequence is a problem for everything, including Itanium, because there is no way to break a simple dependency chain like that. Not in the compiler, not in hardware, not in an "interpreter running on the CPU" (whatever that's supposed to mean).

      The only real difference between Itanium and anything else is that the Itanium ISA has a number of features intended to help the compiler pass dependency information to the hardware in an explicit fashion, hence the acronym EPIC (Explicitly Parallel Instruction Computer). For example, Itanium includes tag bits in instructions which let the compiler identify the starting and ending point for a group of instructions which have no internal dependencies (they may, however, depend on previous groups). Also, instructions are packed into 128 bit "bundle" words, 3 instructions per bundle, and the instruction slots inside each bundle are semi-VLIW in that not all combinations of 3 instructions are legal. (This actually leads to lots of no-ops packed into bundles, because the compiler can't always find something to fit in. In other words, Itanium doesn't get rid of bubbles, as you imply, it just makes them explicit.)
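
To make the bundle layout concrete, here is a small sketch of packing three instruction slots into one 128-bit word. The 5-bit template plus three 41-bit slots split is my recollection of the IA-64 format rather than something stated in this thread, so treat it as an assumption; the slot values are placeholders, not real instruction encodings, and the function names are made up.

```python
TEMPLATE_BITS = 5    # assumed width of the template field
SLOT_BITS = 41       # assumed width of each instruction slot
SLOT_MASK = (1 << SLOT_BITS) - 1


def pack_bundle(template, slots):
    """Pack a template and three slot values into one 128-bit integer."""
    if not 0 <= template < (1 << TEMPLATE_BITS):
        raise ValueError("template must fit in 5 bits")
    if len(slots) != 3 or any(not 0 <= s <= SLOT_MASK for s in slots):
        raise ValueError("need exactly three 41-bit slot values")
    bundle = template
    for i, slot in enumerate(slots):
        bundle |= slot << (TEMPLATE_BITS + i * SLOT_BITS)
    return bundle


def unpack_bundle(bundle):
    """Split a 128-bit bundle back into (template, [slot0, slot1, slot2])."""
    template = bundle & ((1 << TEMPLATE_BITS) - 1)
    slots = [
        (bundle >> (TEMPLATE_BITS + i * SLOT_BITS)) & SLOT_MASK
        for i in range(3)
    ]
    return template, slots


# Placeholder values only: one "real" op, one filler, one all-ones slot.
example = pack_bundle(0x10, [1, 0, SLOT_MASK])
print(example.bit_length(), unpack_bundle(example))
```

The arithmetic is the interesting part: 5 + 3 × 41 = 128, so every bundle carries exactly three slots whether or not the compiler found three useful instructions to put in them.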

      As Waffle Iron down below points out, if an out-of-order x86 like a Core i7 is given a sequence like your example, it'll just continue fetching later instructions when it runs into a stall, and execute those while it's waiting for the dependency chain to resolve. This does not mean the CPU is using an "interpreter", as you claim. That's one of the most bizarre misuses of terminology I've seen in quite some time. An interpreter is a software layer, not hardware designed to execute the native instruction set. Hardware out-of-order execution engines have been around since the 1960s. Look up Tomasulo's algorithm, and the IBM 360.

      But actually, you don't even need out-of-order to execute the trivial sequence you gave above without bubbles. You just need a result forwarding network, another basic technique that's been around for a very very long time.
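
A toy timing model shows both halves of this argument. It is a sketch under loud assumptions: every operation (including the load) takes one cycle, results are forwarded to the next instruction immediately, functional units are unlimited, and the two "nearby" instructions are invented for the example.

```python
def earliest_issue(instructions, latency=1):
    """Return the earliest issue cycle for each (dest, sources) instruction.

    Idealised model: unlimited functional units, and every result is
    forwarded to its consumers `latency` cycles after the producer issues.
    """
    ready = {}    # register -> cycle at which its newest value is available
    cycles = []
    for dest, sources in instructions:
        start = max((ready.get(src, 0) for src in sources), default=0)
        cycles.append(start)
        ready[dest] = start + latency
    return cycles


# The four-instruction chain: each op reads what the previous one wrote.
chain = [
    ("eax", []),                # mov: loads xyz into eax
    ("ebx", ["eax", "ebx"]),    # add: reads eax, updates ebx
    ("ecx", ["ebx", "ecx"]),    # sub: reads ebx, updates ecx
    ("edx", ["ecx", "edx"]),    # or:  reads ecx, updates edx
]
# Hypothetical independent work that happens to sit nearby in the window.
nearby = [("esi", ["esi"]), ("edi", ["edi"])]

chain_cycles = earliest_issue(chain)
mixed_cycles = earliest_issue(chain + nearby)
print(chain_cycles, mixed_cycles)
```

With forwarding, the chain issues one instruction per cycle with no gaps, and nothing can make it shorter than four steps. The independent instructions slot in at cycle 0 alongside it, which is all an out-of-order core is doing when it keeps fetching past a stalled chain.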

    14. Re:RISC? by f8l_0e · · Score: 1

      To be fair, that EV7 you had was fabricated in either 180 or 130 nanometer process. Made on 32nm process, it would be a whole different story.

  4. CISC by Anonymous Coward · · Score: 0

    "expensive, closed RISC architectures"

    Is the E7 less expensive or closed, or just less RISC?

    1. Re:CISC by ThePhilips · · Score: 1

      Or more to the point: why are organizations picking RISC at all?

      Either Intel or the author of TFA is missing the point. Most organizations use RISC-based systems that come as part of business-critical solutions. Hardware rarely accounts for 10% of the deal. Software licenses, deployment, testing and long-term support are where the real money is.

      Unless Intel introduces an architecture which it commits to support for at least one decade, I do not see a thing changing on the corporate landscape. The problem with Intel boxes is that by the time you need a replacement part, the CPU/etc. generation has already changed and one needs to replace the whole box. That obviously leads to the problem that you can't install the same tested old version of the OS and of the 3rd party crap, meaning that the whole solution has to be tested from the ground up. It is not uncommon for such complete tests to cost more than 1000 person-days. Suddenly, replacing a single $4000 server becomes orders of magnitude more expensive.

      P.S. Needless to say, at least part of the RISC stronghold has already been dismantled: DB hosting, for which Linux/x64 is used more and more.

      --
      All hope abandon ye who enter here.
  5. Itaniums is **NOT** RISC by Anonymous Coward · · Score: 2, Interesting

    Just have to point out, Itanium is absolutely NOT RISC in any sense of the word. Other than that, it is rather unfortunate that Intel has the most money to develop new processes (i.e. die shrinks), because the actual Intel instruction set is quite inelegant, both from a programmer standpoint, and from the standpoint of implementing it in silicon. I can't argue with overall performance, if Intel tops performance then that is that; but, the fact of the matter is that any of these RISC designs (Power, Sparc, the PA-RISC, Alpha, ARM...) would clean Intel's clock if they had access to the type of processes Intel does.

    1. Re:Itaniums is **NOT** RISC by Anonymous Coward · · Score: 0

      I was thinking the same thing. I recall Itanium was VLIW (http://en.wikipedia.org/wiki/Very_long_instruction_word), where the onus is on the compiler to create massive single instructions that do a great deal in a single clock tick. While, from the compiler's standpoint, there are RISC-like issues to contend with, it is otherwise very much the opposite of RISC.

    2. Re:Itaniums is **NOT** RISC by David+Greene · · Score: 2

      the actual Intel instruction set is quite inelegant, both from a programmer standpoint

      I've always been curious about this kind of statement. I hear it a lot. While I understand the complexities of silicon implementation (finding instruction lengths and decode are a PITA), I've always thought the ISA itself was rather elegant. Yes, there is cruft that could be dropped and AMD did some of that with X86-64, but overall, the day-to-day instruction set is mostly orthogonal and has a fairly regular encoding. GPR shifts, MUL and DIV are a bit quirky and the lack of a packed 64-bit integer multiply is an almost unforgivable sin, but overall, I rather like it.

      What are the things you would like to see changed? We need specifics to have an interesting discussion. :)

    3. Re:Itaniums is **NOT** RISC by the+linux+geek · · Score: 1

      The SSE extensions are ugly, if you're including that in the category of x86.

      Lack of FMA support..

      Relatively starved for registers, although since it's not a load/store arch (another issue, imho) that matters less than it does in, say, ARM.

      There are also implementation issues (lack of a directory cache makes scalability suck), but architecturally, it's a pretty standard and slightly boring CISC. I don't quite understand all the hate it gets - it does tend to be slower than Power or z, and doesn't scale well, but the problems are implementation problems, not architectural ones.

    4. Re:Itaniums is **NOT** RISC by the+linux+geek · · Score: 1

      Variable-length instructions are also kind of annoying. (Yes, replying to myself is bad form)

    5. Re:Itaniums is **NOT** RISC by FrankSchwab · · Score: 1

      Why?

      When transistors were expensive, fixed-length instructions made some sense on die (although they tend to inflate system memory needs), but transistors are extraordinarily cheap today. Instruction decode is such a small part of a modern processor die, and so fast, that it makes no difference.

      Sure, the world would be aesthetically more appealing if the 68000 had won the microprocessor war rather than the 8086, but the performance difference at this stage of evolution would be infinitesimal.

      --
      And the worms ate into his brain.
    6. Re:Itaniums is **NOT** RISC by Anonymous Coward · · Score: 0

      Agreed on SSE extensions. FMA support was originally included in AVX but has since been removed; don't ask me why.

      Register starvation was a real problem on the 32-bit version: generated code was spilling to and reloading from stack slots like mad, even for fairly simple code, and this ate into usable memory bandwidth. 64-bit mode gives 8 or 9 (in relocatable code) more general-purpose registers and is much less prone to the reload hell. Actually 64-bit mode has the same number of GPRs as Z and ARM (actually more, since the PC is separate on x86). Thankfully the x87 floating point stack nightmare is a thing of the past.

      Implementation problems? Intel has by far the largest budgets and implementation teams in the industry, and also the best processes (although I have a feeling that their lead is shrinking a bit). The real problem is that the architecture is now reaching the limit, after all the latest Core are not that different conceptually from the Pentium Pro, over 15 years old now. Only the P4 was really different, but it was a mistake.

      I believe that the real test for x86 will come when Intel can no longer deliver a new process shrink every 2 years. This might be around 2018.

    7. Re:Itaniums is **NOT** RISC by JamesP · · Score: 1

      The SSE extensions are ugly, if you're including that in the category of x86.
       

      Why? x87 is definitely ugly, but sse?

      Lack of FMA support..

      Like this? http://en.wikipedia.org/wiki/FMA_instruction_set

      Relatively starved for registers, although since it's not a load/store arch (another issue, imho) that matters less than it does in, say, ARM.

      x86-64 improves on this

      There are also implementation issues (lack of a directory cache makes scalability suck), but architecturally, it's a pretty standard and slightly boring CISC. I don't quite understand all the hate it gets - it does tend to be slower than Power or z, and doesn't scale well, but the problems are implementation problems, not architectural ones.

      Problem is Intel has a lot of money. So even if Power or Alpha is 'better', Intel has the money to make it better (in general) than the competition (see Apple dropping the PPC because IBM couldn't make a mobile G5, amongst other things)

      --
      how long until /. fixes commenting on Chrome?
    8. Re:Itaniums is **NOT** RISC by Anonymous Coward · · Score: 0

      The SSE extensions are ugly, if you're including that in the category of x86.

      SSE2 is the minimum requirement for that AMD64 thingy. FMA is coming both from AMD and Intel. The inclusion of directory cache is a processor architecture level decision which both of the x86 players are making or have made a while ago.

    9. Re:Itaniums is **NOT** RISC by loufoque · · Score: 1

      I manage a high-performance library that contains, among others, a SIMD abstraction layer, not unlike Framewave or Accelerate (but better, of course ;))
      The SSE/AVX variants are clearly the most annoying to support, and are not really orthogonal at all.
      The PowerPC and NEON variants have much more straightforward implementations.

    10. Re:Itaniums is **NOT** RISC by the+linux+geek · · Score: 1

      Did you read your own Wikipedia article? FMA isn't in any shipping Intel x86 CPU.

    11. Re:Itaniums is **NOT** RISC by hedwards · · Score: 3, Insightful

      As far as x86-64 goes, isn't that mainly because AMD trotted out a 64bit processor that was backwards compatible with 32bit programs and whomped Intel's 64bit processors which required specially compiled programs to work with?

    12. Re:Itaniums is **NOT** RISC by Anarke_Incarnate · · Score: 1

      Yes. Intel wanted the MERCED to trickle down and replace the aging x86. They STILL refuse to call it AMD64, which is what AMD calls the architecture (this caused confusion at my job, because people assumed AMD64 was only for AMD CPUs and the servers they were downloading code for were Intel-based). Intel instead calls their version EM64T, which is based on, but a lesser variant of, AMD64.

    13. Re:Itaniums is **NOT** RISC by ultranova · · Score: 1

      Relatively starved for registers, although since it's not a load/store arch (another issue, imho) that matters less than it does in, say, ARM.

      One might argue that the whole concept of (general) registers is an ugly hack to get around limited or nonexistent cache controllers in old processors. It certainly isn't "elegant" by any stretch of imagination to divide general storage into two separate namespaces, and it also wastes memory with what are basically explicit cache control commands (load/store).

      Also, don't forget that the more registers you have, the more state the OS has to save and restore at task switch time.

      --

      Forget magic. Any technology distinguishable from divine power is insufficiently advanced.

    14. Re:Itaniums is **NOT** RISC by Anonymous Coward · · Score: 0

      It does make a difference, particularly for energy efficiency. This is why Intel's newer processors contain significant complexity in a uop cache, to bypass the x86 decoders. In lower-end devices, the "x86 tax" is said to be anywhere from 10-20%, which is huge.

      Variable length instructions are actually good, to improve code density, but they should be done in a sane way. Thumb 2 is a reasonable example.

    15. Re:Itaniums is **NOT** RISC by FreonTrip · · Score: 2

      I think - in a colossal effort to refuse to acknowledge that they're eating their competitor's dog food - Intel changed from the awkward and ungainly EM64T to Intel 64 for nomenclature. The only differences between the two amount to a tiny number of instructions AMD deprecated, then inexplicably brought back after Intel had implemented the rest.

    16. Re:Itaniums is **NOT** RISC by Darinbob · · Score: 5, Informative

      The x86 architecture is horribly unorthogonal. Each register in the basic set has its own special purpose which is required by some instruction or other, thus no register is general purpose. The instruction set is clearly CISC with variable instruction size, multiple ways to do the same operation, etc. So many instructions operate directly on memory instead of being a load-store architecture with a lot of registers. It was designed to not take up a lot of program space as opposed to being efficient to decode and execute. It's really not that elegant compared to even other CISC chips of its era (68000 for example).

      Ie, you've got the EAX "accumulator", EBX base register, ECX counter register, EDX for division, SI source index, DI destination index, etc. The closest to a general purpose data register is EAX, and EBX is sort of like a general purpose address register, but there aren't any pure general purpose registers that can be used for anything. And so your programs tend to spend a lot of time shuffling stuff into the register that's needed or using a memory location directly as an operand.

      But that makes sense since the x86 instruction set was more an evolution than a design. Start with 4004 (first microprocessor), go to 4040, 8008, 8080, 8085, then finally 8086. Along the way every new CPU was vaguely compatible (either very similar instructions, or you could write a program to convert existing code to the new CPU). Along that evolution the instruction set grew. It was important in the 8080 era to save program space since RAM was expensive. Without a cache it meant that instruction fetching was just as expensive as fetching a memory operand. The more complex instruction sets meant that most CPUs along this line were microcoded, but the performance hit from that wasn't so big since most of these early chips weren't meant to be speed demons but were for low cost designs (low cost relative to the big computers anyway). Microcode meant you could add a new instruction easily without a lot of design overhead.

      The snag is that along the way RAM got cheaper and the need for performance became the key feature. But Intel adapted because in the Pentium Pro and later these chips really are RISC under the hood. They convert the x86 instructions on the fly into something that's a step up from microcode, which is much more suitable for a pipelined or superscalar architecture. So basically everyone uses RISC these days; it would be foolish not to. But Intel is a prisoner of its own design. It can't change the instruction set without breaking compatibility. Every time it has a better architecture it's a flop because that's not PC compatible and they're competing with others for the same product space.
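
As a rough illustration of "RISC under the hood", here is a sketch of splitting a memory-operand instruction into load/ALU/store steps. Real micro-op formats are not public and nothing in this thread describes them, so the micro-op names, the `tmp` registers and the `crack` function are all invented for the example.

```python
def crack(op, dest, src):
    """Split one x86-style two-operand instruction into simpler steps.

    Memory operands are written as "[name]". The step names and the
    temporary registers are made up for illustration.
    """
    def is_mem(operand):
        return operand.startswith("[")

    uops = []
    if is_mem(src):
        # Memory source: load it into a temporary first.
        uops.append(("load", "tmp_s", src))
        src = "tmp_s"
    if is_mem(dest):
        # Memory destination: read, modify, write back.
        uops.append(("load", "tmp_d", dest))
        uops.append((op, "tmp_d", src))
        uops.append(("store", dest, "tmp_d"))
    else:
        uops.append((op, dest, src))
    return uops


reg_reg = crack("add", "eax", "ebx")        # register-only form
mem_dest = crack("add", "[counter]", "eax")  # read-modify-write form
print(reg_reg)
print(mem_dest)
```

The register-only form passes through as a single step, while the read-modify-write form becomes three load-store-style steps that a pipelined back end can schedule independently.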

    17. Re:Itaniums is **NOT** RISC by Darinbob · · Score: 1

      There are new processors without cache too. RISC isn't just for high end systems. Most of the lower power chips for embedded market are RISC based, and this includes a wide variety of ARM CPUs. Even when you do have a cache you are often at the range of power where you don't want a very complicated instruction decoder because you're not building a top of the line PC. The point of RISC is to keep the entire machine design simple and straight forward and uniform, not just instruction decoding; the more space you save the more you can use for something that really does help your performance (bigger cache, more ALUs).

    18. Re:Itaniums is **NOT** RISC by Anonymous Coward · · Score: 0

      I think it has something to do with the ugly warts that the entire line inherited from the original 8086/8088 days. Weird segmented address space (that are supposedly claimed as unused, yet are kept for backwards compat), I/O instructions vs. MMIO, a weird "Privilege Ring Arch" vs. simpler two-mode implementations, call by interrupt vs. call by sub (WTF?), even more address space warts once in 32-bit land (PAE, anyone?)...the list just keeps going on. Hell, even the assembler taught in college (way back when) read weird - compared to everyone else (all left-to-right), Intel was bass-ackward (left-to-middle, right-to-middle) and moving between non-x86 code and x86 code was a headache.

      The 68k family had it right, but Motorola (as usual) completely screwed the marketing pooch. 8 data registers, 8 address registers, completely flat address space (no %@#$*& segment math to contend with), memory mapped I/O was native, and a set of addressing modes that was fairly orthogonal for most instructions. It wasn't perfect - address registers couldn't do some forms of arithmetic because they were meant to be used as registers for pointers (this was after all back when C was king), and some other small quirks (status register in the first generation was a non-prived access, but starting with the 68010 became a prived instruction for the supervisor mode), but nothing like the hideous warts that x86 came with. Using 68k assembly was a pure joy, easy to read, and easy to understand.

      Frankly, it was the "Cadillac of CISC". It made CISC sexy all over again when RISC was the bleeding edge.

      In hindsight, IBM would have done the world a great favor had they not chosen to go with Intel's "cheaper" chip. At the time, Moto's 68k was under consideration, but lost due to cost. We would have had a 16MB address space in the first generation PCs instead of the brain-dead 1MB with segments.
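
The "segment math" complaint is easy to show in a few lines. This sketch uses the standard 8086 real-mode rule (segment shifted left by 4, plus offset, on 20 address lines) and the 68000's 24 address bits; the function name is just for illustration.

```python
def physical_address(segment, offset):
    """8086 real-mode address: 16-bit segment * 16 + 16-bit offset,
    wrapped to the 20 address lines of the original chip."""
    return ((segment << 4) + offset) & 0xFFFFF


# Two different segment:offset pairs that name the same byte.
a = physical_address(0x1234, 0x0010)
b = physical_address(0x1235, 0x0000)
print(hex(a), hex(b), a == b)

# Address space sizes implied by the bus widths.
x86_bytes = 2 ** 20      # 20 address bits -> 1 MB
m68k_bytes = 2 ** 24     # 24 address bits -> 16 MB
print(x86_bytes // 2 ** 20, "MB vs", m68k_bytes // 2 ** 20, "MB")
```

Because many segment:offset pairs alias the same byte, comparing or incrementing far pointers needs normalisation, which is exactly the bookkeeping a flat address space avoids.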

    19. Re:Itaniums is **NOT** RISC by Anonymous Coward · · Score: 0

      Why?

      When transistors were expensive, fixed-length instructions made some sense on die (although they tend to inflate system memory needs), but transistors are extraordinarily cheap today. Instruction decode is such a small part of a modern processor die, and so fast, that it makes no difference.

      I agree that variable length instructions are not bad in and of themselves. At one time there was a good argument in favor of fixed length instructions, back in the 1980s and early 1990s when transistors were expensive enough that anything you could do to make a single-chip processor practical was really important. (Not coincidentally, that era was when RISC ideas were at their most influential.) These days, if anything, variable length can be an advantage: if you choose the mapping of instructions to encodings carefully, the most common instructions are the shortest, giving you a form of Huffman encoding, i.e. compression. Which makes the effective size of instruction caches larger, which in turn improves performance.
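
The Huffman comparison can be checked with a quick calculation. This is only a sketch: the opcode mix below is made up rather than measured, and a real ISA encodes operands and whole bytes, not just a bare opcode field.

```python
import heapq


def huffman_lengths(frequencies):
    """Return {symbol: code length in bits} for a Huffman code."""
    # Heap entries: (total weight, tie-breaker, {symbol: depth so far}).
    heap = [(w, i, {sym: 0}) for i, (sym, w) in enumerate(frequencies.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {sym: depth + 1 for sym, depth in {**left, **right}.items()}
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]


# Made-up opcode mix (percent of executed instructions), not measured data.
mix = {"mov": 40, "add": 20, "cmp": 15, "jcc": 15, "call": 5, "div": 5}
lengths = huffman_lengths(mix)
average_bits = sum(mix[op] * lengths[op] for op in mix) / sum(mix.values())
fixed_bits = 3   # six opcodes need 3 bits each at fixed length
print(lengths, average_bits, fixed_bits)
```

With this mix the most common opcode gets a 1-bit code and the average drops below the 3 bits a fixed-width field would need, which is the density effect described above.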

      However. Possibly the biggest wart x86 has today is that its encodings, which once were tight and efficient, are now somewhat bloated and convoluted to decode. This is a consequence of having to abuse the hell out of prefix bytes in order to extend the ISA in ways it was never originally designed for (who in the 1970s could have predicted that some day something recognizably descended from the 8086 would occupy the role it does today?). This is where more or less all of the true hardware level x86 badness lies today.

      Sure, the world would be aesthetically more appealing if the 68000 had won the microprocessor war rather than the 8086, but the performance difference at this stage of evolution would be infinitesimal.

      68K was also variable length. However it is actually arguable that 68K would have had more problems progressing up the performance curve than x86. The thing about x86 is, it's not all that CISCy of a CISC. The addressing modes are relatively simple, and individual instructions can't cause lots of memory accesses. Not so for 68K, especially after the extensions to the ISA in the 68020. It was an old-school "close the semantic gap" CISC, clearly inspired to some extent by VAX, where instruction designers were trying to mimic high level language features in the ISA. Way cool to code assembly language for, not so cool to implement in hardware.

      x86, on the other hand, is almost RISCy without actually being a RISC... and in the long term that turned out to be about the right balance. You get most of the code density benefits of a CISC without the superb ugliness of trying to make a high performance pipeline which can handle a single instruction generating literally ten or more memory accesses (due to double-or-worse indirection in addressing modes). (This is extremely nasty because when you have a MMU, any of those memory accesses can cause a fault, and you have to design it all to recover correctly from any combination of memory references faulting, which is _way_ simpler to do if any individual instruction can only generate one or two memory references. IIRC x86 can generate at most 2 memory accesses per instruction, and most high performance x86 implementations just crack those into one memory access + ALU op and one load/store internally.)

      The worst thing about 32-bit x86 is the lack of registers, and that was fixed to about the optimal level in x86-64. (There have been papers showing that the optimal number of GPRs for an OOO CPU with renaming is somewhere between 16 and 32, so 16 is a pretty good number... 32 would be better but not a lot better.)

    20. Re:Itaniums is **NOT** RISC by Anonymous Coward · · Score: 0

      Intel should offer an interface for direct access to the RISC core, thus bypassing x86 CISC to RISC translation. I'm sure the Linux world would quickly adapt and build a kernel for it... assuming of course that direct access has a performance, or other tangible, benefit. Then once Linux dominates the world Intel can phase out x86 CISC. They can easily dig themselves out of this hole, but it will take many years to supplant the existing x86 CISC user base.

    21. Re:Itaniums is **NOT** RISC by Anonymous Coward · · Score: 0

      (see Apple dropping the PPC because IBM couldn't make a mobile G5, amongst other things)

      See IBM not giving a rat's ass because Apple was small fries in the PowerPC market. And what Apple wanted wasn't just a mobile G5, it was a high power mobile G5, PPC is/was mobile enough to be a significant player in the embedded space. People just refuse to acknowledge that Apple was insignificant to the PPC market relative to gaming consoles and embedded devices.

    22. Re:Itaniums is **NOT** RISC by crispytwo · · Score: 1

      I was always under the impression that the 68k vs 8086 architecture produced far less heat for the same throughput.

      If that was true then, and is still true, then current processors could be consuming less power under a different architecture and doing the same work. Given that my cell phone's ARM chip is more powerful than my old PC and heats up far less no matter how much I gab on it, there might be some credence to the concept.

    23. Re:Itaniums is **NOT** RISC by Anonymous Coward · · Score: 0

      Intel should offer an interface for direct access to the RISC core, thus bypassing x86 CISC to RISC translation. I'm sure the Linux world would quickly adapt and build a kernel for it... assuming of course that direct access has a performance, or other tangible, benefit.

      No, there isn't a benefit to this. Intel's/AMD's internal RISC architecture is not defined: because it is only used by the translator module, they can change the RISC instructions at any time.

      The RISC instructions Intel uses are also super-wide, which makes them horribly memory inefficient even compared to other RISC machines (16-32 byte wide instructions, anyone? Remember that this is FIXED WIDTH, so everything from MOV to ADD to MUL is huge). The super-wide design makes the CPU fast internally but is not something that would be pleasant to use, plus it is not backward or forward compatible with different CPU generations.

      It's too bad Intel sold off XScale instead of buying ARM Holdings outright. ARM is a sleek architecture that has the familiar feel of the x86 but is a true RISC, so it is much smaller, cleaner and more efficient. Add an Altivec style FPU module, multiple cores and a cache coherency protocol and you've got yourself a nice replacement CPU architecture. The problem is always backward compatibility. The ability to smack one of these ARM CPUs into an existing PC via the PCI-E bus (an x86 stub loader boots an ARM OS which runs the native x86 and ARM in parallel and uses the x86 for compatibility) could transition the market away; new systems would do the reverse (ARM is the main CPU, with an optional x86 emulator add-in card, or a software emulator if you don't get the add-in).

    24. Re:Itaniums is **NOT** RISC by serviscope_minor · · Score: 1

      x87 is definitely ugly

      x87 with its stack-based approach is certainly ugly. But (and here's a big but) it works internally at 80 bit for free, which was fantastic. With careful coding one could write very effective and accurate single precision floating point code (or get better precision with doubles) essentially at no cost. It also supported loading and saving long doubles to memory, so one could have hardware assisted super precision floating point numbers if needed.

      That was all very nice, but required some care.

      Of course if you overflowed the stack, it pushed the end into memory, and would usually truncate. This would often be different between -O0 and -O3. It could also make it very hard to estimate the real precision of difficult floating point code.
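
      A rough way to see why wide intermediates are nice, as a sketch only: Python has no 80-bit type, so double precision stands in for the wide x87 registers and single precision stands in for the storage format. The helper names are made up for this example.

```python
import struct
from fractions import Fraction


def to_f32(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def sum_narrow(values):
    """Round the accumulator to single precision after every addition."""
    acc = 0.0
    for v in values:
        acc = to_f32(acc + v)
    return acc


def sum_wide(values):
    """Keep the accumulator wide; round to single precision once at the end."""
    acc = 0.0
    for v in values:
        acc += v
    return to_f32(acc)


values = [to_f32(0.1)] * 10000
exact = sum(Fraction(v) for v in values)  # exact rational reference sum

narrow_err = abs(Fraction(sum_narrow(values)) - exact)
wide_err = abs(Fraction(sum_wide(values)) - exact)
print("narrow error:", float(narrow_err))
print("wide error:  ", float(wide_err))
```

      The narrow loop is also roughly what happens when an intermediate gets spilled and truncated at every step, which is why results could shift between optimization levels.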

      --
      SJW n. One who posts facts.
    25. Re:Itaniums is **NOT** RISC by serviscope_minor · · Score: 1

      One might argue that the whole concept of (general) registers is an ugly hack to get around limited or nonexistent cache controllers in old processors.

      One might, but one could also argue that they're just another part of the memory hierarchy (registers/L1/L2/L3/RAM/disk/stone-tablets). Registers usually require fewer cycles to access than even L1 cache, and can also do several more fast things, like parallel access of the two operands of some opcode and read-modify-write in a single cycle.

      Of course, many processors blur the distinction somewhat.

      I agree with your point about the lack of elegance of separating the namespaces. However, most processors in hardware effectively present two namespaces (or rather two different views on the same namespace) as the stack and non-stack memory. Registers simply add a third to it.

      Another advantage of separating namespaces is that there is no need to make the non-shared parts coherent in multi-CPU systems. Although, taking this too far (e.g. in the Cell) makes life very much harder.

      Anyway, it's fun to debate the philosophy of CPU design :)

      --
      SJW n. One who posts facts.
    26. Re:Itaniums is **NOT** RISC by TheRaven64 · · Score: 1

      They STILL refuse to call it AMD64, which is what AMD calls the architecture

      AMD called it x86-64. People called it AMD64 because IA64 was used for Itanium. AMD64 is misleading, since x86-64 is a relatively small set of tweaks to x86, yet it gives all of the credit (or, perhaps, blame) to AMD. Calling it x86-64 is vendor neutral and descriptive.

      --
      I am TheRaven on Soylent News
    27. Re:Itaniums is **NOT** RISC by Renegrade · · Score: 1

      I'd like to add to your comment that the x86 front end, although hideously ugly compared to say, the 68k mentioned above, acts basically as an instruction compression engine.

      So you have all the advantages of dense CISC-y instructions with a powerful RISC engine under the hood. Memory is still expensive and very small --> is a cache huge? Is it cheap? No, and no. CISC-style instructions pack more easily into those tiny spaces, making cache misses less frequent and less expensive.

      RISC didn't win. CISC didn't win. They both lost out to designs that can leverage the advantages of both.

    28. Re:Itaniums is **NOT** RISC by Targon · · Score: 1

      The thing is, times have changed, and you have to look back at the real-world issues, not just at low level "small" applications. The more complex things become, the less the CISC vs. RISC argument matters, especially when internally, CISC instructions get broken down into RISC-type instructions anyway.

      So, if you are doing something really complex, a well-written application done with CISC instructions won't be any better or worse than if you did the same thing under RISC. It is like the old idea that it is far easier to have a chip with a VERY VERY high clock speed that executes a lot of NOP instructions than one that is actually doing something, and the more complicated applications become, the more you can benefit from CISC (a single command doing the job of multiple commands). I am not including all the SSE instructions since they really were put in place by Intel just to try to shut AMD out for the most part.

    29. Re:Itaniums is **NOT** RISC by Renegrade · · Score: 1

      Actually that's 4G address space in the original 68000.

      The address registers were fully populated with 32 bits with the very first 68k. Only 24 address lines were actually connected (er, 23, was something odd with the odd addresses if I recall correctly), or 20 address lines in the 68008. Motorola (and Commodore, but NOT Apple) documentation said not to use the upper 8 bits of the address registers as they would one day be connected to address lines.

      Lo and behold, the 68020 came out, and it had a full 32 address lines. Commodore's 32-bit clean code was validated, and Apple had to rush to fix code where they were using those "extra" bits as flags.
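
      The address arithmetic in this subthread can be checked with a quick sketch. The function names are made up for illustration; the masks simply model which address bits actually reach the bus.

```python
def address_space_bytes(address_lines):
    """Bytes addressable with the given number of address bits."""
    return 1 << address_lines


def physical_8086(segment, offset):
    """8086 real-mode address: segment * 16 + offset, wrapped to 20 bits."""
    return ((segment << 4) + offset) & 0xFFFFF


def bus_address_68000(reg32):
    """The original 68000 drove only the low 24 bits of an address register."""
    return reg32 & 0x00FFFFFF


def bus_address_68020(reg32):
    """The 68020 drove all 32 bits."""
    return reg32 & 0xFFFFFFFF


print(address_space_bytes(20))  # 8086/8088 and 68008: 1 MB
print(address_space_bytes(24))  # 68000: 16 MB
print(address_space_bytes(32))  # 68020: 4 GB

# A pointer with a flag stashed in the top byte "works" on the 68000
# but points somewhere else entirely on the 68020.
tagged = 0x80001234
print(hex(bus_address_68000(tagged)), hex(bus_address_68020(tagged)))
```

      That last pair is the whole "32-bit clean" problem in two numbers: the flag byte is ignored on the 68000 and becomes part of the address on the 68020.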

      Also, the 68000, although only possessing a 16-bit-at-a-time ALU and 16 data lines, is effectively a full 32-bit architecture, just a bit pokey. Its lack of a 32-bit x 32-bit = 64-bit multiply was pointed out repeatedly by 386 programmers, but by and large, most high level programming languages even today don't support that (usually they're limited to 32x32=32 or 64x64=64). Since it could do pretty much any 32 (op) 32 = 32 operation, you could write your high level code, and then expect it to be twice as fast on a 68020.
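
      For what it's worth, a wide multiply can always be built from 16x16=32 pieces (the 68000's MULU is a 16x16=32 multiply). This is a sketch of the schoolbook method, not a claim about what any particular compiler or runtime emitted; the function names are made up.

```python
MASK16 = 0xFFFF
MASK32 = 0xFFFFFFFF


def mul16(a, b):
    """Stand-in for a 16 x 16 -> 32 bit hardware multiply."""
    assert 0 <= a <= MASK16 and 0 <= b <= MASK16
    return a * b


def mul32x32_64(a, b):
    """Unsigned 32 x 32 -> 64 multiply from four 16-bit partial products."""
    a_lo, a_hi = a & MASK16, (a >> 16) & MASK16
    b_lo, b_hi = b & MASK16, (b >> 16) & MASK16
    low = mul16(a_lo, b_lo)
    mid = mul16(a_hi, b_lo) + mul16(a_lo, b_hi)
    high = mul16(a_hi, b_hi)
    return (high << 32) + (mid << 16) + low


def mul32x32_32(a, b):
    """What most high level languages expose: only the low 32 bits."""
    return mul32x32_64(a, b) & MASK32


print(hex(mul32x32_64(0xFFFFFFFF, 0xFFFFFFFF)))
print(hex(mul32x32_32(0x10000, 0x10000)))  # the high half is simply lost
```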

      IBM should have used at least the 68008. It wasn't much bigger than an 8088 (used in the IBM PC and XT), being only a 44-pin DIP (vs 40-pin), and had full 68k functionality. The PC-AT could have then used the full-on 68000 instead of the 80286.

    30. Re:Itaniums is **NOT** RISC by JamesP · · Score: 1

      Yes, I read it, I was just pointing out it's going to be there (hopefully)

      --
      how long until /. fixes commenting on Chrome?
    31. Re:Itaniums is **NOT** RISC by TheLink · · Score: 1

      Variable-length instructions are also kind of annoying.

      Annoying to some, but useful in practice:
      http://en.wikipedia.org/wiki/ARM_architecture#Thumb-2

      --
    32. Re:Itaniums is **NOT** RISC by Anonymous Coward · · Score: 0

      Separated namespaces (registers vs addressable RAM) is probably the most elegant solution to the problem we've got. Registers are good for performance, of course, but increasingly they're important for parallel algorithms. You need some mechanism to say that core 1 of a CPU is working on something different than what core 2 of a CPU is working on and you have to have a mechanism for them to share data only in safe ways. You could have a Cell-like architecture with segmented memory spaces for each processing core, but I think the more elegant solution is to just give each core its own set of registers and provide memory barriers for communication.

    33. Re:Itaniums is **NOT** RISC by Anonymous Coward · · Score: 0

      Uuum, I'm not a chip engineer, but why don't they simply offer direct access to the RISC core in parallel? That way, nobody would have to use it. But since it would offer more flexibility and performance to directly program it, people would start using it.
      And before you know it, nobody would care about the old microcode anymore. A few releases later, Intel would remove it... aaand we're done!

      Am I missing something, or are they missing this?

    34. Re:Itaniums is **NOT** RISC by David+Greene · · Score: 1

      Instruction decode is such a small part of a modern processor die, and so fast, that it makes no difference.

      But it is a quite substantial part of the power budget for x86 chips, which is why I stipulated the hardware complexities.

      --

    35. Re:Itaniums is **NOT** RISC by David+Greene · · Score: 1

      This is a consequence of having to abuse the hell out of prefix bytes in order to extend the ISA in ways it was never originally designed for

      It's true that there are lots of prefixes, but if you look at how those prefixes are actually used, there is a great amount of regularity. Almost every SSE/SSE2 instruction uses the same prefix encoding scheme based on whether it is scalar/packed or single/double. SSE3 has regularity across other dimensions. Later SSE ISAs have somewhat less regularity but they were also much smaller extensions.
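
      The regularity is easy to tabulate for the basic arithmetic ops. A small sketch covering only the prefix and opcode bytes of the register forms of four instructions (ModRM and any REX prefix are left out; the helper name is made up):

```python
# Mandatory prefix selects the data form; the two-byte opcode selects the
# operation. ps = packed single, pd = packed double, ss/sd = scalar.
MANDATORY_PREFIX = {"ps": b"", "pd": b"\x66", "ss": b"\xf3", "sd": b"\xf2"}
ARITH_OPCODE = {"add": 0x58, "mul": 0x59, "sub": 0x5C, "div": 0x5E}


def opcode_bytes(op, form):
    """Prefix + 0F + opcode for e.g. ('add', 'sd') -> ADDSD."""
    return MANDATORY_PREFIX[form] + bytes([0x0F, ARITH_OPCODE[op]])


for op in ARITH_OPCODE:
    for form in MANDATORY_PREFIX:
        print(f"{op}{form}: {opcode_bytes(op, form).hex(' ')}")
```

      Sixteen instructions, one scheme: the prefix picks scalar/packed and single/double, and the last byte picks the operation.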

      There have been papers showing that the optimal number of GPRs for an OOO CPU with renaming is somewhere between 16 and 32

      I remember reading that paper. I didn't buy it then and after almost 10 years in the HPC market I really don't buy it. Many of those limit-type papers have a fundamental flaw: they assume compilers are really, really stupid. I work on compilers that have to go out of the way to not perform certain transformations because they create too much register pressure. Now, isa-wise it gets harder to make more registers available while at the same time keeping text size reasonable but it's absolutely not true that we cannot use more than 32 registers. We can in fact use thousands, in almost every program.

      --

    36. Re:Itaniums is **NOT** RISC by Anonymous Coward · · Score: 0

      I remember reading that paper. I didn't buy it then and after almost 10 years in the HPC market I really don't buy it. Many of those limit-type papers have a fundamental flaw: they assume compilers are really, really stupid. I work on compilers that have to go out of the way to not perform certain transformations because they create too much register pressure. Now, isa-wise it gets harder to make more registers available while at the same time keeping text size reasonable but it's absolutely not true that we cannot use more than 32 registers. We can in fact use thousands, in almost every program.

      I didn't read it in depth but what I remember from it was that it was a knee-of-the-curve result, not something where the paper authors thought there was 0 benefit to >32. As in, 16 registers are maybe 85% good (making numbers up for illustrative purposes here), 24 are maybe 90%, 32 are a bit higher still. Basically, sharply diminishing returns past 16. I find that pretty believable, and as you note, architecting more registers has real costs in program text size -- but more importantly, it has implementation costs too. If you make a physical register file too large it will cease to perform like a register file. (That is, it'll take multiple cycles to access, much like a cache. Or you'll have to reduce the number of access ports, which is also bad.) There's also the concern of needing to save & restore too much state to make a context switch.

      So yeah, if the statement is ">32 has 0 benefit" I don't buy that either, but I find it pretty believable that somewhere between 16 and 32 is about right, taking into account not just the theoretical benefits but also the implementation constraints.

    37. Re:Itaniums is **NOT** RISC by David+Greene · · Score: 1

      The SSE extensions are ugly, if you're including that in the category of x86.

      In what way are they ugly? To me they are "ugly" in the sense that it's not a general vector ISA but that is not what Intel was aiming for initially. Even AVX and the stuff pitched for Larrabee is not a great vector ISA. But SSE is reasonably functional and you can do quite a lot with it. I guess I am looking for specifics to better understand what I'm missing. :)

      Lack of FMA support

      Sure. I could name all sorts of things I would like to see in an ISA. But does that make it ugly, or just incomplete? I think you can have a beautiful ISA that is not complete.

      It does tend to be slower than Power or z

      Really? I have never heard that before and it doesn't line up with my experience. Not saying you're wrong but I would be very interested in reading studies that demonstrate this.

      doesn't scale well

      What do you mean by "scale?" Supercomputers with hundreds of thousands of cores have been built out of x86 chips.

      --

    38. Re:Itaniums is **NOT** RISC by David+Greene · · Score: 1

      I didn't read it in depth but what I remember from it was that it was a knee-of-the-curve result, not something where the paper authors thought there was 0 benefit to >32

      Yes, that's what they argued, but frankly, it's not a valid result when gcc is your compiler. Wall wrote a very interesting paper on how to use 1000 registers. Compilers today don't even have to come close to any of the fancy tricks he talks about to suck up register resources. :)

      but more importantly, it has implementation costs too. If you make a physical register file too large it will cease to perform like a register file.

      We already have register files with hundreds of registers. They are used for O-O-O processing. They simply aren't ISA visible. Yes, there is a hardware limit, but even that is larger than people think it is. [Note: almost-shameless plug!] Techniques like register caching can be very effective, allowing very large register files with essentially the same performance as a small register file. Now, as with every other architecture research study, take it with a very large grain of salt. But it is an interesting idea. It seems to me that ISA encoding is really the bigger problem.

      There's also the concern of needing to save & restore too much state to make a context switch.

      Yep, that is a big problem that most people ignore. There certainly is a balance to be struck. In many of the codes I see, 90% of the time is spent in inner loops with no calls, so this isn't generally a problem for those programs. OS effects are usually pretty minimal, but again that's HPC which is certainly quite different from a more general-purpose machine. As with any statement, evaluate it in the context provided. :)

      --

    39. Re:Itaniums is **NOT** RISC by David+Greene · · Score: 1

      the latest Core are not that different conceptually from the Pentium Pro, over 15 years old now.

      I'm not sure what you're getting at here. The x86 ISA has little, really nothing, to do with this. Most of the stuff in Pentium Pro was invented for big iron machines of the '60's and '70's. There's some novel stuff in there but most of it is riffs on a 40-year-old theme. That's true of basically every mainstream general-purpose processor out there.

      I believe that the real test for x86 will be when Intel can no more come with a new process shrink every 2 years. This might be around 2018.

      That's going to affect everyone, not just Intel. I think you're a bit pessimistic with 2018, but it is certainly coming.

      --

    40. Re:Itaniums is **NOT** RISC by David+Greene · · Score: 1

      Can you say more about this? What do you mean by "orthogonal"? I certainly agree that SSE/AVX leaves a lot to be desired, but so do Altivec and NEON. None of them is a very good vector ISA. In what ways do you see Altivec and NEON as better designs? I am genuinely curious!

      --

    41. Re:Itaniums is **NOT** RISC by David+Greene · · Score: 1

      Each register in the basic set has its own special purpose which is required by some instruction or other, thus no register is general purpose.

      I strongly disagree with this. There is a small number of instructions (like 3) that are regularly used that have "special" register operands. Otherwise, the only dedicated registers are rsp and rbp and usually you don't even need rbp and even that is set by the ABI, not the ISA (other than push/pop I suppose). I see codes all the time that use every single GPR other than rsp as a general purpose register.

      --

    42. Re:Itaniums is **NOT** RISC by David+Greene · · Score: 1

      I think it has something to do with the ugly warts that the entire line inherited from the original 8086/8088 days...

      Everything in that paragraph is truly ugly. It is also totally irrelevant today. Either no one uses them or they are gone in x86-64.

      --

    43. Re:Itaniums is **NOT** RISC by Darinbob · · Score: 1

      I think part of the problem is that the 386 expanded things a bit more than the 8086 and 80286 did, so it is a bit more uniform. The bigger confusion is that what appears uniform at the programmer level is not uniform at the instruction level. That is, many of these instructions have a "short form" if you use a specific register (i.e., ADD an immediate to AX). That's added complexity to the compiler and makes it harder to just use any available register if you also want efficient code.

      Similarly, if you're stuck using a specific register with the DIV instruction that can conflict with a compiler optimizer as well because now there's a fixed use register mucking things up. Even if there are only a few instructions that do this it can have a big impact. (though multiply/divide tend to be the annoying cases even in RISC machines).

    44. Re:Itaniums is **NOT** RISC by David+Greene · · Score: 1

      That is, many of these instructions have a "short form" if you use a specific register (ie, ADD an immediate to AX). That's added complexity to the compiler and makes it harder to just use any available register if you also want efficient code.

      That's not very difficult to handle in a compiler. It's pretty easy to tweak register assignment heuristics to prefer one register over another. Is it worth it? I think the jury's out on that. The text space savings can sometimes make a big difference.
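
      For a concrete picture of the "short form", here is a sketch of a tiny encoder for 32-bit ADD reg, imm. It only covers four registers in 32-bit mode and always picks the shortest form; the function name and register table are just for this example.

```python
import struct

# Register numbers as used in the ModRM byte (subset only).
REG32 = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3}


def encode_add_imm(reg, imm):
    """Encode ADD reg32, imm using the shortest available form."""
    modrm = 0xC0 | REG32[reg]  # mod=11 (register direct), /0 selects ADD
    if -128 <= imm <= 127:
        # 83 /0 ib: sign-extended 8-bit immediate, same size for any register
        return bytes([0x83, modrm]) + struct.pack("<b", imm)
    if reg == "eax":
        # 05 id: accumulator-only short form, no ModRM byte
        return b"\x05" + struct.pack("<i", imm)
    # 81 /0 id: general form with a full 32-bit immediate
    return bytes([0x81, modrm]) + struct.pack("<i", imm)


for reg in ("eax", "ecx"):
    for imm in (5, 0x12345678):
        code = encode_add_imm(reg, imm)
        print(f"add {reg}, {imm:#x}: {code.hex(' ')} ({len(code)} bytes)")
```

      The saving is one byte, and only when the immediate doesn't fit in 8 bits, which is why a mild register-preference heuristic is about all it deserves.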

      Similarly, if you're stuck using a specific register with the DIV instruction that can conflict with a compiler optimizer as well because now there's a fixed use register mucking things up.

      Again, this is easily handled in the compiler and I did admit moderately-used instructions like this are a bit ugly. So you'll get no disagreement from me. In the end, though, it doesn't really make code generators any more difficult.

      --

    45. Re:Itaniums is **NOT** RISC by loufoque · · Score: 1

      A lot of instructions on SSE are not really natural element-wise or reduction operations, but often affect only the low/high elements, or the low/high bits. The operations on integers are not consistent: sometimes they're only available for 8-bit, sometimes only for 16-bit or only for 32-bit. 16-bit multiplication is in SSE2 for example, but 32-bit multiplication is only in SSE4.1 and 8-bit and 64-bit multiplication still aren't available.
      Altivec is more consistent: operations on integers are typically available for all integer sizes.

    46. Re:Itaniums is **NOT** RISC by loufoque · · Score: 1

      A recent nonsense I ran into is also _mm256_testz_ps. It's not consistent with _mm256_testz_si256, and doesn't even behave like the Intel documentation says (it only checks the high bit, not the whole value)

    47. Re:Itaniums is **NOT** RISC by David+Greene · · Score: 1

      A lot of instructions on SSE are not really natural element-wise or reduction operations, but often affect only the low/high elements, or the low/high bits.

      To clarify, you're talking about things like HADD and, with AVX, shuffles that only operate within 128-bit clusters? This is certainly driven by implementation challenges. In the old vector machines these were known as "cross-pipe" operations. You basically end up building a crossbar to implement reduction-type operations (pure reductions, compresses, snake shifts, etc.). So while I agree that these types of operations are very useful, they are also very expensive. SSE's lack of reduction-type operations is one of the major reasons I consider it far from a great vector ISA. So we're in agreement here.

      The operations on integers are not consistent: sometimes they're only available for 8-bit, sometimes only for 16-bit or only for 32-bit. 16-bit multiplication is in SSE2 for example, but 32-bit multiplication is only in SSE4.1 and 8-bit and 64-bit multiplication still aren't available.

      To be fair, I did say the lack of 64-bit multiply is an almost unforgivable sin. :) But yes, the integer operations are somewhat lacking. That said, how important are they? I am not a graphics expert but I would think SSE contains the most important operations for graphics. That's what it was originally designed for, after all. In the HPC/scientific codes realm, anything less than 32-bit integers isn't terribly interesting.

      --

    48. Re:Itaniums is **NOT** RISC by David+Greene · · Score: 1

      Yep, the test/mask instructions are a mess. Intel botched that big time. The two different mask schemes (sign bit and all-1's elements) are strange. I sort of understand why they did it, as an all-1's mask makes it easier to use bitwise operations to simulate predication, but who actually does it that way? The Larrabee proposal cleaned that up somewhat but it still wasn't quite what I'd want to see.

      --

    49. Re:Itaniums is **NOT** RISC by loufoque · · Score: 1

      Some image processing algorithms I've worked with only work with integers because of numerical stability issues with floats (but then, with work, it would probably be possible to adapt them)

    50. Re:Itaniums is **NOT** RISC by loufoque · · Score: 1

      We use blend (or bitwise tricks before that) and vectors full of 0's or 1's for pseudo-branching, not those instructions. The test/mask instructions are only used to return whether a vector contains at least a non-zero element or stuff like that, which is rarely useful.
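
      For anyone who hasn't seen the trick, here is a scalar sketch of mask-based pseudo-branching. Python lists stand in for vector registers with 32-bit lanes, and the function names are made up; real code would use the compare, and/andnot/or (or blend) intrinsics instead.

```python
LANE_MASK = 0xFFFFFFFF  # one 32-bit lane


def compare_gt(a, b):
    """Per-lane compare: all ones where a > b, all zeros elsewhere."""
    return [LANE_MASK if x > y else 0 for x, y in zip(a, b)]


def select(mask, a, b):
    """Branch-free per-lane select: take a where mask is set, else b."""
    return [(m & x) | (~m & LANE_MASK & y) for m, x, y in zip(mask, a, b)]


def any_set(mask):
    """The kind of question the test/mask instructions answer."""
    return any(m != 0 for m in mask)


a = [1, 7, 3, 9]
b = [4, 2, 8, 5]
mask = compare_gt(a, b)
print([hex(m) for m in mask])
print(select(mask, a, b))  # per-lane maximum, computed without a branch
print(any_set(mask))
```

      Both "sides" of the branch get computed for every lane and the mask just picks the winner, which is why all-ones/all-zeros masks are the convenient representation.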

  6. RISCy Business Cycles by Anonymous Coward · · Score: 1
    • RISC dominates servers and high end workstations
    • CISC takes desktops and makes steady inroads into workstations
    • RISC dominates low power devices
    • CISC takes high end servers
    • RISC makes inroads into notebooks and desktops

    Lather, rinse, repeat, profit? and yawn!

  7. Are they also gonna shut down the gibson? by cb8100 · · Score: 0

    RISC architecture is gonna change everything!

    --
    My lack of God, it's Trotsky!
    1. Re:Are they also gonna shut down the gibson? by crankyspice · · Score: 3, Funny

      RISC architecture is gonna change everything!

      I'm still waiting for the P6 chip. Triple the speed of the Pentium. With a PCI bus, too.

      --
      geek. lawyer.
    2. Re:Are they also gonna shut down the gibson? by Anonymous Coward · · Score: 0

      IBM already released a P7. P6 was released a few years ago :-)

    3. Re:Are they also gonna shut down the gibson? by hedwards · · Score: 2

      What are you up to with all that power? I hope you're not planning to hack a Gibson...

    4. Re:Are they also gonna shut down the gibson? by Anonymous Coward · · Score: 0

      don't forget the million psychedelic colors..

    5. Re:Are they also gonna shut down the gibson? by Anonymous Coward · · Score: 0

      Yeah. RISC is good.

      Now, bring on the teenage Angelina Jolie frontal nudity!!

    6. Re:Are they also gonna shut down the gibson? by Anonymous Coward · · Score: 0

      No, he's planning to hack THE PLANET.

  8. Probably a bullshit story by oldhack · · Score: 1

    The summary stinks of spam with content-free verbiage.

    --
    Fuck systemd. Fuck Redhat. Fuck Soylent, too. Wait, scratch the last one.
    1. Re:Probably a bullshit story by the+linux+geek · · Score: 1

      It's just yet another attempt by Intel to make x86 chips take over the high-end server market, as they've been trying to do since the early or mid 90's. x86 is like fusion power in that regard - it's always just a few years from evicting the RISC and mainframe architectures from their niches, no matter when you ask.

    2. Re:Probably a bullshit story by PCM2 · · Score: 1

      it's always just a few years from evicting the RISC and mainframe architectures from their niches, no matter when you ask.

      I think it's pretty damn close to evicting RISC today -- or at least, putting it into a niche, when I'd hardly have called RISC/Unix a "niche market" ten or more years ago. Mainframes are definitely a niche, but where they exist they are well entrenched.

      --
      Breakfast served all day!
  9. VLIW != RISC by gman003 · · Score: 1

    Itanium is not RISC in any sense of the word. It's pretty much the exact opposite of RISC - instead of using small, simple operations, it uses massive, complex instructions, often ones that produce multiple effects (most words produce three logical instructions).

    (Note for the acronym-deficient: RISC == "Reduced Instruction Set Computing", VLIW == "Very Long Instruction Word")

    1. Re:VLIW != RISC by Anonymous Coward · · Score: 0

      Itanium is a lot closer to RISC than you portray. It slashes the total number of instructions by being a load-store architecture, without numerous variants of instructions for all the different addressing modes. It's also a fixed length encoding. As I recall, those are the two key characteristics of RISC, and Itanium has them both. IIRC the registers are not quite general purpose, but pretty close, much like SPARC, which certainly is a RISC architecture. Even the instruction predicates, one of Itanium's least RISCy features, are found on at least one RISC architecture, ARM. And while Itanium is close to a VLIW, it's not one of those, either. The Itanium instruction bundles are only three instructions long, and the composition is very flexible compared to a traditional VLIW.
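
      To make the bundle layout concrete: an Itanium bundle is 128 bits, made of a 5-bit template in the low bits followed by three 41-bit instruction slots. A minimal sketch of packing and unpacking that layout (the slot values below are arbitrary placeholders, not real instructions, and the function names are made up):

```python
TEMPLATE_BITS = 5
SLOT_BITS = 41
SLOT_MASK = (1 << SLOT_BITS) - 1


def pack_bundle(template, slots):
    """Build a 128-bit bundle from a template and three 41-bit slots."""
    assert 0 <= template < (1 << TEMPLATE_BITS) and len(slots) == 3
    value = template
    for i, slot in enumerate(slots):
        assert 0 <= slot <= SLOT_MASK
        value |= slot << (TEMPLATE_BITS + i * SLOT_BITS)
    return value


def unpack_bundle(bundle):
    """Split a 128-bit bundle back into (template, (slot0, slot1, slot2))."""
    template = bundle & ((1 << TEMPLATE_BITS) - 1)
    slots = tuple(
        (bundle >> (TEMPLATE_BITS + i * SLOT_BITS)) & SLOT_MASK
        for i in range(3)
    )
    return template, slots


bundle = pack_bundle(0x10, (0x123, 0x456, SLOT_MASK))
print(bundle.bit_length() <= 128)
print(unpack_bundle(bundle))
```

      Each 41-bit slot is a fixed-size, simple instruction in its own right; the template only says which kind of execution unit each slot goes to and where the dependency stops are.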

      The instruction length thing is a red herring for judging a RISC processor. Back when RISC was conceived the comparison was to CISC, and a fixed 32-bit instruction coding was actually quite a bit longer than an average CISC instruction. And now we have variable width RISC as well (THUMB2), although ARM is one of the quirkiest and least RISCy RISCs.

      On a totally unrelated note, I had the misfortune of debugging some Itanium code at the assembly level once. That was one bug I was very happy to find on an x86 machine as well, because I was spending more time in the reference manual than I was in the debugger.

    2. Re:VLIW != RISC by loufoque · · Score: 1

      a VLIW does multiple instructions in parallel, but each of these are usually pretty small and simple.

    3. Re:VLIW != RISC by Anonymous Coward · · Score: 0

      You seem to be comparing bundles and instructions. If you look at only the instructions of Itanium, the things that make up bundles, you'll find them entirely RISC-like.

    4. Re:VLIW != RISC by unixisc · · Score: 1

      In VLIW, like RISC, the instructions are fixed length. What makes VLIW different is that a lot of dynamic analysis that's done in the silicon for RISC - branch prediction, speculative execution and so on - is done in the compiler. EPIC comes somewhere in between - using flags to indicate the dependency between instructions, and executing accordingly. Yeah, RISC does depend on a lot of compiler optimizations, since its software is more often than not written in high level languages, but still, RISC doesn't come even close to VLIW when it comes to dependence on the compiler for certain functionality.

      The only way the two are even close is that they are not CISC, and don't require microcode.
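
      As a rough illustration of the compiler doing the dependency analysis instead of the silicon, here is a toy in-order bundler. It is only a sketch: the instruction tuples, the register names, and the width of three are assumptions, and a real VLIW or EPIC compiler does far more (reordering, latencies, unit constraints).

```python
def schedule(instrs, width=3):
    """Greedily pack instructions, in program order, into bundles.

    Each instruction is (dest_register, (source_registers...)).
    A new bundle is started when the current one is full, or when the
    next instruction reads or rewrites a register that was written by
    an instruction already in the current bundle.
    """
    bundles = []
    current = []
    written = set()  # registers written inside the current bundle
    for dest, srcs in instrs:
        conflict = dest in written or any(s in written for s in srcs)
        if conflict or len(current) == width:
            bundles.append(current)
            current, written = [], set()
        current.append((dest, srcs))
        written.add(dest)
    if current:
        bundles.append(current)
    return bundles


# Hypothetical program: two independent adds, then two dependent ones.
program = [
    ("r1", ("r2", "r3")),
    ("r4", ("r5", "r6")),
    ("r7", ("r1", "r4")),
    ("r8", ("r7",)),
]
bundles = schedule(program)
```

      Here the first two instructions share a bundle, while each of the dependent ones needs its own. The hardware just issues what the compiler grouped.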

  10. Intel not going after RISC? by PCM2 · · Score: 2

    Ehhh? The summary seems a little cockeyed. Does anyone on /. really believe this is the first time Intel is using "the R-word"? Intel has been positioning its chips against RISC for ages. Yes, in the past it was using Itanium as its "high end" chip, because it was more directly competitive with IBM's and Sun's offerings (and it probably had bigger margins). But here's an article from 2004 which claims "Intel markets the [Itanium] chip as a replacement for RISC processors from companies like Sun and IBM" -- pretty much exactly what the summary is claiming is "a first" here.

    If anything, Intel has chosen not to throw around a lot of rhetoric about x86/x64 as a replacement for RISC servers out of deference to its partners. Back in 2007, you will recall, Sun started marketing x86 servers in addition to its RISC product line. How would it look if Intel went around claiming x86 was a replacement for Sparc servers? Intel left it to Sun's marketing to clarify where it saw its x86-based products in comparison to Sparc. Similarly, around the same time HP was putting out x86 and Itanium servers -- Intel wasn't going to muddy the waters there, certainly.

    On the other hand, Red Hat and Dell would certainly talk about Linux servers (read: x86) as replacements for proprietary Unix servers (read: RISC). So it's certainly not like this is the first time anyone floated the idea, and it's certainly not like Intel has backed off from competing with RISC at any point in the past, no matter which component gets positioned against RISC chips.

    --
    Breakfast served all day!
    1. Re:Intel not going after RISC? by Anonymous Coward · · Score: 0

      I think the main story here is that Intel is no longer claiming that Itanium is their only alternative for high-end server processors.

    2. Re:Intel not going after RISC? by Anonymous Coward · · Score: 0

      Intel has usually used "RISC/CISC hybrid" to describe its modern architecture. The days of closed RISC ended when the European Space Agency developed a GPLed clone of the SPARC chip. The days were then roasted over a fire with the open-sourcing of the T1 and T2 UltraSPARCS by Sun. Hell, "closed architecture" was dead with MIPS (an open standard even if the implementations are closed).

  11. Hmmm... by fuzzyfuzzyfungus · · Score: 2

    I'd say that Intel is playing pure weasel-words with their "expensive, closed, RISC" line...

    Are most of the Big Serious Iron RISC/*NIXes available from only a single vendor, often one with rather predatory pricing philosophies? Yeah, arguably so.

    However, x86-with-Serious-RISC-level-RAS-features isn't exactly a vibrant competitive market... It's pretty much Intel and, um, *crickets*...

    The low end of x86 actually has a number of weirdo 3rd parties, in addition to the big two. The middle of the market is a duopoly, but a pretty feisty one; but x86 high enough to compete with the classical serious RISC stuff on its own ground (as opposed to on the grounds of architectural changes that favor big clusters of expendable servers) is basically a single-shop thing. AMD has some pretty decent x86 servers; but Intel is the one bringing the Itanium RAS stuff down to their Xeons.

    Arguably, the lower end of RISC is substantially more competitive than that of x86: there are a huge number of ARM licensees, a whole bunch of random MIPS stuff floating around, and so forth. Only the middle-performance area, which is an effective duopoly (VIA? right...), but a pretty cutthroat one, where most people find their price/performance sweet spot, really makes x86 look like a competitive market at all...

    1. Re:Hmmm... by Anonymous Coward · · Score: 0

      My thoughts, exactly. But once Intel has killed the remaining competitors (Power, Z, Sparc if it's still around), they will be the only game in town and able to charge even higher prices.

      Fight the x86 monoculture!

      This said, for some commercial workloads, x86 is far from Power and Z with their decimal units, and that's where the real money is. Will Intel add another couple hundred instructions to the already bloated (but almost infinitely expandable thanks to a very flexible but disgusting encoding) instruction set?

    2. Re:Hmmm... by bws111 · · Score: 1

      Who in this day and age has predatory pricing?

    3. Re:Hmmm... by Shinobi · · Score: 1

      IBM... IBM.... Oh and IBM....

      Too bad that their top-end equipment is rather nice....

    4. Re:Hmmm... by BBCWatcher · · Score: 1

      I'd say no. IBM isn't gaining its server marketshare with predatory pricing. Yes, their top-end equipment is nice, but IBM has also been cutting their prices regularly. (That's very easy to see in their mainframes, for example, where it's quite transparent.) Predatory pricing means less-than-superior stuff that is priced at superior rates. If I'd vote for anyone fitting that description, I'd vote for Oracle/Sun. Oracle has done nothing but squeeze the remaining Sun customers as hard as possible while doing less than the bare minimum to stay in the server business. It's not pretty. :-(

    5. Re:Hmmm... by Shinobi · · Score: 1

      I'd define that more as malign parasite pricing.

      IBM is happy to price it high enough to make you feel it in your budget, but not high enough to negate the value of their products to your business.

    6. Re:Hmmm... by bws111 · · Score: 1

      IBM stuff is EXPENSIVE. That is the exact opposite of predatory pricing, which is selling something for a very low price in order to drive competition out of business.

  12. WTH? Is this an Intel ad? by Anonymous Coward · · Score: 1

    This is hardly the first time Intel has used the 'R-word' in marketing of Xeons.... Article brings nothing new to the table, hell this has been the Xeon marketing campaign for a decade...

    1. Re:WTH? Is this an Intel ad? by Ant+P. · · Score: 2

      Nah, it's just Intel admitting they lost the mobile market to ARM and the value-for-money market to AMD, so all they have left is the ricer and more-money-than-sense market.

  13. Pay no attention to the man behind the curtain by gstrickler · · Score: 4, Informative

    Remember all those slow, complex, cumbersome instructions from the 80x86? They're still around, just moved to microcode, while all the simple stuff is implemented using the same techniques pioneered by RISC designers. But since this is a server, you're probably running x64 code, which was designed to be much more RISC like in the first place.

    So, I guess the real message is "Replace your non-Intel based RISC systems with Intel based RISC systems. But wait, don't answer yet! As an added bonus, Intel chips have extra hardware added so they can run all your old x86/CISC code too, that way we can pretend they're not RISC systems based on the AMD designed x64 instruction set."

    --
    make imaginary.friends COUNT=100 VISIBLE=false
    1. Re:Pay no attention to the man behind the curtain by RightSaidFred99 · · Score: 1

      probably running x64 code, which was designed to be much more RISC like in the first place.

      That doesn't even make sense. You do know that adding more registers doesn't make something "much more RISC like" right?

    2. Re:Pay no attention to the man behind the curtain by Anonymous Coward · · Score: 0

      You do know that adding more registers doesn't make something "much more RISC like" right?

      Eliminating a bunch of instructions would, tho..

    3. Re:Pay no attention to the man behind the curtain by gstrickler · · Score: 4, Informative

      You do know that x64 has a simplified instruction set, simplified addressing modes, larger registers, a larger logical register file, and a much larger physical register file with register renaming, right?

      It still supports the full x86 instruction set when running in "legacy mode", but in "long mode", it only supports a subset of instructions, and supports only 16, 32, and 64 bit registers and operands (no 8 bit support), and standardizes the instruction lengths to provide better memory alignment, and simplified instruction processing. And in either mode, all the instructions are converted to one or more macro/micro-ops before running on the "real" RISC core.

      You knew all that, right? Of course you did.

      --
      make imaginary.friends COUNT=100 VISIBLE=false
    4. Re:Pay no attention to the man behind the curtain by rbarreira · · Score: 1

      You do know that x64 has a simplified instruction set (...)

      I don't remember hearing about this part... what significant chunk of instructions was removed?

      --

      The AACS key is NOT 0xF606EEFD628B1CA427BEA93A9CA9773F
    5. Re:Pay no attention to the man behind the curtain by chuckymonkey · · Score: 1

      It's not that it was removed; you only use the more complex and crappy legacy stuff when you're running in legacy x86 mode. So yes, the instructions are still there, but if you're running x64 then you're not using them.

      --
      "Some books contain the machinery required to create and sustain universes."-Tycho
    6. Re:Pay no attention to the man behind the curtain by Anonymous Coward · · Score: 0

      No, true RISC does not run CISC in legacy mode, you're thinking of Intel's rendition of it: x86-64 or whatever you want to call it.

      You want to see RISC as it was? Look at DEC Alpha, Sun's older servers before they became Intel's bitch, and IBM mainframe CPUs. While beautiful, they were different.

    7. Re:Pay no attention to the man behind the curtain by LWATCDR · · Score: 1

      Maybe that should be Intel's "next big thing". A Xeon that just supports the x64 instruction set: drop real mode, drop segments, drop 286, drop the I/O instructions and make a pure 64-bit ISA.

      --
      See my blog http://ilovecookes.blogspot.com/ for light hearted technical information.
    8. Re:Pay no attention to the man behind the curtain by bws111 · · Score: 4, Informative

      IBM mainframes are z/Architecture machines, and they are certainly not RISC. z/Architecture has about 1000 opcodes, including things like 'Square Root' and 'Perform Cryptographic Operation' and 'Convert Unicode to UTF-8'.

    9. Re:Pay no attention to the man behind the curtain by gstrickler · · Score: 1

      Great idea. Make it like a real RISC CPU, without all the x86 backwards compatibility addons. What a concept. Of course, then Intel couldn't claim "it'll run all your legacy software", and they might even have to admit it's a RISC design. And where would that leave them?

      --
      make imaginary.friends COUNT=100 VISIBLE=false
    10. Re:Pay no attention to the man behind the curtain by rabun_bike · · Score: 1

      But, sadly, the logic gates are still taking up space on the chip to support all the "baggage" and anyone who has seen the x86 instruction set knows there is lots of baggage going all the way back to the 8088 with the lovely big-endian data segment implementation. Those historic junk logic gates take up space, create heat, and burn power. Since shrinking chips and increasing MHz isn't cutting it, we went to multi-core. Now we are seeing the limitations of multi-core, so we bump up the bus speed and add more fast cache. All this juxtaposition eats up power. At some point the path forward will be to break legacy code. I think we are fast moving towards that possibility with the wide adoption of ARM. If consolidated data centers see large energy savings with a true RISC processor, the market will move in that direction.

    11. Re:Pay no attention to the man behind the curtain by Anonymous Coward · · Score: 0

      IBM mainframes are z/Architecture machines, and they are certainly not RISC. z/Architecture has about 1000 opcodes, including things like 'Square Root' and 'Perform Cryptographic Operation' and 'Convert Unicode to UTF-8'.

      My best guess is Anonymous hacked this page and replaced every instance of RAS with RISC. None of it makes any sense to me.

    12. Re:Pay no attention to the man behind the curtain by the_humeister · · Score: 1

      real mode, I/O instructions, etc. can't possibly take up that much of the transistor budget. Especially not when they can cram several cores + 30 MB of cache on one die.

    13. Re:Pay no attention to the man behind the curtain by RightSaidFred99 · · Score: 1

      No, it wouldn't. You don't know what RISC is, it's not about the number of instructions.

    14. Re:Pay no attention to the man behind the curtain by RightSaidFred99 · · Score: 1

      Sorry, you're full of it. x64 still has variable length instructions, multiple addressing modes, and complex instructions. The number of instructions is irrelevant and in fact RISC can often have more instructions than CISC.

      It's not "much more like RISC" by any reasonable definition, and x86/x64 has been using a "real RISC core" for ages.

    15. Re:Pay no attention to the man behind the curtain by gstrickler · · Score: 2

      Actually, while those extra gates do take up die space, they're probably fully power gated, drawing no power and producing no heat when in "long mode". The amount of die space is probably small; remember a 486 only had around 1M transistors, including its cache. Even if there are 10M transistors dedicated to maintaining compatibility in a modern CPU, that's ~1% of a modern CPU.

      x64 mode already breaks backwards compatibility with quite a bit of x86 code, particularly x86 code that isn't 32-bit code. Anything written before the 386 was introduced won't run under 64-bit mode, almost nothing written before Windows 95 came out will run, and a whole bunch of stuff written before Windows XP came out won't run. There's some newer stuff that won't run, but by the time XP started shipping most software was moving to a 32-bit model, and so will likely run (some may require some minor tweaks and/or a recompile). So, most software written in the last 8-10 years should be ok, but most software written before '95 won't, and between '95 and 2003 it's hit and miss. They could probably save more power and/or get better performance by removing some more instructions and breaking compatibility even more, but it's probably not worth it to most users to have to replace so much software. Deprecating instructions today and removing them 6-10 years from now might be viable, but only if the customers see the benefits (as they are seeing with the move to 64-bit), and I don't see that happening unless ARM starts taking a lot of the server market from Intel.

      --
      make imaginary.friends COUNT=100 VISIBLE=false
    16. Re:Pay no attention to the man behind the curtain by unixisc · · Score: 1

      All recent IBM computers, from what I understand, are based on Power7. Or am I mistaken?

    17. Re:Pay no attention to the man behind the curtain by Kjella · · Score: 1

      From what I've gathered they also have a form of "soft deprecation" where obsolete instructions are implemented in microcode, meaning the code still runs but much slower and a smart compiler wouldn't use those instructions anymore. That's pretty effective without breaking compatibility left and right.

      --
      Live today, because you never know what tomorrow brings
    18. Re:Pay no attention to the man behind the curtain by serviscope_minor · · Score: 1

      All recent IBM computers, from what I understand, are based on Power7. Or am I mistaken?

      http://en.wikipedia.org/wiki/IBM_z196_(microprocessor)

      --
      SJW n. One who posts facts.
    19. Re:Pay no attention to the man behind the curtain by blind+biker · · Score: 1

      More exactly, 894 opcodes, of which 3/4 are implemented in hardware. That's a bit less than 700 "classic" CISC opcodes.

      Those are the figures for the newest z/Architecture CPU, the z10 microprocessor.

      --
      "The agriculture ministry is not in charge of Gundam" - Japanese ministry official.
    20. Re:Pay no attention to the man behind the curtain by Renegrade · · Score: 1

      As far as I know, all instructions are implemented in microcode... aside from in 6502s.

    21. Re:Pay no attention to the man behind the curtain by Anonymous Coward · · Score: 0

      So tell me, what do those microcode instructions break down into for execution?

    22. Re:Pay no attention to the man behind the curtain by LWATCDR · · Score: 1

      Wasted space is wasted space. Most of that code has been moved into microcode but why even bother with it all? Yes your Xeon will not run DOS apps but who cares.
      Where they really need to do this is on the Atom line.

      --
      See my blog http://ilovecookes.blogspot.com/ for light hearted technical information.
    23. Re:Pay no attention to the man behind the curtain by tlhIngan · · Score: 1

      real mode, I/O instructions, etc. can't possibly take up that much of the transistor budget. Especially not when they can cram several cores + 30 MB of cache on one die.

      Transistors, no, but die area, yes. Caches consume a huge number of transistors, but relatively small amount of die area for those transistors - 30MB of cache (180M transistors!) may occupy around 50-75% of the available die space. The rest of the transistors are the general logic, where it's the wiring that determines how dense the transistors are.

      And the x86 compatibility stuff is known to take up to half of the available space. x86 is terrible logic-wise since instructions are variable sized (which means the instruction fetcher needs to cross cache line boundaries and instructions may cross cache lines), and since it isn't load/store, instructions that reference memory have to decode into several instructions - one or more to calculate the memory address (depending on addressing mode), and one to do the actual load/store.

      So no, the x86 front end doesn't take a lot of transistors, but the ones it does take, do take a lot of space. Space that can be used for more cache or more logic blocks. Or just make a smaller die (which lowers cost when you can shove more onto a wafer).
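
      The "decode into several instructions" step for memory operands can be sketched roughly as below. This is a toy model, not how any particular decoder works: the micro-op names, the temporary register names, and the string-based operand format are all invented for illustration.

```python
def is_mem(operand: str) -> bool:
    """Treat anything written in square brackets as a memory operand."""
    return operand.startswith("[") and operand.endswith("]")


def crack(op: str, dst: str, src: str):
    """Expand one two-operand instruction into load/store-style micro-ops.

    A register-register instruction stays a single micro-op. A memory
    source adds a load; a memory destination adds a load before the
    operation and a store after it.
    """
    if is_mem(dst) and is_mem(src):
        raise ValueError("at most one memory operand is modelled here")
    uops = []
    a, b = dst, src
    if is_mem(src):
        uops.append(("load", "tmp0", src))
        b = "tmp0"
    if is_mem(dst):
        uops.append(("load", "tmp1", dst))
        a = "tmp1"
    uops.append((op, a, b))
    if is_mem(dst):
        uops.append(("store", dst, "tmp1"))
    return uops
```

      For example, `crack("add", "rax", "rbx")` stays one micro-op, while `crack("add", "[rdi+8]", "rax")` becomes a load, an add, and a store, which is the work a load/store ISA makes the compiler spell out explicitly.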

    24. Re:Pay no attention to the man behind the curtain by Anonymous Coward · · Score: 0

      Well, it would reduce...

    25. Re:Pay no attention to the man behind the curtain by Chris+Burke · · Score: 1

      Not true. The majority of common instructions are decoded directly by the decoders. Only more complex instructions are implemented in microcode.

      Unless you just meant "implemented in microcode" as in "decomposed into micro-ops", which is true; technically even a pure load instruction is decoded into a single load micro-op. But microcode usually means a ROM that is read from as a type of instruction memory to get all the micro-ops that make up one CISC instruction. That's only used for a subset of instructions.

      --

      The enemies of Democracy are
    26. Re:Pay no attention to the man behind the curtain by badkarmadayaccount · · Score: 1

      Actually, it is. It is also about orthogonal semantics - no implicit registers, no perverse addressing modes, not too deep a state tree when executing any single instruction (mostly means keeping memory accesses capped - leads to very neat pipelining).

      --
      I know tobacco is bad for you, so I smoke weed with crack.
    27. Re:Pay no attention to the man behind the curtain by badkarmadayaccount · · Score: 1

      Decoded instructions (micro-ops) are created by the hardware decoder. Microcode is programmable - does nearly the same thing, handles complex instructions well. A procedure ROM is a whole other thing.

      --
      I know tobacco is bad for you, so I smoke weed with crack.
    28. Re:Pay no attention to the man behind the curtain by badkarmadayaccount · · Score: 1

      Not to mention having multiple unprivileged addressing modes - non-orthogonal, almost any number of address spaces - in a single process. In user mode. Oh, and these are the ones in 64-bit mode - you can mix and match with the two older modes - you even have special instructions for it. 5 or 6 instruction formats. Microcoded or hardware implemented or just plain missing instructions (some). And... You get the idea.

      --
      I know tobacco is bad for you, so I smoke weed with crack.
    29. Re:Pay no attention to the man behind the curtain by Chris+Burke · · Score: 1

      Microcode is implemented as a ROM in x86 processors. There is usually a small amount of programmable ucode-patch memory to allow BIOS to update ucode to fix bugs or work around performance issues. Implementing the entire microcode as a programmable memory would be needlessly wasteful.

      --

      The enemies of Democracy are
  14. It's just spin by msobkow · · Score: 1

    The 64-bit x86 machines have been eating away at IBM's, HP's, and Sun's market share for years. Partnered with a good Linux distribution and VMware, they're more than capable of taking on "the big boys."

    Oracle/Sun has been resting on their laurels for far too long. Time will tell whether Oracle manages to plug the holes in that sinking ship.

    HP's Itanium boxen have never had significant market share.

    That leaves IBM. And IBM doesn't sell you just a POWER based system -- they sell you the whole suite of applications, support, and data center integration. They maintain their market share by making it EASY for business to buy a SOLUTION instead of a computer.

    --
    I do not fail; I succeed at finding out what does not work.
    1. Re:It's just spin by Anonymous Coward · · Score: 0

      That leaves IBM. And IBM doesn't sell you just a POWER based system -- they sell you the whole suite of applications, support, and data center integration. They maintain their market share by making it EASY for business to buy a SOLUTION instead of a computer.

      LOL. I want some of what you're smoking. IBM will leave you with an army of useless consultants, professional bullshitters, providing you with complete solutions that will take a shitfuck of time to make work (I will concede the point that after it works it generally works fine, but it takes a bizarre amount of time and patience to get it up and running). They maintain their market by playing golf with your CEO, like everyone else does.

      Well, although I'm far from being a C-level monkey IBM has paid me a few very nice golf classes. Nobody ever gets fired for buying IBM!

    2. Re:It's just spin by the+linux+geek · · Score: 1

      HP Integrity's been in second place behind IBM Power and ahead of SPARC for a while.

    3. Re:It's just spin by Lawrence_Bird · · Score: 1

      and POWER7 does seem to kick ass too, no?

    4. Re:It's just spin by Anonymous Coward · · Score: 0

      It's a little more complicated. IBM has been gaining server marketshare for quite some time. POWER7 is now up past 60% marketshare in RISC UNIX. In other words, POWER has been gaining share faster than the overall RISC UNIX market (which includes HP Itanium and Sun/Fujitsu SPARC) has declined. zEnterprise is...well, it's just amazing. It has zoomed up to 9% of the total server market on high double-digit growth rates for the past few reports (and even higher growth rates in capacity shipments, so declining prices). IBM also makes X86 servers, so IBM participates in growth in that market, too.

      Anyway, it's getting to the point where if you need anything else besides (or in addition to) X86 servers, you'd better call IBM. The server market is extremely competitive, but it's getting much more like an Airbus/Boeing market in at least some respects. Which makes sense, really, because IBM is the only company left aside from Intel spending many billions of dollars on server hardware R&D.

  15. Why we hate x86 by erice · · Score: 3, Insightful

    I've always been curious about this kind of statement. I hear it a lot. While I understand the complexities of silicon implementation (finding instruction lengths and decode are a PITA), I've always thought the ISA itself was rather elegant. Yes, there is cruft that could be dropped and AMD did some of that with X86-64, but overall, the day-to-day instruction set is mostly orthogonal and has a fairly regular encoding. GPR shifts, MUL and DIV are a bit quirky and the lack of a packed 64-bit integer multiply is an almost unforgivable sin, but overall, I rather like it.

    What are the things you would like to see changed? We need specifics to have an interesting discussion. :)

    Limited number of registers
    Instructions that require certain registers or a certain subset of the registers
    No three register operations. This impacts pipelining because it is not possible to avoid overwriting one of the source registers.
    Variable instruction length makes decode a headache

    Lots of really bad stuff that isn't used much by modern code but still must be maintained for compatibility: segments, 286 protection, IO instructions, etc.

    I've wondered sometime what attitudes would be if a more likable contemporary instruction set had won. VAX and 68000, for instance, are much more palatable to program but they have performance flaws that are probably worse than x86.

    1. Re:Why we hate x86 by afidel · · Score: 1

      Limited number of registers
      X86-64. With register renaming, 16 is more than enough. AMD did a lot of research before settling on 16; more added significantly to complexity but only increased average program execution speed by low single-digit percentages.

      Variable instruction length makes decode a headache
      Meh, who cares, the whole decoder stage is a couple percent of the non-cache transistor budget. It mattered more back in the PPro era when it was a significant amount of the budget but today it's peanuts and the more verbose ISA makes better use of cache lines which are a much more limited resource in modern designs.

      Lots of really bad stuff that isn't used much by modern code but still must be maintained for compatibility: segments, 286 protection, IO instructions, etc.
      Most of it's effectively gone on x86-64 processors even if it's still there for backwards compatibility, if you're writing modern code it has no effect on you.
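
      For anyone wondering why renaming makes 16 architectural registers go further, here is a minimal sketch. It assumes an unlimited pool of physical registers and never frees any, which real hardware obviously cannot do; the tuple format and the `p0`, `p1`, ... names are invented for the example.

```python
def rename(instrs, arch_regs):
    """Map architectural registers onto fresh physical registers.

    Each instruction is (op, dest, (sources...)). Sources are looked up
    in the current mapping; every write to a destination gets a brand
    new physical register, so reuse of an architectural name no longer
    creates a false (write-after-write or write-after-read) dependency.
    """
    mapping = {reg: f"p{i}" for i, reg in enumerate(arch_regs)}
    next_phys = len(arch_regs)
    renamed = []
    for op, dest, srcs in instrs:
        new_srcs = tuple(mapping[s] for s in srcs)  # read old mapping first
        mapping[dest] = f"p{next_phys}"             # then allocate for the write
        next_phys += 1
        renamed.append((op, mapping[dest], new_srcs))
    return renamed


# Hypothetical sequence that reuses rax for two unrelated values.
code = [
    ("add", "rax", ("rax", "rbx")),
    ("mov", "rax", ("rbx",)),
]
renamed = rename(code, ["rax", "rbx"])
```

      After renaming, the two writes to rax land in different physical registers, so the second instruction does not have to wait for the first.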

      --
      There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
    2. Re:Why we hate x86 by Anonymous Coward · · Score: 0

      Variable-length instructions do have a downside: IBM's Power processors can decode more instructions in parallel than Intel or AMD processors.
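
      A toy sketch of why that is: with a fixed width, every instruction start is known up front, so decoders can work on all of them at once, whereas with variable lengths each start depends on the length of the previous instruction. The length rule below (low two bits of the first byte, plus one) is a made-up encoding, not x86.

```python
def fixed_starts(code: bytes, width: int = 4):
    """Start offsets for a fixed-width ISA: computable without decoding."""
    return list(range(0, len(code), width))


def toy_length(code: bytes, pc: int) -> int:
    """Hypothetical length rule: low two bits of the first byte, plus one."""
    return (code[pc] & 0b11) + 1


def variable_starts(code: bytes, length_of=toy_length):
    """Start offsets for a variable-length ISA: found one after another."""
    starts = []
    pc = 0
    while pc < len(code):
        starts.append(pc)
        pc += length_of(code, pc)  # next start is unknown until this is decoded
    return starts


fixed = fixed_starts(bytes(8))
variable = variable_starts(bytes([0, 1, 0xAA, 2, 0xBB, 0xCC, 0]))
```

      Real x86 decoders get around the serial chain with tricks such as predecode bits or speculative length decoding at every byte offset, but that is extra hardware a fixed-width design does not need.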

    3. Re:Why we hate x86 by serviscope_minor · · Score: 1

      Meh, who cares, the whole decoder stage is a couple percent of the non-cache transistor budget.

      On the high end, the processor has a massive slew of very fast FPU and integer execution units, and a whole bunch of hardware dedicated to getting the absolute best use out of them possible (the out of order unit). The compute hardware tends to be very well utilised and the flops per Watt are actually rather good for a general purpose CPU. In that case, the decoder has little effect.

      On the low end, it is a very different story. In the Atom/high end ARM world, the decoder is a much larger fraction of the budget, and even worse, it is always in use, making it quite power hungry.

      --
      SJW n. One who posts facts.
    4. Re:Why we hate x86 by TheRaven64 · · Score: 1

      AMD did a lot of research before settling on 16; more added significantly to complexity but only increased average program execution speed by low single-digit percentages.

      This is not constant, it depends a lot on the language. For a more dynamic language, like Lisp or JavaScript, more registers give you a significant benefit. For C, 16 is usually more than enough.

      --
      I am TheRaven on Soylent News
    5. Re:Why we hate x86 by Agripa · · Score: 1

      No three register operations. This impacts pipelining because it is not possible to avoid overwriting one of the source registers.

      I wonder about this one. Adding 3 register instruction support also means adding an additional set of read ports to the register file. Is it better to execute more instructions in parallel at a higher clock rate or have 3 register instructions?

    6. Re:Why we hate x86 by bheading · · Score: 1

      I am not sure the backwards compatibility argument completely stands up these days.

      Back in the Amiga days, when the 68060 came out (we are going back the guts of 15 years here), the new processor dropped a few rarely-used instructions. To compensate, Motorola shipped a small library which allowed the old instructions to be simulated when they were detected via an illegal instruction trap.

      By working with OS and compiler vendors, Intel could very easily deprecate and phase out all the old backwards-compatible instructions and addressing modes ahead of time. The only group of customers who would be affected by this would be folks who run old, unpatchable operating systems or software but yet also want to run the latest hardware. It's very hard for me to believe that this group is a significant %, especially not relative to the number of customers who are ready to patch their system and who want the benefits of the faster CPU.
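
      The trap-and-emulate approach described above can be sketched like this. Everything here is hypothetical: the "hardware" is a dictionary of two operations, and "mul" stands in for an instruction that was dropped from silicon and is provided by a software library instead.

```python
class IllegalInstruction(Exception):
    """Raised by the toy CPU when an opcode is not implemented in hardware."""


# Operations the toy hardware still implements directly.
HARDWARE_OPS = {
    "add": lambda a, b: a + b,
    "shl": lambda a, b: a << b,
}


def execute(opcode, a, b):
    """Run one instruction on the toy hardware, or trap."""
    try:
        op = HARDWARE_OPS[opcode]
    except KeyError:
        raise IllegalInstruction(opcode) from None
    return op(a, b)


def emulate_mul(a, b):
    """Shift-and-add multiply for non-negative b, using hardware ops only."""
    result = 0
    while b:
        if b & 1:
            result = execute("add", result, a)
        a = execute("shl", a, 1)
        b >>= 1
    return result


# The vendor-supplied library: handlers for instructions removed from silicon.
EMULATION_LIBRARY = {"mul": emulate_mul}


def run(opcode, a, b):
    """Execute natively, falling back to the library on an illegal-instruction trap."""
    try:
        return execute(opcode, a, b)
    except IllegalInstruction:
        handler = EMULATION_LIBRARY.get(opcode)
        if handler is None:
            raise  # genuinely unknown instruction
        return handler(a, b)
```

      Old binaries keep working, just more slowly on the emulated path, which is the trade the parent is proposing.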

    7. Re:Why we hate x86 by erice · · Score: 1

      No three register operations. This impacts pipelining because it is not possible to avoid overwriting one of the source registers.

      I wonder about this one. Adding 3 register instruction support also means adding an additional set of read ports to the register file. Is it better to execute more instructions in parallel at a higher clock rate or have 3 register instructions?

      Actually, no. The number of read ports is the same. The third register is the destination. The logic required to mitigate contention to the overwritten source register is much greater than simply decoding a third address. Three register operations easily fit into a 32-bit instruction.
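
      To show what the missing third operand costs in instruction count, here is a small sketch that lowers three-address operations to two-address form. The tuple format, the `tmp` scratch register, and the set of commutative opcodes are assumptions made for the example.

```python
COMMUTATIVE = {"add", "mul", "and", "or", "xor"}


def to_two_operand(three_addr):
    """Lower (op, dst, a, b) meaning dst = a op b into (op, dst, src) form.

    In two-operand form the destination is also the first source, so a
    copy is needed whenever the result must not overwrite an input.
    """
    out = []
    for op, dst, a, b in three_addr:
        if dst == a:
            out.append((op, dst, b))          # already in two-operand shape
        elif dst == b and op in COMMUTATIVE:
            out.append((op, dst, a))          # swap the operands
        elif dst == b:
            out.append(("mov", "tmp", a))     # b must survive until it is read
            out.append((op, "tmp", b))
            out.append(("mov", dst, "tmp"))
        else:
            out.append(("mov", dst, a))       # extra copy to preserve a
            out.append((op, dst, b))
    return out
```

      A single three-operand `("add", "r3", "r1", "r2")` turns into a `mov` plus an `add`, and the non-commutative worst case turns into three instructions.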

    8. Re:Why we hate x86 by David+Greene · · Score: 1

      You make some good points. Let us remember that this is all tradeoffs. Maybe better choices could have been made, but they weren't "dumb" choices, which is what I hear a lot of people say.

      Limited number of registers

      Given the memory limits at the time, this was a reasonable tradeoff to gain text space. Solved somewhat with x86-64 but I agree it's not enough.

      Instructions that require certain registers or a certain subset of the registers

      Certainly. Thus my references to shift/DIV/MUL.

      No three register operations. This impacts pipelining because it is not possible to avoid overwriting one of the source registers.

      Software pipelining? Yes. Fixed with AVX at least for the FP side. Integer instructions are still two-operand but it is less problematic there.

      Variable instruction length makes decode a headache

      It's also a great way to make the icache efficient. I think this was a good choice.

      Lots of really bad stuff that isn't used much by modern code but still must be maintained for compatibility: segments, 286 protection, IO instructions, etc.

      Yep. And no one uses it anymore. AMD eliminated a good deal of it in x86-64.

  16. Itanium && RISC by Anonymous Coward · · Score: 0

    Strange, last I heard, the Itanium processor line wasn't RISC at all...but rather EPIC and/or VLIW...not sure which applies.

    --AC

  17. CLOSED? by Anonymous Coward · · Score: 0

    Oh no, somebody better tell sparc.org to cease and desist!

    1. Re:CLOSED? by staalmannen · · Score: 1

      You mean openSPARC ( http://www.opensparc.net/ ) and openRISC ( http://openrisc.net/ ). I thought there was a MIPS and a Power-based open hardware project too but I could not find it right now.

    2. Re:CLOSED? by unixisc · · Score: 1

      I couldn't find any references to any open MIPS projects, but there is a Power.org that has opened the Power spec.

  18. Wow, what a terrible article by Sebastopol · · Score: 1

    First off, Intel went RISC in 1995 with the PentiumPro, the ISA is CISC, but the uISA is RISC. (Semantics. Bite me.)

    Second, Itanium is VLIW, not RISC.

    Third, who cares? Sun and IBM are phoning-it-in with this market, just look at the ISSCC proceedings for the past decade.

    I'm surprised Intel is even bothering. Is the market that big? Will it grow their bottom line? Anyone?

    --
    https://www.accountkiller.com/removal-requested
    1. Re:Wow, what a terrible article by Anonymous Coward · · Score: 0

      No. It will not grow their bottom line.

      and YES. Intel went RISC with the Pentium Pro.

    2. Re:Wow, what a terrible article by the_humeister · · Score: 1

      There was an article over at arstechnica looking into why Itanium is still around. Apparently the Itanium market is worth $4 billion. Not exactly chump change.

    3. Re:Wow, what a terrible article by Anonymous Coward · · Score: 0

      Is the market that big? Will it grow their bottom line? Anyone?

      Volume-wise no, money-wise yes. As in any tiered market, the highest tier has the lowest volumes and the highest margins (as opposed to the lowest tier, which has the highest volumes and razor-thin margins). Take the home computer market as a more accessible example, specifically the higher end of it, where Apple lives. Sure, they only have about 5% of the market (low volume), but no one can argue it isn't hugely profitable (high margin).

      Look at printers, even: consumer-grade inkjets are dirt cheap and outnumber their large-format industrial counterparts by several orders of magnitude; they also cost proportionally less.

      People are willing to pay a premium for added value or a solution that offers something the competition does not. Sparc and Power offer vertical scalability (amongst other things), whereas x86 does not. Sparc offers higher throughput and actually kills x86 on price/performance on workloads that require sufficient throughput to warrant investing in big iron. People always compare the cost of, say, a single T3 system to a single x86 system, without comparing the cost of how many x86 systems it takes to match the throughput of a single T2. 32 cores and 512 (physical) threads on 4 sockets may not be the kind of throughput the vast majority of workloads need, and even fewer still need the throughput offered by an mSeries or pSeries, but those who do are willing to pay a premium for it (especially since it ends up costing less than the equivalent x86 cluster; drastically fewer but more expensive nodes does add up).

      From a business perspective it comes down to whether you prefer to take $1 from 1,000 people, or $100 from 10 people.

      It shouldn't be surprising at all that Intel wants in on a hugely profitable market they have close to zero penetration in (Itanium is a very distant third to Sparc and Power, and x86/x86_64 is a non sequitur on the upper midrange and beyond).

      That, and the whole distinction between RISC and CISC has been pointless for years. x86_64 borrows about as much from RISC at this point as Sparc and Power do from CISC.

    4. Re:Wow, what a terrible article by unixisc · · Score: 1

      No, Pentium Pro was very much CISC. As an above poster noted, just having a RISC core doesn't make the overall CPU a RISC CPU. The instructions have to be of fixed length so that microcode doesn't have to decode it into smaller RISCy instructions.

  19. Hard to take the story seriously by sl3xd · · Score: 2, Insightful

    We live in a post-RISC world. Nearly every modern processor's "core" uses the major innovations of a RISC chip. The size of the instruction set is of little importance; many so-called "RISC" architectures (such as Power) have a larger instruction set than the "CISC" x86_64.

    The main issue that spawned the development of RISC (that instruction sets were getting so large and unwieldy that instruction decode would take the lion's share of a die's transistors) turned out to be less of a problem than anticipated. At the time, many CISC chips (VAX in particular) were implementing high-level programming features in the architecture's assembly language.

    Nearly all of us have decided that efficient compilers have made a high-level, expressive assembly language unnecessary.

    Another factor is that modern processors are superscalar, with multiple execution pipelines per core - one instruction decoder then feeds several pipelines, which further reduces the relative size of the instruction decode.

    However, modern chips do implement (at least internally), other "core" ideals of the RISC processor:
    - Numerous registers
    - Load/Store memory access
    - Multi-stage Pipelines
    - One instruction per clock tick (ie. keep the complexity of an instruction down to what can execute in one tick - if something takes more than one tick, break it down into smaller pieces).

    The one thing that the so-called "RISC" chips have historically been known for is dependability: The machines that use them don't crash. This requires more than just a good CPU: It requires good hardware in general, and a good operating system. The "RISC" vendors - such as Sun (now Oracle), IBM, HP and SGI, control the quality of the entire system - from the electrical components, to the chassis, to the airflow in the chassis. Even the datacenter's abilities (power, cooling capacity, airflow) are specified.

    There are a lot of things that go into making a system that's mission-critical, and the CPU is a small part of the equation (and usually is the least troublesome). Putting a CPU on a motherboard doesn't give me guarantees about airflow, power reliability, I/O stability and speed, vibration tolerance, nonblocking I/O, and reliability - to say nothing about core OS stability.

    Intel isn't interested in doing anything other than selling chips. Unless Intel is willing to take upon themselves a whole-system approach - covering everything from the chassis, cooling and airflow, power supply, motherboard, and core operating system - they'll never play in the league.

    Making a mission-critical system is left to others who use Intel's chips, such as HP's high-end Itanium line, and SGI's Altix and Altix UV systems (using Itanium and x86_64).

    --
    -- Sometimes you have to turn the lights off in order to see.
    1. Re:Hard to take the story seriously by evilviper · · Score: 2

      There are a lot of things that go into making a system that's mission-critical, and the CPU is a small part of the equation (and usually is the least troublesome).

      That's not really true. The lack of high-end features in x86 CPUs was the weak link in getting reliable servers for some time. And when those features started being added, they appeared in servers almost immediately. Even now Xeons lag significantly behind proprietary CPUs, and Intel is just once again on a marketing push to claim every incremental improvement suddenly makes them ultra-reliable.

      Also, the main place all these features need to be is in the chipsets, which Intel also manufactures.

      --
      Slashdot gets worse every day... Pipedot: News for nerds, without the corporate slant
  20. x86 are RISC since P6 by maitas · · Score: 4, Informative

    When the PentiumPro came along (the first P6 processor) it used an internal RISC architecture, and all Intel x86 cores from that time to today still decode the x86 instructions into what Intel calls micro-ops (RISC-like operations) and then process them.
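
    A rough sketch of that decode step, in Python. Everything here is an assumption for illustration: the `decode` function, the `tmp0`/`tmp1` temporaries and the micro-op names are made up and do not reflect Intel's actual internal format. It only shows the general idea of splitting a memory-operand instruction into load / ALU / store steps:

```python
# Hypothetical decoder: splits an x86-style instruction with a memory operand
# into simpler load / ALU / store steps. Names are illustrative only.

def is_mem(operand):
    """Operands written in brackets, e.g. "[rbx]", are treated as memory."""
    return operand.startswith("[")

def decode(instr):
    """Turn one (op, dst, src) instruction into a list of register-only steps."""
    op, dst, src = instr
    uops = []
    if is_mem(src):
        # A memory source is loaded into a temporary first.
        uops.append(("load", "tmp0", src))
        src = "tmp0"
    if is_mem(dst):
        # Read-modify-write on memory becomes load, ALU op, store.
        uops.append(("load", "tmp1", dst))
        uops.append((op, "tmp1", src))
        uops.append(("store", dst, "tmp1"))
    else:
        uops.append((op, dst, src))
    return uops

reg_form = decode(("add", "rax", "rcx"))     # already register-to-register
mem_form = decode(("add", "[rbx]", "rcx"))   # read-modify-write on memory
```

    The register form passes through as a single step, while the memory form expands to three.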

    Nevertheless, the part where Intel says "The days of IT organizations being forced to deploy expensive, closed RISC architectures" is a lie. You can get the UltraSPARC-T2 Verilog code to make those chips yourself, and the code is GPL. You can't do that with any Intel processor. So Intel processors are the really "closed" processors. It is true that RISC processors are more expensive, but that has nothing to do with "closed".

    1. Re:x86 are RISC since P6 by Anonymous Coward · · Score: 0

      Which just shows that expensive and cost effective are different concepts. You can serve a lot more web pages per second for a lot less money with a 32-core SPARC machine than anything Intel makes.

  21. It's not the CPU, it's the whole product. by HockeyPuck · · Score: 1

    Sometimes I need to scale vertically and not horizontally. There are times when you need a single chassis with 200+ cores and 8TB of ram and hundreds of PCIe slots for IO. You can take my pSeries from my cold dead hands.

    Intel solutions are getting there with 80 cores and 2TB of RAM.

    However, when it comes to moving IO, nothing beats big iron.

    1. Re:It's not the CPU, it's the whole product. by afidel · · Score: 1

      Unisys offers 6TB of ram, though still "only" 80 cores. Personally I think you probably need to seriously consider a redesign if you need to go bigger than that, but in the enterprise space that kind of development effort normally costs more than buying a couple million dollar box and the couple hundred thousand a year support contract to go along with it. I guess I'm fortunate in that my biggest workload runs well on a 16 core box with a couple SSD's for the main tables.

      --
      There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
    2. Re:It's not the CPU, it's the whole product. by aztracker1 · · Score: 1

      Agreed, I can't think of very many instances where a given type of workload can't be distributed for less outlay of cost over big iron servers. It does depend, but then again, full ACID in database servers isn't usually necessary either.

      --
      Michael J. Ryan - tracker1.info
    3. Re:It's not the CPU, it's the whole product. by BBCWatcher · · Score: 1

      So if one woman can produce one baby in 9 months, does that mean if you assign 9 women to the job you'll get one baby delivered in one month?

      There are lots of workloads that are inherently single threaded (and probably always will be). If you've got a bigger, faster, more powerful CPU (or vertically scalable server, which fast shared memory and super fast I/O), that'll be a better fit for those sorts of workloads. IBM zEnterprise mainframes are the preeminent examples of the type, and they're selling extremely well. Different servers for different missions.
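
      To put a number on that, here is a small sketch using Amdahl's law (my choice of model, not something from the article), assuming a workload splits cleanly into a serial part and a perfectly parallel part. The function name and parameters are just for illustration:

```python
def amdahl_speedup(parallel_fraction, workers):
    """Speedup over one worker when only part of the job can be spread out.

    parallel_fraction: share of the work (0.0 to 1.0) that parallelizes.
    workers: number of CPUs, nodes, or (per the analogy above) people.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)

# An inherently serial job gains nothing from nine workers.
serial_job = amdahl_speedup(0.0, 9)

# A job that is half serial cannot even double its speed with nine workers.
half_serial_job = amdahl_speedup(0.5, 9)
```

      With a parallel fraction of zero the speedup stays at 1 no matter how many workers are added, which is the nine-women case.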

    4. Re:It's not the CPU, it's the whole product. by Shinobi · · Score: 1

      Another reason the z10 sells well is native BCD calculations, meaning that in some tasks, coupled with their massive I/O, they are so much faster than Intel/AMD offerings that you'd need AT LEAST 10-15 times more Intel/AMD hardware, with the requisite floorspace, networking, power cabling, cooling and UPSes for all that, to compare merely on the theoretical side. In practice, it can get even worse, since the tasks don't parallelize well.

    5. Re:It's not the CPU, it's the whole product. by Anonymous Coward · · Score: 0

      Not to dismiss the pSeries, which are indeed amazing systems (and the Power 7 does have the fastest single-thread performance of any CPU out there, which is not to be dismissed - not every workload is parallelizable), but for vertical scaling, ridiculously large Intel-based solutions already exist: the Altix UV (not to be confused with Altix ICE, which is a cluster solution) is a Xeon-based NUMA system that scales up to 2560 cores and 16TB in a single system image. This is, to my knowledge, bigger than any SSI machine in the "RISC" space currently available.

      Slightly less rare and esoteric than the SGI monsters, IBM's x3950 4U rack servers have NUMA interconnects that allow you to link multiple servers into a single shared-memory system, and also external RAM-only 1U modules, enabling you to create a single server with 80 cores and 6TB of RAM. Older versions of the 3850/3950 would let you link up 4 "nodes", but it seems this has been reduced to 2 in the X5 iteration.

      Triple-digit core x86 servers are going to be available pretty soon from all the major server vendors.

  22. real RISC by Anonymous Coward · · Score: 0

    RISC isn't literally about having fewer instructions or smaller instruction sets. It's a design philosophy where the instructions are reduced down to practical sizes where 1 simple operation takes 1 simple cycle and is simple to implement in hardware; extra registers and cache are benefits of keeping things simple (plus you could use them more since you are running more ops). I would argue that many things that have been added are RISC as long as they maintained the design -- doesn't matter if they add more instructions; if you literally thought it was just keeping instruction counts down then nothing could be RISC anymore.

    x86 isn't RISC if they decode microcode into smaller RISC-like operations; an internal RISC. The outside instructions must be RISC; how they pull those off internally is not really part of it. It's a black box.

    Of course they will end up with smaller operations internally; the die sizes would be much larger if they didn't do that.... they could do it-- if you don't put out a RISC interface it is not RISC.

    For me, it generally needs load/store instructions to be RISC; combined load-store instructions are really non-RISC thinking on something as fundamental as memory. Prefetch is OK. FPU is OK. MMX, SSE, those are not RISC-like. Power's VMUs I would argue are RISC in that they act like another unit; although a few instructions may take a couple of clocks... it's not like those units are simple to make; back when I was learning it most everything was 1 clock and newer ones lowered it -- the goal was 1 clock (these few slow ops were not practically broken down into smaller ops... again it's not the literal RISC rule but the design...)

    1. Re:real RISC by Anonymous Coward · · Score: 0

      I think you're arguing that if the programmer's view isn't RISC-like, it isn't a RISC processor. That's ridiculous. RISC has just as much to do with the internal implementation.

      Having said that, there's no such thing as a pure RISC processor. Early RISC were completely load-store, single cycle execution, with no pipeline interlocks (leaving a lot of the dependency checking / instruction scheduling to the compiler). But as things evolved, "RISC" processors adopted CISC features (SIMD instructions, complex hardware dependency checking, etc.) while CISC added RISC features. In order to get higher performance, each camp picked features from the other.

      The only reason anyone even mentions "CISC vs. RISC" is because IA32 still lives on, and because some of IBM's current processors still support stuff from the 360 mainframes.

    2. Re:real RISC by the_humeister · · Score: 1

      x86 isn't RISC if they decode microcode into smaller RISC-like operations; an internal RISC. The outside instructions must be RISC; how they pull those off internally is not really part of it. It's a black box.

      You do realize that even IBM's POWER chips (the final bastion of "RISC") decode instructions into uops too, right? So, are you willing to concede that POWER isn't RISC?

  23. LEA? by Anonymous Coward · · Score: 0

    got one word for you, buddy. LEA.

    1. Re:LEA? by tibit · · Score: 1

      Because multiplications by a constant that's but an entry in a list having a couple of powers of two are all the rage these days.

      --
      A successful API design takes a mixture of software design and pedagogy.
    2. Re:LEA? by badkarmadayaccount · · Score: 1

      Isn't that bitshifting?

      --
      I know tobacco is bad for you, so I smoke weed with crack.
    3. Re:LEA? by tibit · · Score: 1

      Yeah, that and addition is usually bundled up in a LEA. Some architectures, like DSPs, also support modulo addressing in a LEA, I'm sure. But it's not a general-purpose multiplication operation. The AC was just confused or trolling.
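
      For anyone following along, a minimal Python model of what an x86 LEA computes, assuming the usual base + index*scale + displacement form with scale limited to 1, 2, 4 or 8 and a 64-bit wraparound. The `lea` function and its parameter names are just for illustration:

```python
MASK_64 = 0xFFFFFFFFFFFFFFFF  # assume 64-bit address arithmetic

def lea(base=0, index=0, scale=1, disp=0):
    """Model of the address calculation: base + index * scale + disp."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4 or 8")
    return (base + index * scale + disp) & MASK_64

x = 7

# Using the same register as base and index gives x * (1 + scale),
# so a single LEA can multiply by 3, 5 or 9, but not by arbitrary constants.
times_three = lea(base=x, index=x, scale=2)
times_five = lea(base=x, index=x, scale=4)
times_nine = lea(base=x, index=x, scale=8)

# The intended use: address of element 3 in an array of 8-byte items,
# 16 bytes into a structure located at 0x1000.
element_address = lea(base=0x1000, index=3, scale=8, disp=16)
```

      So it is a shift plus up to two additions in one instruction, which is why it only covers a short list of multipliers.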

      --
      A successful API design takes a mixture of software design and pedagogy.
  24. But 15 years ago.... by Anonymous Coward · · Score: 0

    10-15 years ago, Alpha and POWER were RISC processors, x86 was usually considered CISC with a RISC core powering it, and Itanium was EPIC. If memory serves, EPIC cpu hardware differed notably from either of the former in at least some significant ways, thus garnering its own space on the processor-family shelf. By blurring the boundaries with this story, Intel has pretty much proven that the cross-pollination of processor architectures over the last 15 years has made the processor-family-wars moot.

  25. All Intel chips have been RISC for a while now by boddhisatva · · Score: 1

    Intel's chips have been running on a RISC core for quite a while. The rest of the CISC instruction set is converted by microcode into RISC instructions. Just noticed the person before me said the same thing.

  26. Yawn.. by bored · · Score: 1

    Anyone buying POWER or SPARC is a lost cause anyway. Sure, Intel might gain a few sales, but frankly the RISC volumes are pretty small and a huge number of them are "stuck" because they have existing applications that they are unwilling/unable to port to an alternative. Or the IT guys are religious zealots. This is the same reason you find AS400s/i5, Nonstops, OpenVMS, z/OS, etc. machines running in data centers the world over. It's not because those OSes or the hardware actually provide some huge benefit that outweighs the 5x (or more, the sky is the limit in some cases) price difference between them and a basic Intel system. It's because companies have 8- and 9-figure investments in software running on them. They will probably still be in datacenters for decades into the future if IBM/Oracle/HP/etc. don't decide to kill them off. They zombie on, as long as the original manufacturer supports them and the perceived/actual cost to port the application outweighs the cost of buying a new machine/OS every 5 years or so.

    1. Re:Yawn.. by Relayman · · Score: 1

      Nothing runs Linux like PowerPC. Nothing can handle virtualization like PowerPC. Intel only dreams of doing what IBM does every day.

      --
      If I used a sig over again, would anyone notice?
    2. Re:Yawn.. by Anonymous Coward · · Score: 0

      You'd be surprised how easy it is if you have a charismatic salesman. Oh that old thing? You need the new shiny sparkly server. Even if you have to emulate the old stuff and as long as tab A can still be inserted into slot B they will buy it.

    3. Re:Yawn.. by styrotech · · Score: 1

      POWER and PowerPC are two different things.

      Maybe you meant "nothing runs Mac OS 9 like PowerPC"? Or "nothing can hold up Steve Jobs plans like PowerPC"?

    4. Re:Yawn.. by Shinobi · · Score: 1

      You're just showing how little you know.

      When it comes to for example IBM's mainframes, for the jobs where they are used, they massively outperform any Intel/AMD cluster both in raw performance and in operational costs over the years.

    5. Re:Yawn.. by bored · · Score: 1

      That is what IBM tells you, try generating your own numbers for once instead of spouting the ones the IBM sales guy tells you.

      Sure, some of those machines have very high raw performance numbers... But a very large percentage of the installs actually partition that expensive machine up into a dozen or so smaller system images. Which of course negates a lot of the argument about operational costs because the majority of long term operational costs is related to the number of system images you are maintaining. Sure there are hardware support costs etc, but lots of companies can't even identify the performance bottlenecks in their system. Instead they just buy the latest pitch from $LARGEVENDOR, take their slight performance improvement, then repeat the process in a couple years.

      That's not to say there aren't customers where the numbers for POWER or whatnot work out in their favor; it's simply saying that it's a smaller portion of the market every year. I have a POWER system sitting less than 10 feet from me right now. But I also have a quad-socket Westmere, and both the CPU and IO performance on the Westmere are frankly astonishing with our application when compared with the POWER. That said, the sweet spot is actually the dual-socket setups, as they are significantly less expensive, and our application scales well in a cluster.

    6. Re:Yawn.. by Shinobi · · Score: 2

      Actually, it is the numbers we generated on our own that I'm running. For the project I worked on, a single loaded mainframe outperformed the Altix, off-the-shelf Dell cluster and a couple of other solutions the client looked at. Hardware support for BCD and the massive external I/O.

      As for partitioning, in secure environments, the low overhead and the ease with which you can do it on IBM's mainframe reduces the operational costs.

      The biggest operational cost over the years is floorspace+cooling+power, and that's where the real gain is, and that's where my clients really learned the difference. The primary and the backup system, complete with their storage arrays, cost just slightly more than just the primary off-the-shelf Dell system when factoring in the number of spares that have to be running just to keep the primary system operational in case of failures. Add to that the state of immaturity of reliable failover systems in the Linux world and the operational costs skyrocket.

      As for Westmere, it has nice performance for FP math or non-BCD integer math, and it has nice I/O to RAM/local devices, but external I/O is.. lackluster compared to what a z10 can do.

      My personal workstation is a dual quad-core Xeon with a crapload of RAM, because it fits the tasks I personally work with better than a z10 would, but if I were to actually work fulltime with the sort of stuff my last client uses their systems for, it'd be mainframes all over, because the performance and reliability for those tasks is just unparalleled by anything x86-based.

    7. Re:Yawn.. by bored · · Score: 1

      but external I/O is.. lackluster compared to what a z10 can do.

      Hardware support for BCD or decimal FP? Because x86 has had hardware BCD support since the 8086, and now you can do BCD with SSE. How many digits are your BCD values?

      I'm also curious what your cumulative IOP/GB/sec numbers are..

      We are pushing a little over 12GB/sec (yes bytes, and fully 1/4 of that is disk IO) through the PCIe buses on a dual socket westmere (including a fairly large amount of data transformation in memory), and that is the limit of the 4 adapters we have in the machine. There are slots for more, so it might do more. But once the new PCIe 3.0 sandy bridge machines come out we will probably upgrade the adapters, and put more of them in the machine.

      This on a machine that costs about 1/2 the cheapest P710 Express configuration. At those prices you can't even begin to touch the big iron even if we have a dozen or so nodes.

      Frankly, I've seen it in a lot of data centers time and time again: some guy who is talking about the IO requirements on his machine discovers, when we drop an analyzer in the path, that it's only doing a few hundred MB/sec aggregate IO. They are transaction limited to disk, or latency limited between cluster nodes, etc.

    8. Re:Yawn.. by greed · · Score: 1

      POWER and PowerPC haven't been different since POWER3. POWER2--circa 1993--was the last "true" POWER CPU. All subsequent POWER CPUs have been based on the PowerPC ISA.

      http://en.wikipedia.org/wiki/IBM_POWER#POWER3

    9. Re:Yawn.. by styrotech · · Score: 1

      Using the ISA is a strange way of defining "not different" in the context of actual hardware that IBM ships and what they're capable of compared to what Intel ships.

      That would be kinda like saying an x86_64 Atom is not different from a Xeon 7xxx.

    10. Re:Yawn.. by Relayman · · Score: 1

      Let's try this again: Nothing runs Linux like POWER. Nothing can handle virtualization like POWER. Intel only dreams of doing what IBM does every day.

      --
      If I used a sig over again, would anyone notice?
    11. Re:Yawn.. by Shinobi · · Score: 1

      "Hardware support for BCD or decimal FP? Because x86 has had hardware BCD support since the 8086, and now you can do BCD with SSE. How many digits are your BCD values?"

      Use of both. And the "hardware support" on x86 for BCD is... slow, takes way more cycles than should be needed. And they are using 8-byte Packed BCD.
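
      For readers who haven't met the format, a minimal Python sketch of packed BCD, assuming plain unsigned values with two decimal digits per byte and no sign nibble (real packed-decimal formats often reserve a nibble for the sign, which this ignores). The function names are made up for the example:

```python
def to_packed_bcd(value, num_bytes=8):
    """Encode a non-negative integer as packed BCD, two digits per byte."""
    digits = num_bytes * 2
    if value < 0 or value >= 10 ** digits:
        raise ValueError(f"value does not fit in {digits} decimal digits")
    text = str(value).zfill(digits)
    # High nibble holds the first digit of each pair, low nibble the second.
    return bytes(
        (int(text[i]) << 4) | int(text[i + 1]) for i in range(0, digits, 2)
    )

def from_packed_bcd(data):
    """Decode packed BCD bytes back to an integer."""
    value = 0
    for byte in data:
        high, low = byte >> 4, byte & 0x0F
        if high > 9 or low > 9:
            raise ValueError("nibble is not a decimal digit")
        value = value * 100 + high * 10 + low
    return value

# Under these assumptions, 8 bytes hold 16 decimal digits.
encoded = to_packed_bcd(1234567890123456)
decoded = from_packed_bcd(encoded)
```

      The appeal for payroll-type work is that decimal values round-trip exactly, with no binary fraction conversion involved.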

      Note, I was brought on for a specific niche here, tweaking and tuning the Infiniband setup.

      As for GiB/s numbers, depends on the time of day/time of year, 25-30GiB/s to and from the storage array is not unusual. When the project was deployed about halfway, we managed to saturate 8 of the 12 Infiniband links to the storage array during a peak demand, though that was with some of the most intense users having been connected already. The storage array has a pair of RAMSAN 630 devices as a buffer for recent/frequently requested data.

      More interesting to mention is the fact that the whole setup serves about 15000 concurrent "terminals" (read, workstations/desktops) nationwide, spread over hundreds of offices, some with gigabit access, some with 100 megabit access, working with statistical data, payroll/budget processing, analysis, forecasting etc, with strict separation of users/privileges, audit trails etc. And of course everything is encrypted by default.

      What I mean with lackluster on x86 etc is that I/O is still sequential bus limited, and even with DMA etc, the CPU STILL has to do some of the I/O shuffling gruntwork. On the mainframe, you have channels that can be individual or bonded as per your needs. The mainframe processor just tells a channel processor "here, job to do" and then proceeds with the next bit of processing it has to do.

      That also has benefits if you move on to virtualization.

    12. Re:Yawn.. by bored · · Score: 1

      What I mean with lackluster on x86 etc is that I/O is still sequential bus limited, and even with DMA etc, the CPU STILL has to do some of the I/O shuffling gruntwork.

      This discussion has come up in numerous places over the last few years and is basically false. The majority of modern x86 peripherals have as much intelligence as channel processors, if not more. For example, Fibre Channel and SAS boards from QLogic/Emulex/etc. have full-blown processors on them running firmware that handles all of the Fibre Channel protocol and a large part of the FCP portions, leaving the CPUs to do little more than specify via SCSI CDBs and target IDs which data blocks get moved where. Once the operation(s) are complete the board interrupts a CPU. These boards maintain all the connections, and keep track of tens of thousands of simultaneous IOs. The CPU usage to transfer 3GB/sec to/from disk in our setup is less than 1%, and a large portion of that is our application sending messages to/from the OS. It's the same with InfiniBand, as the protocol is handled by the adapter, leaving the CPU to do little more than trigger the remote operations.

      Combined with the fact that PCIe now includes peer-to-peer as part of the standard, this means that you can actually do IO between devices without even the memory subsystem getting involved. This is how GPUs are doing SLI.

      Anyway, I think the original discussion was more about how Intel was intending to displace the RISC vendors, aka the Power systems, not the mainframes. Either way, I think my original point stands, as I'm betting the system you're talking about is well into the 7-figure range, or roughly two orders of magnitude more expensive for what is probably only one order of magnitude faster than a single node in our cluster. As our application has nearly linear scaling for node counts in the few dozen range, we are an example of an application that probably gets similar (if not greater) IO and processing performance out of cheap Intel hardware.

      BTW: Texas Memory Systems makes some cool stuff, and systems like http://www.fusionio.com/products/iodrive-octal/ do a lot to move cheap intel hardware into places that traditionally required big iron.

    13. Re:Yawn.. by badkarmadayaccount · · Score: 1

      Shared memory access speed is still a mainframe stronghold. Though the logical structuring of the channel procs is... more logical, as well. PCIe latency for issuing IO commands cuts into IOPS, throughput is how much you put in, no matter where the die is, and hell, there are just a few standard protocols - integrate in the damn CPU already - or the motherboard. Oh, and FC is expensive, brittle, and doesn't give you anything high-density Ethernet+MPLS won't. IB is nice - but could be replaced with a (large) handful of IEEE1394 links (possibly with an iWARP implementation), in most cases, IMHO. Well, it does have a lead on latency... Otherwise, I agree completely. Mainframes were never priced competitively.

      --
      I know tobacco is bad for you, so I smoke weed with crack.
  27. 4core = CISC+3*RISC by Anonymous Coward · · Score: 0

    For compatibility with BOTH the future+past: Multicore == 1 Legacy Core + 2^n-1 AMD64 Cores (64bit *only*)

    Timing would have been perfect: the multicore transition took place right after AMD64. No one would be hurt by having only one 32bit core. All parallel code would be 64bit.

    Alas, Fear rules management.

    So now we await the first brave (=least fearful) soul to produce 64bit only CPUs. It would do 32 bit by emulation, no big deal. /jm

  28. "Expensive, closed" != RISC by Snorbert+Xangox · · Score: 1

    What I find weird here is that this is being construed as "woo, Intel takes on RISC", whereas the actual situation is "woo, commodity microprocessors can now take on the low-volume, high-margin, high-availability big business end of the computer market". RISC has nothing to do with it - in an alternate universe*, it could have been VAXes running Ultrix that Intel was going up against, and the language would be completely identical. The big deal is that Intel Xeons can now go into systems that compete on high-end features with large, enterprise SPARC and Power systems, and just as importantly, that you can run workloads on the Xeons that you used to run on SPARC or Power systems. This is as much about the fact that Xeons can run Linux or Solaris about as well as SPARC or Power can run their respective Unices, and that the software is available across all three platforms. Not to mention, Xeons can now supplant Itaniums, but let's just dance around that subject thanks very much. :-)

    What has happened though, is that in the lazy shorthand of business computing journalism, RISC has become equated with "large SMP machines with lots of HA features produced by vertically integrated companies like IBM, Oracle, HP and Fujitsu." It's a bit like equating V8 with "heavy car with terrible handling and fuel economy" because you happened to be writing about the American car market in the 1950s.

    * a universe in which DEC managed to make VAXes actually go fast somehow

    --
    -Snorbert, somewhere in the antipodes
  29. x86 compatible? by unixisc · · Score: 2

    That, as well as the i860 (which was even earlier than i960, and was used in the Intel Paragon supercomputer). And this new CPU - is it x86 compatible? Or are we about to see a new instruction set?

    Even aside from those, Intel had rights to the DEC Alpha once it made its settlement w/ DEC. That was still #1 in performance when Compaq/HP killed it. If this new CPU is going to be incompatible w/ x86, I don't think it has any more of a future than the Itanic, much less EM64T.

    HP was out of its mind to kill PA-RISC for Itanium. Compaq was out of its mind not to aggressively push Alpha in the NT market, and extend OVMS. All RISC vendors - IBM and Oracle - should learn the lessons from Itanium and not let Intel shoot down superior and/or well established RISC platforms like Power or Sparc in favor of something totally new. And does this make Itanium an HP-only CPU, dropping even the Intel backing?

    Also, what exactly are the closed RISC architectures today? OpenSparc is available, OpenPower is available, and even MIPS, as far as I understand it, is licensed freely enough that many organizations use it. With 3 open RISC architectures, why does anyone need another?

    1. Re:x86 compatible? by jwilso91 · · Score: 1

      Fighting Intel is definitely fighting The Man. Back in the 90s I worked with a computer company that developed and marketed their own RISC architecture chip. Intel spent more in R&D annually than our entire company revenues and, shall we say, has a well-funded in-house legal department. Needless to say, their software now runs exclusively on Wintel.

    2. Re:x86 compatible? by unixisc · · Score: 1

      Intergraph?

  30. EPIC RISC game over for Intel by unixisc · · Score: 1

    Both Sparc and Power now have open specifications that anyone can use to implement their own microprocessor and sell it in the market for any targeted applications. Which is pretty much the goal of open standards. The closed RISC standards that were there - Clipper, PA-RISC and Alpha (Alpha actually less so) are all dead, as are i860 and i960.

    Incidentally, the later Alpha and Power architectures, as well as the MAJC processors all borrowed some VLIW concepts such as concatenating multiple instructions into a single word to enhance their SIMD capabilities, so it's not like VLIW is a complete failure. Itanium managed to, on a PR front, knock down PA-RISC, MIPS and later (after HP bought Compaq) Alpha, but ironically, failed to do much against AMD64, w/ the result that it's not made a dent in the marketplace, and Microsoft, Oracle, Red Hat and Canonical have all dropped support for it. Even Intel's latest C++ & Fortran compilers don't support Itanium: users are referred to earlier versions. Given that factoid, Intel's announcement reiterating support for Itanium sounds hollow. And w/ the Itanium's list price of $700-$4000, one can't support that CPU even if one wants to.

    The game is over - the only CPUs that matter are x64, Power, Sparc and MIPS (I'm not counting ARM here, since it's so far unsuitable for server apps). Intel can forget about dethroning either IBM or Oracle in that arena.

  31. CISC == variable length instructions by unixisc · · Score: 1

    Variable length instructions are what force the CPU to have microcode, to determine the length of each instruction. That's what makes CISC CISC. Note that RISC doesn't exactly mean reduced #instructions: the instruction set of Power, for instance, is huge, while that of the PDP-11 was very small. What makes a CPU CISC is variable length instructions.

    1. Re:CISC == variable length instructions by Anonymous Coward · · Score: 0

      >Variable length instructions are what force the CPU to have microcode, to determine the length of each instruction. That's what makes CISC CISC. Note that RISC doesn't exactly mean reduced #instructions: the instruction set of Power, for instance, is huge, while that of the PDP-11 was very small. What makes a CPU CISC is variable length instructions.

      I work for Intel designing microprocessors and I assure you that you are exactly wrong. You have causation backwards.

       

  32. Hauppauge 486 + 860 by johu · · Score: 1

    Don't forget the Hauppauge i486 motherboard that had an i860 on it. Not quite an i960, but still RISC. Pretty much the only thing you could do with the i860 side was run the sample application included on floppies, which rotated some characters in the upper right corner of the screen - and that rotation persisted over a reboot with ctrl+alt+delete. Whoo, multi-processing! I think the i860 processor on that motherboard was intended to be used together with the bundled non-standard display adapter for some sort of CAD use.

    I actually had one of those, got it from some bankrupt company with full manuals, a compiler for the i860 etc. Shame I've lost it over the years, as I doubt there are many of those left today. I even had that custom display adapter and a bunch of technical information from the factory, as it was some sort of pre-production sample sent to the company importing Hauppauge products.

    http://www.geekdot.com/index.php?page=hauppauge-4860

    1. Re:Hauppauge 486 + 860 by Jeremy+Erwin · · Score: 1

      The i860 and the i960 were entirely different chips.

      Famously, the i860 was described as a Cray on a chip.

  33. Is the 64-bit mode RISC? by unixisc · · Score: 1

    Talking about just the 64-bit mode, where only the instructions that deal w/ 64-bit arithmetic are involved, are all those instructions of fixed length? In other words, would microcode be needed if one were to run a program that just used 64-bit instructions?

    If they are, then x64 can be called a 64-bit RISC CPU (even while being a 32-bit CISC CPU), at least the 64-bit part of it. But if the ALU instructions that deal w/ 64-bits are variable length as well, then AMD64 is a 64-bit CISC CPU.

    So which is it?
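    One rough way to check the fixed-length question is to count bytes in a few 64-bit-mode encodings. The sketch below does that for a handful of hand-assembled instructions; the byte strings are the standard x86-64 encodings as far as I know, while the names ENCODINGS and instruction_lengths are made up purely for illustration.

```python
# Hand-assembled x86-64 (64-bit mode) encodings, written as hex byte strings.
# 0x48 is the REX.W prefix that selects 64-bit operand size.
ENCODINGS = {
    "ret": "c3",
    "push rax": "50",
    "add rax, rbx": "48 01 d8",
    "add rax, 1": "48 83 c0 01",
    "mov rax, 0x1122334455667788": "48 b8 88 77 66 55 44 33 22 11",
}


def instruction_lengths(encodings=ENCODINGS):
    """Map each instruction to the number of bytes in its encoding."""
    # bytes.fromhex ignores the spaces between byte pairs.
    return {asm: len(bytes.fromhex(hexstr)) for asm, hexstr in encodings.items()}


if __name__ == "__main__":
    for asm, length in instruction_lengths().items():
        print(f"{length:2d} bytes  {asm}")
```

    The lengths come out as 1, 1, 3, 4 and 10 bytes, so even instructions that only touch 64-bit registers are not fixed length.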

    1. Re:Is the 64-bit mode RISC? by badkarmadayaccount · · Score: 1

      VLE does not a CISC make - check out PPC Embedded profile.

      --
      I know tobacco is bad for you, so I smoke weed with crack.
  34. 128 bit CPUs? by unixisc · · Score: 1

    Somewhat unrelated, I have a different question from the topic of this thread.

    Are there any 128-bit CPUs? By this, I specifically mean a CPU where the ALU is 128 bits, and one can do 128-bit arithmetic or logical operations? I'm not talking about CPUs w/ 128-bit FPUs either - even that is a totally different animal. I'm specifically asking about 128-bit ALUs in the integer operations part of it.

    I know that the address-space limit of a 64-bit CPU is large enough that a 128-bit CPU is unlikely to be needed for memory addressing. What I do want to know is whether any CPU would work w/ 128-bit numbers in a single instruction cycle.
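    For comparison, here is roughly what a 64-bit machine does today when it needs a 128-bit integer add: two 64-bit adds, with the carry out of the low half fed into the high half (on x86 that is the ADD / ADC pair). This is only a sketch in Python; MASK64 and add128 are illustrative names, and the inputs are assumed to already fit in 64 bits each.

```python
MASK64 = (1 << 64) - 1  # all ones in 64 bits


def add128(a_lo, a_hi, b_lo, b_hi):
    """Add two 128-bit values stored as (low, high) 64-bit limbs.

    The result wraps modulo 2**128, like a fixed-width register pair would.
    """
    lo = (a_lo + b_lo) & MASK64
    # If the low limb wrapped around, it is now smaller than either input:
    # that is the carry a hardware ADD would leave in the carry flag.
    carry = 1 if lo < a_lo else 0
    hi = (a_hi + b_hi + carry) & MASK64
    return lo, hi


if __name__ == "__main__":
    # (2**64 - 1) + 1 carries into the high limb.
    print(add128(MASK64, 0, 1, 0))
```

    A CPU with a true 128-bit integer ALU would do this in one operation instead of two dependent ones.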

    1. Re:128 bit CPUs? by Sique · · Score: 1

      GPUs are routinely 256 bit for both integer and floating point instructions. The Cell processor of Sony PS3 fame has seven 128 bit SPEs (Synergistic Processing Elements), which are controlled by a 64 bit PPE (PowerPC Processing Element).

      --
      .sig: Sique *sigh*
  35. NEITHER RISC nor X86 by iamacat · · Score: 0

    Both are based on obsolete ideas. With code size ever increasing faster than L1-3 cache, it's better to have a complex instruction set with more compact encoding of common real-life code. And X86, oh well we all know where and when that came from. It's time for a modern architecture that efficiently supports today's software - including JVM and .Net, security and technologies such as OpenCL. True high end servers cost millions, so it's worth developing custom software even for one of them!

  36. ...and z/Architecture by BBCWatcher · · Score: 1

    The IBM System z mainframe CPU is most definitely a "CPU that matters." You just have to respect 5.2 GHz clocked (continuous) cores, and mainframe growth has been huge in recent years. IDC says IBM z now has 9% of the total server hardware market, making it bigger than Sparc, MIPS, and ARM servers combined. I tend to think of IBM z as the Apple Macintosh of servers: once written off prematurely but now widely admired for its innovation/quality and (more importantly) for its rapid marketshare gains.

    Actually, I wouldn't put Sparc and MIPS on that list. ARM is only just starting to get interesting (for servers).

    1. Re:...and z/Architecture by TheRaven64 · · Score: 1

      IBM has cut costs on both the POWER and System/z lines a lot in the last few years by combining the chip development. The POWER6 and z/10 are different chips, but they share a lot of the same functional units (including things like BCD). This means that the System/z hardware people only need to develop things that are specific to the large mainframes, not worry about the complete system design.

      --
      I am TheRaven on Soylent News
  37. Yay for abject lies by Anonymous Coward · · Score: 0

    Sparc is far more open than anything intel has done so far, for one. As is PowerPC.

    Fsck us, we need to get rid of x86 already, not the others.

    1. Re:Yay for abject lies by unixisc · · Score: 1

      I just hope that none of the 64-bit extensions of AMD64 are CISC: if that's the case, then future processors that drop 32-bit support can be pure RISC. And that time will come - how many of us today worry about whether win16 apps are supported or not?

    2. Re:Yay for abject lies by m50d · · Score: 1

      Why do you want pure RISC? I'd rather have a more efficient processor than theoretical purity. Even ARM has moved away from pure RISC with Thumb.

      --
      I am trolling
    3. Re:Yay for abject lies by unixisc · · Score: 1

      RISC is more efficient, and the top performers in RISC like Alpha 21364 adapted some VLIW principles, like long instruction words, to enhance performance. Once win64 is well entrenched i.e. most 32-bit apps have moved to 64-bit, they could simply run on a RISC CPU, which would require a lot less circuitry to support legacy x86.

  38. Good news everyone! by Maury+Markowitz · · Score: 1

    "The days of IT organizations being forced to deploy expensive, closed RISC architectures for mission-critical applications are nearing an end"

    Indeed, the days of IT organizations being forced to deploy expensive, closed, sorta-RISC are upon us! Happy days!

  39. Bypass the older instruction set? by ResidentSourcerer · · Score: 1

    So can you get better performance with Intel chips by bypassing the old crufty instruction set? If so, then just redoing the system libraries of the OS might make a major difference in overall performance.

    Can a compiler be set to produce 'universal' binaries that can fall back to CISC instructions, but detect and execute faster instructions when available?
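    What is being asked for here is usually called runtime dispatch: ship both code paths, check what the CPU reports at startup, and bind the faster routine only if the feature is present. Below is a minimal sketch of that idea, assuming Linux-style /proc/cpuinfo text with a "flags" line (the x86 layout). The function names and the choice of "avx2" as the required flag are illustrative only, and sum_fast is just a stand-in for a routine built with newer instructions.

```python
def parse_cpu_flags(cpuinfo_text):
    """Return the feature flags from the first 'flags' line of cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()  # nothing recognised: caller falls back to the baseline


def sum_baseline(values):
    """Portable path that works on any CPU."""
    total = 0
    for v in values:
        total += v
    return total


def sum_fast(values):
    """Stand-in for a routine compiled with newer instructions (assumption)."""
    return sum(values)


def select_sum(flags, required="avx2"):
    """Pick the implementation once, based on the detected feature flags."""
    return sum_fast if required in flags else sum_baseline


if __name__ == "__main__":
    sample = "processor\t: 0\nflags\t\t: fpu sse2 avx2\n"
    chosen = select_sum(parse_cpu_flags(sample))
    print(chosen.__name__, chosen([1, 2, 3]))
```

    On a real Linux box you would feed parse_cpu_flags the contents of /proc/cpuinfo instead of the sample string. Native toolchains do the equivalent with a CPUID check and a function pointer.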

    --
    Third Career: Tree Farmer Second Career: Computer Geek First Career: Teacher, Outdoor Instructor, Photographer.
  40. hackers called it in 1995 by Vorpix · · Score: 1

    KATE: RISC architecture is gonna change everything.

    DADE: Yeah. RISC is good.

    --
    frog blast the vent core
  41. RISC was born as RITC by epine · · Score: 1

    "RISC" and "freedom" are two of the most bent out of shape words in the computer science lexicon. When RMS designed "freedom" a new API, he fired off a scripting command to his global botnet s/freedom/free_as_in_beer/gggggggggggg/! but he missed the last "g" and it's been confusion ever since.

    RISC actually meant Reduced Implementation Team Computing. In practice it meant "this is very cool, but we are way behind the big boys, but maybe we can catch up through a policy of extreme simplification clothed in FUD". Hardly anyone names a sexy new technology after a budgetary constraint, so it became known as RISC instead.

    There was about a ten year period where you could do a CPU design on RISC principles for much less than a CISC design, while bragging about superior performance. This was always a bit disingenuous, since CISC chips were designed for the largest (and cheapest) mass production processes, while RISC chips were produced in much smaller lots with entirely different binning triage. Was it really ever the architecture?

    The dirty secret here is that by 1996 the complexity of the execution core was only a small driver in project design cost. Cache architecture, cache coherency, bus protocol were equally or more important, and everyone had an equally complex design: there's no such thing as a RITC cache hierarchy in the performance space. The Pentium Pro was the first Intel chip which really nailed the caching subsystem. You see this when benchmarks hold up really well under load. On a lightly loaded system the Pentium Pro and the Pentium Pah weren't that different. Many were disappointed. But when you started to run a heavily loaded Windows NT, you really noticed a difference.

    Some of the RISC people said about the Pentium Pro split-transaction bus "that's not a real man's bus!" What they meant was "if Intel makes that bus any better, we're doomed!" They all knew their real edge had been won by hard work rather than dumb lingo, despite the mass indirection in the marketing space.

    Much of the performance of Alpha had less to do with architecture and more to do with some very expensive metalization layers which made the architecture possible. Bike frames filled with pressurized helium have not yet made it to Walmart (I'm brave enough to conjecture without clicking through).

    This article is doing its level best to resurrect RISC as a badge of distinction purely as a market agenda. What a crock. I'd rather click through 38 pages of Phoronix.

    Someone could do one of those sarcastic motivation posters titled "RISC" over a picture of a man with elephant balls on a trolley, and the caption underneath: "This is your compiler on Itanium".

  42. It's all about I/O, stupid by ebunga · · Score: 1

    For most server workloads, I/O is more important than raw computing horsepower. Ask anyone that has actually virtualized a few dozen machines, or really, anybody that has been in the field for more than "I JUST DROPPED OUT OF COLLEGE AFTER FAILING DATA MANAGEMENT 101 TIME TO MAKE A STARTUP CENTERED AROUND NEW IMPLEMENTATIONS OF TECHNOLOGIES EVERYONE FOUND TO BE BAD IDEAS IN THE SIXTIES SEVENTIES AND EIGHTIES."

    Note: all caps because eliminating lower case and using a limited character set means the nosql database can store 30% more data in the same amount of memory.

  43. How about software compiled as RISC microcode? by PhunkySchtuff · · Score: 1

    With the P6 onwards, Intel's x86 chips have been pretty well a RISC core wrapped with a powerful fetching and decoding engine that transforms "native" x86 instructions into CPU specific microcode. This decode engine makes some pretty good assumptions about being able to reorder instructions for greater throughput and the like, but it's got me wondering - would it be possible for the CPU's low-level microcode to be exposed as an instruction set and software compiled directly to the low-level RISC-like microcode?

    Would this provide any tangible benefit to execution speeds (being able to skip part of the decode process) or would it allow a compiler to make more educated decisions about instruction reordering and general program flow if it had access to generate microcode instead of x86 instructions?

    Would it be possible to have fat binaries that have x86 instructions and microcode instructions in the same file (fat binaries are possible on many systems, such as OS X where you can have PPC and x86 executable code in the one binary)?

  44. FX!32 by badkarmadayaccount · · Score: 1

    I think Intel will be looking around for that Transmeta IP any day now. And getting into reverse engineering FX!32. Maybe call up their buddy IBM for some source code of a certain bought out z/Arch emulating start-up that Apple licensed at a certain moment in time.

    --
    I know tobacco is bad for you, so I smoke weed with crack.