Intel's RISC-y Business

finally??? by nurb432 · 2011-09-19 10:37 · Score: 2

What the hell was the i960 then? Meatloaf?

--
---- Booth was a patriot ----

Re:finally??? by the+linux+geek · 2011-09-19 10:41 · Score: 3, Insightful

A non-entity outside a few X terminals and RAID controllers.
Re:finally??? by Anonymous Coward · 2011-09-19 10:42 · Score: 2, Funny

What the hell was the i960 then? Meatloaf?
Oh hell no. He'll do anything for love, but being an AMD aficionado, he won't do that.
Re:finally??? by kungfuj35u5 · 2011-09-19 10:45 · Score: 1

Heh slow raid controllers at that
Re:finally??? by crankyspice · 2011-09-19 10:55 · Score: 4, Informative

Intel has had several RISC chips on the market at various times; the i960, the i860, even ARM designs (XScale).
TFA doesn't say Intel is going to be bringing out RISC technology, though, just that it's "taking aim" at markets that are still RISC strongholds:

With the launch of the E7 earlier this year, it seemed Intel was finally ready to make its final push, calling out RISC by name. “The days of IT organizations being forced to deploy expensive, closed RISC architectures for mission-critical applications are nearing an end,” said Kirk Skaugen, vice president and general manager of Intel's Data Center Group, in a statement announcing the E7 line.
Bold words. Can the E7 really dethrone UltraSparc/Power/PA-RISC and, of course, Intel's own Itanium processors? Intel thinks so.

--
geek. lawyer.
Re:finally??? by Jah-Wren+Ryel · 2011-09-19 11:07 · Score: 1

TFA doesn't say Intel is going to be bringing out RISC technology, though, just that it's "taking aim" at markets that are still RISC strongholds:
Yeah, this is more about adding fault tolerance features than it is about anything that would qualify as RISC.

--
When information is power, privacy is freedom.
Re:finally??? by NJRoadfan · 2011-09-19 11:14 · Score: 2

It also powered the HP LaserJet 4 series of printers.
Re:finally??? by Jeremy+Erwin · 2011-09-19 15:58 · Score: 1

I thought the Paragon used the i860-- a different, later chip.

Itaniums is **NOT** RISC by Anonymous Coward · 2011-09-19 10:48 · Score: 2, Interesting

Just have to point out, Itanium is absolutely NOT RISC in any sense of the word. Other than that, it is rather unfortunate that Intel has the most money to develop new processes (i.e. die shrinks), because the actual Intel instruction set is quite inelegant, both from a programmer standpoint, and from the standpoint of implementing it in silicon. I can't argue with overall performance, if Intel tops performance then that is that; but, the fact of the matter is that any of these RISC designs (Power, Sparc, the PA-RISC, Alpha, ARM...) would clean Intel's clock if they had access to the type of processes Intel does.

Re:Itaniums is **NOT** RISC by David+Greene · 2011-09-19 11:07 · Score: 2

the actual Intel instruction set is quite inelegant, both from a programmer standpoint
I've always been curious about this kind of statement. I hear it a lot. While I understand the complexities of silicon implementation (finding instruction lengths and decode are a PITA), I've always thought the ISA itself was rather elegant. Yes, there is cruft that could be dropped and AMD did some of that with X86-64, but overall, the day-to-day instruction set is mostly orthogonal and has a fairly regular encoding. GPR shifts, MUL and DIV are a bit quirky and the lack of a packed 64-bit integer multiply is an almost unforgivable sin, but overall, I rather like it.
What are the things you would like to see changed? We need specifics to have an interesting discussion. :)

Re:Itaniums is **NOT** RISC by the+linux+geek · 2011-09-19 11:13 · Score: 1

The SSE extensions are ugly, if you're including that in the category of x86.

Lack of FMA support..

Relatively starved for registers, although since it's not a load/store arch (another issue, imho) that matters less than it does in, say, ARM.

There are also implementation issues (lack of a directory cache makes scalability suck), but architecturally, it's a pretty standard and slightly boring CISC. I don't quite understand all the hate it gets - it does tend to be slower than Power or z, and doesn't scale well, but the problems are implementation problems, not architectural ones.
Re:Itaniums is **NOT** RISC by the+linux+geek · 2011-09-19 11:17 · Score: 1

Variable-length instructions are also kind of annoying. (Yes, replying to myself is bad form)
Re:Itaniums is **NOT** RISC by FrankSchwab · 2011-09-19 11:29 · Score: 1

Why?
When transistors were expensive, fixed-length instructions made some sense on die (although they tend to inflate system memory needs), but transistors are extraordinarily cheap today. Instruction decode is such a small part of a modern processor die, and so fast, that it makes no difference.
Sure, the world would be aesthetically more appealing if the 68000 had won the microprocessor war rather than the 8086, but the performance difference at this stage of evolution would be infinitesimal.

--
And the worms ate into his brain.
Re:Itaniums is **NOT** RISC by JamesP · 2011-09-19 12:19 · Score: 1

The SSE extensions are ugly, if you're including that in the category of x86.

Why? x87 is definitely ugly, but sse?

Lack of FMA support..

Like this? http://en.wikipedia.org/wiki/FMA_instruction_set

Relatively starved for registers, although since it's not a load/store arch (another issue, imho) that matters less than it does in, say, ARM.

x86-64 improves on this

There are also implementation issues (lack of a directory cache makes scalability suck), but architecturally, it's a pretty standard and slightly boring CISC. I don't quite understand all the hate it gets - it does tend to be slower than Power or z, and doesn't scale well, but the problems are implementation problems, not architectural ones.
Problem is Intel has a lot of money. So even if Power or Alpha is 'better', Intel has the money to make it better (in general) than the competition (see Apple dropping the PPC because IBM couldn't make a mobile G5, amongst other things)

--
how long until /. fixes commenting on Chrome?
Re:Itaniums is **NOT** RISC by loufoque · 2011-09-19 13:01 · Score: 1

I manage a high-performance library that contains, among others, a SIMD abstraction layer, not unlike Framewave or Accelerate (but better, of course ;))
The SSE/AVX variants are clearly the most annoying to support, and are not really orthogonal at all.
The PowerPC and NEON variants have much more straightforward implementations.
Re:Itaniums is **NOT** RISC by the+linux+geek · 2011-09-19 13:02 · Score: 1

Did you read your own Wikipedia article? FMA isn't in any shipping Intel x86 CPU.
Re:Itaniums is **NOT** RISC by hedwards · 2011-09-19 13:03 · Score: 3, Insightful

As far as x86-64 goes, isn't that mainly because AMD trotted out a 64bit processor that was backwards compatible with 32bit programs and whomped Intel's 64bit processors which required specially compiled programs to work with?
Re:Itaniums is **NOT** RISC by Anarke_Incarnate · 2011-09-19 13:42 · Score: 1

Yes. Intel wanted the MERCED to trickle down and replace the aging x86. They STILL refuse to call it AMD64, which is what AMD calls the architecture (This caused confusion at my job, because people assumed AMD64 was only for AMD CPUs and the servers they were downloading code for were intel based). Intel instead calls their version EM64T, which is based on, but a lesser variant of, AMD64.
Re:Itaniums is **NOT** RISC by ultranova · 2011-09-19 14:29 · Score: 1

Relatively starved for registers, although since it's not a load/store arch (another issue, imho) that matters less than it does in, say, ARM.

One might argue that the whole concept of (general) registers is an ugly hack to get around limited or nonexistent cache controllers in old processors. It certainly isn't "elegant" by any stretch of imagination to divide general storage into two separate namespaces, and it also wastes memory with what are basically explicit cache control commands (load/store).
Also, don't forget that the more registers you have, the more state the OS has to save and restore at task switch time.

--
Forget magic. Any technology distinguishable from divine power is insufficiently advanced.
Re:Itaniums is **NOT** RISC by FreonTrip · 2011-09-19 15:19 · Score: 2

I think - in a colossal effort to refuse to acknowledge that they're eating their competitor's dog food - Intel changed from the awkward and ungainly EM64T to Intel 64 for nomenclature. The only differences between the two amount to a tiny number of instructions AMD deprecated, then inexplicably brought back after Intel had implemented the rest.
Re:Itaniums is **NOT** RISC by Darinbob · 2011-09-19 15:24 · Score: 5, Informative

The x86 architecture is horribly unorthogonal. Each register in the basic set has its own special purpose which are required by some instruction or other, thus no register is general purpose. The instruction set is clearly CISC with variable instruction size, multiple ways to do the same operation, etc. So many instructions operate directly on memory instead of being a load-store architecture with a lot of registers. It was designed to not take up a lot of program space as opposed to being efficient to decode and execute. It's really not that elegant compared to even other CISC chips of its era (68000 for example).
Ie, you've got the EAX "accumulator", EBX base register, ECX counter register, EDX for division, SI source index, DI destination index, etc. The closest to a general purpose data register is EAX, and EBX is sort of like a general purpose address register, but there aren't any pure general purpose registers that can be used for anything. And so your programs tend to spend a lot of time shuffling stuff into the register that's needed or using a memory location directly as an operand.
But that makes sense since the x86 instruction set was more an evolution than a design. Start with 4004 (first microprocessor), go to 4040, 8008, 8080, 8085, then finally 8086. Along the way every new CPU was vaguely compatible (either very similar instructions, or you could write a program to convert existing code to the new CPU). Along that evolution the instruction set grew. It was important in the 8080 era to save program space since RAM was expensive. Without a cache it meant that instruction fetching was just as expensive as fetching a memory operand. The more complex instruction sets meant that most CPUs along this line were microcoded, but the performance hit from that wasn't so big since most of these early chips weren't meant to be speed demons but were for low cost designs (low cost relative to the big computers anyway). Microcode meant you could add a new instruction easily without a lot of design overhead.
The snag is that along the way RAM got cheaper and the need for performance became the key feature. But Intel adapted because in the Pentium and later these chips really are RISC under the hood. They convert the x86 instructions on the fly into something that's a step up from microcode, which is much more suitable for a pipelined or superscalar architecture. So basically everyone uses RISC these days, it would be foolish not to. But Intel is a prisoner of its own design. It can't change the instruction set without breaking compatibility. Every time it has a better architecture it's a flop because that's not PC compatible and they're competing with others for the same product space.
Re:Itaniums is **NOT** RISC by Darinbob · 2011-09-19 15:29 · Score: 1

There are new processors without cache too. RISC isn't just for high end systems. Most of the lower power chips for embedded market are RISC based, and this includes a wide variety of ARM CPUs. Even when you do have a cache you are often at the range of power where you don't want a very complicated instruction decoder because you're not building a top of the line PC. The point of RISC is to keep the entire machine design simple and straight forward and uniform, not just instruction decoding; the more space you save the more you can use for something that really does help your performance (bigger cache, more ALUs).
Re:Itaniums is **NOT** RISC by crispytwo · 2011-09-19 18:53 · Score: 1

I was always under the impression that the 68k vs 8086 architecture produced far less heat for the same throughput.
If that was true then, and is still true, then current processors could be consuming less power under a different architecture and doing the same work. Given that my cell phone's ARM chip is more powerful than my old PC, and heats up far less no matter how much I gab on it might give some credence to the concept.
Re:Itaniums is **NOT** RISC by serviscope_minor · 2011-09-19 21:03 · Score: 1

x87 is definitely ugly
x87 with its stack-based approach is certainly ugly. But (and here's a big but) it works internally at 80 bit for free which was fantastic. With careful coding one could write very effective and accurate single precision floating point code (or get better precision with doubles) essentially at no cost. It also supported loading and saving to memory of long-doubles so one could have hardware assisted super precision floating point numbers if needed.
That was all very nice, but required some care.
Of course if you overflowed the stack, it pushed the end into memory, and would usually truncate. This would often be different between -O0 and -O3. It also could make it very hard to estimate the real precision of difficult floating point code.
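A rough illustration of the "wider intermediates for free" point, as a Python sketch rather than real x87 code. It assumes single precision can be mimicked by rounding through `struct`, and it uses a Python float (a 64-bit double, not the x87's 80-bit format) as a stand-in for the wide accumulator. The function names are made up for the example.

```python
import struct
from fractions import Fraction

def to_f32(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def sum_narrow(values):
    """Accumulate with the running total rounded to single precision at every step."""
    total = 0.0
    for v in values:
        total = to_f32(total + v)
    return total

def sum_wide(values):
    """Accumulate in the wider format and round to single precision once at the end."""
    total = 0.0
    for v in values:
        total += v
    return to_f32(total)

values = [to_f32(0.1)] * 10000
exact = sum(Fraction(v) for v in values)  # exact rational sum of the inputs

narrow_err = abs(Fraction(sum_narrow(values)) - exact)
wide_err = abs(Fraction(sum_wide(values)) - exact)
print(float(narrow_err), float(wide_err))
```

The inputs and the final answer are single precision in both cases; only the precision of the intermediates differs, which is roughly what keeping values on the x87 stack gave you.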

--
SJW n. One who posts facts.
Re:Itaniums is **NOT** RISC by serviscope_minor · 2011-09-19 21:16 · Score: 1

One might argue that the whole concept of (general) registers is an ugly hack to get around limited or nonexistent cache controllers in old processors.
One might, but one could also argue that they're just another part of the memory hierarchy (registers/L1/L2/L3/RAM/disk/stone-tablets). Registers usually require fewer cycles to access than even L1 cache, and can also do several more fast things, like parallel access of the two operands of some opcode and read-modify-write in a single cycle.
Of course, many processors blur the distinction somewhat.
I agree with your point about the lack of elegance of separating the namespaces. However, most processors in hardware effectively present two namespaces (or rather two different views on the same namespace) as the stack and no-stack memory. Registers simply add a third to it.
Another advantage of separating namespaces is that there is no need to make the non-shared parts coherent in multi-CPU systems. Although, taking this too far (e.g. in the Cell) makes life very much harder.
Anyway, it's fun to debate the philosophy of CPU design :)

--
SJW n. One who posts facts.
Re:Itaniums is **NOT** RISC by TheRaven64 · 2011-09-19 22:39 · Score: 1

They STILL refuse to call it AMD64, which is what AMD calls the architecture
AMD called it x86-64. People called it AMD64 because IA64 was used for Itanium. AMD64 is misleading, since x86-64 is a relatively small set of tweaks to x86, yet it gives all of the credit (or, perhaps, blame) to AMD. Calling it x86-64 is vendor neutral and descriptive.

--
I am TheRaven on Soylent News
Re:Itaniums is **NOT** RISC by Renegrade · 2011-09-20 01:35 · Score: 1

I'd like to add to your comment that the x86 front end, although hideously ugly compared to say, the 68k mentioned above, acts basically as an instruction compression engine.
So you have all the advantages of dense CISC-y instructions with a powerful RISC engine under the hood. Memory is still expensive and very small --> is a cache huge? is it cheap? No, and no. CISC-style instructions pack more easily into those tiny spaces, making cache misses less often and less expensive.
RISC didn't win. CISC didn't win. They both lost out to designs that can leverage the advantages of both.
Re:Itaniums is **NOT** RISC by Targon · 2011-09-20 01:40 · Score: 1

The thing is, times have changed, and you have to look back at the real-world issues, not just at low level "small" applications. The more complex things become, the less the CISC vs. RISC argument matters, especially when internally, CISC instructions get broken down into RISC-type instructions anyway.
So, if you are doing something really complex, a well-written application done with CISC instructions won't be any better or worse than if you did the same thing under RISC. It is like the old idea that it is far easier to have a chip with a VERY VERY high clock speed that executes a lot of NOP instructions than one that is actually doing something, and the more complicated applications become, the more you can benefit from CISC (single command to do the job of multiple commands). I am not including all the SSE instructions since they really were put in place by Intel just to try to shut AMD out for the most part.
Re:Itaniums is **NOT** RISC by Renegrade · 2011-09-20 01:49 · Score: 1

Actually that's 4G address space in the original 68000.
The address registers were fully populated with 32 bits with the very first 68k. Only 24 address lines were actually connected (er, 23, was something odd with the odd addresses if I recall correctly), or 20 address lines in the 68008. Motorola (and Commodore, but NOT Apple) documentation said not to use the upper 8 bits of the address registers as they would one day be connected to address lines.
Lo and behold, the 68020 came out, and it had a full 32 address lines. Commodore's 32-bit clean code was validated, and Apple had to rush to fix code where they were using those "extra" bits as flags.
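To make the "32-bit clean" problem concrete, here is a small Python sketch. It treats the 68000 as simply ignoring the top 8 bits of an address (a simplification of the actual pin arrangement), and the helper names and the flag value are invented for the example.

```python
ADDR_MASK_24 = 0x00FFFFFF  # simplification: only the low 24 address bits leave the chip

def tag_pointer(addr, flags):
    """Stash 8 flag bits in the top byte of a 32-bit address register value."""
    return ((flags & 0xFF) << 24) | (addr & ADDR_MASK_24)

def bus_address_68000(reg):
    """Address seen by memory on a 68000: the top 8 bits are ignored."""
    return reg & ADDR_MASK_24

def bus_address_68020(reg):
    """Address seen by memory on a 68020: all 32 bits are driven."""
    return reg & 0xFFFFFFFF

ptr = tag_pointer(0x00012340, flags=0x80)
print(hex(bus_address_68000(ptr)))  # 0x12340
print(hex(bus_address_68020(ptr)))  # 0x80012340
```

The same register value hits the intended location on the older part and somewhere else entirely once the upper lines are connected, unless the code masks the flags off first.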
Also, the 68000, although only possessing a 16-bit-at-a-time ALU and 16 data lines, is effectively a full 32-bit architecture, just a bit pokey. Its lack of 32bit x 32bit = 64bit multiply was pointed out repeatedly by 386 programmers, but by and large, most high level programming languages even today don't support that. (usually they're limited to 32x32=32 or 64x64=64). Since it could do pretty much any 32 (op) 32 = 32 operation, you could write your high level code, and then expect it to be twice as fast on a 68020.
IBM should have used at least the 68008. It wasn't much bigger than an 8088 (used in the IBM PC and XT), being only a 44-pin DIP (vs 40-pin), and had full 68k functionality. The PC-AT could have then used the full-on 68000 instead of the 80286.
Re:Itaniums is **NOT** RISC by JamesP · 2011-09-20 02:06 · Score: 1

Yes, I read it, I was just pointing out it's going to be there (hopefully)

--
how long until /. fixes commenting on Chrome?
Re:Itaniums is **NOT** RISC by TheLink · 2011-09-20 02:51 · Score: 1

Variable-length instructions are also kind of annoying.
Annoying to some, but useful in practice:
http://en.wikipedia.org/wiki/ARM_architecture#Thumb-2
--
- Too many replies beneath your current threshold
Re:Itaniums is **NOT** RISC by David+Greene · 2011-09-20 09:58 · Score: 1

Instruction decode is such a small part of a modern processor die, and so fast, that it makes no difference.
But it is a quite substantial part of the power budget for x86 chips, which is why I stipulated the hardware complexities.

Re:Itaniums is **NOT** RISC by David+Greene · 2011-09-20 10:11 · Score: 1

This is a consequence of having to abuse the hell out of prefix bytes in order to extend the ISA in ways it was never originally designed for
It's true that there are lots of prefixes, but if you look at how those prefixes are actually used, there is a great amount of regularity. Almost every SSE/SSE2 instruction uses the same prefix encoding scheme based on whether it is scalar/packed or single/double. SSE3 has regularity across other dimensions. Later SSE ISAs have somewhat less regularity but they were also much smaller extensions.
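As a sketch of that regularity for the basic SSE/SSE2 arithmetic instructions, here is a tiny Python table. The function and table names are mine, and it only builds the prefix and opcode bytes, not the ModRM byte that selects the operands.

```python
# Mandatory prefix selected by (scalar or packed, single or double).
PREFIX = {
    ("packed", "single"): b"",      # e.g. ADDPS
    ("packed", "double"): b"\x66",  # e.g. ADDPD
    ("scalar", "single"): b"\xF3",  # e.g. ADDSS
    ("scalar", "double"): b"\xF2",  # e.g. ADDSD
}
OPCODE = {"add": 0x58, "mul": 0x59, "sub": 0x5C, "div": 0x5E}

def sse_opcode_bytes(op, shape, precision):
    """Prefix + 0F escape + opcode byte, without the ModRM byte."""
    return PREFIX[(shape, precision)] + bytes([0x0F, OPCODE[op]])

for shape, precision in PREFIX:
    print(shape, precision, sse_opcode_bytes("add", shape, precision).hex())
```

One opcode byte per operation, one prefix per data type: that is the scheme I mean.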

There have been papers showing that the optimal number of GPRs for an OOO CPU with renaming is somewhere between 16 and 32
I remember reading that paper. I didn't buy it then and after almost 10 years in the HPC market I really don't buy it. Many of those limit-type papers have a fundamental flaw: they assume compilers are really, really stupid. I work on compilers that have to go out of the way to not perform certain transformations because they create too much register pressure. Now, isa-wise it gets harder to make more registers available while at the same time keeping text size reasonable but it's absolutely not true that we cannot use more than 32 registers. We can in fact use thousands, in almost every program.

Re:Itaniums is **NOT** RISC by David+Greene · 2011-09-20 12:22 · Score: 1

The SSE extensions are ugly, if you're including that in the category of x86.
In what way are they ugly? To me they are "ugly" in the sense that it's not a general vector ISA but that is not what Intel was aiming for initially. Even AVX and the stuff pitched for Larrabee is not a great vector ISA. But SSE is reasonably functional and you can do quite a lot with it. I guess I am looking for specifics to better understand what I'm missing. :)

Lack of FMA support
Sure. I could name all sorts of things I would like to see in an ISA. But does that make it ugly, or just incomplete? I think you can have a beautiful ISA that is not complete.

It does tend to be slower than Power or z
Really? I have never heard that before and it doesn't line up with my experience. Not saying you're wrong but I would be very interested in reading studies that demonstrate this.

doesn't scale well
What do you mean by "scale?" Supercomputers with hundreds of thousands of cores have been built out of x86 chips.

Re:Itaniums is **NOT** RISC by David+Greene · 2011-09-20 12:33 · Score: 1

I didn't read it in depth but what I remember from it was that it was a knee-of-the-curve result, not something where the paper authors thought there was 0 benefit to >32
Yes, that's what they argued, but frankly, it's not a valid result when gcc is your compiler. Wall wrote a very interesting paper on how to use 1000 registers. Compilers today don't even have to come close to any of the fancy tricks he talks about to suck up register resources. :)

but more importantly, it has implementation costs too. If you make a physical register file too large it will cease to perform like a register file.
We already have register files with hundreds of registers. They are used for O-O-O processing. They simply aren't ISA visible. Yes, there is a hardware limit, but even that is larger than people think it is. [Note: almost-shameless plug!] Techniques like register caching can be very effective, allowing very large register files with essentially the same performance as a small register file. Now, with every other architecture research study, take it with a very large grain of salt. But it is an interesting idea. It seems to me that ISA encoding is really the bigger problem.

There's also the concern of needing to save & restore too much state to make a context switch.
Yep, that is a big problem that most people ignore. There certainly is a balance to be struck. In many of the codes I see, 90% of the time is spent in inner loops with no calls, so this isn't generally a problem for those programs. OS effects are usually pretty minimal, but again that's HPC which is certainly quite different from a more general-purpose machine. As with any statement, evaluate it in the context provided. :)

Re:Itaniums is **NOT** RISC by David+Greene · 2011-09-20 12:39 · Score: 1

the latest Core are not that different conceptually from the Pentium Pro, over 15 years old now.
I'm not sure what you're getting at here. The x86 ISA has little, really nothing, to do with this. Most of the stuff in Pentium Pro was invented for big iron machines of the '60's and '70's. There's some novel stuff in there but most of it is riffs on a 40-year-old theme. That's true of basically every mainstream general-purpose processor out there.

I believe that the real test for x86 will be when Intel can no more come with a new process shrink every 2 years. This might be around 2018.
That's going to affect everyone, not just Intel. I think you're a bit pessimistic with 2018, but it is certainly coming.

Re:Itaniums is **NOT** RISC by David+Greene · 2011-09-20 12:52 · Score: 1

Can you say more about this? What do you mean by "orthogonal"? I certainly agree that SSE/AVX leaves a lot to be desired, but so do Altivec and NEON. None of them is a very good vector ISA. In what ways do you see Altivec and NEON as better designs? I am genuinely curious!

Re:Itaniums is **NOT** RISC by David+Greene · 2011-09-20 12:56 · Score: 1

Each register in the basic set has its own special purpose which are required by some instruction or other, thus no register is general purpose.

I strongly disagree with this. There is a small number of instructions (like 3) that are regularly used that have "special" register operands. Otherwise, the only dedicated registers are rsp and rbp and usually you don't even need rbp and even that is set by the ABI, not the ISA (other than push/pop I suppose). I see codes all the time that use every single GPR other than rsp as a general purpose register.

Re:Itaniums is **NOT** RISC by David+Greene · 2011-09-20 13:02 · Score: 1

I think it has something to do with the ugly warts that the entire line inherited from the original 8086/8088 days...
Everything in that paragraph is truly ugly. It is also totally irrelevant today. Either no one uses them or they are gone in x86-64.

Re:Itaniums is **NOT** RISC by Darinbob · 2011-09-20 13:50 · Score: 1

I think part of the problem is that the 386 expanded things a bit more than 8086 and 80286 did, so it is a bit more uniform. The bigger confusion is that what appears uniform at the programmer level is not uniform at the instruction level. That is, many of these instructions have a "short form" if you use a specific register (ie, ADD an immediate to AX). That's added complexity to the compiler and makes it harder to just use any available register if you also want efficient code.
Similarly, if you're stuck using a specific register with the DIV instruction that can conflict with a compiler optimizer as well because now there's a fixed use register mucking things up. Even if there are only a few instructions that do this it can have a big impact. (though multiply/divide tend to be the annoying cases even in RISC machines).
Re:Itaniums is **NOT** RISC by David+Greene · 2011-09-20 15:21 · Score: 1

That is, many of these instructions have a "short form" if you use a specific register (ie, ADD an immediate to AX). That's added complexity to the compiler and makes it harder to just use any available register if you also want efficient code.
That's not very difficult to handle in a compiler. It's pretty easy to tweak register assignment heuristics to prefer one register over another. Is it worth it? I think the jury's out on that. The text space savings can sometimes make a big difference.
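For anyone who hasn't looked at the encodings, here is roughly what the short form buys you, as a Python sketch for 32-bit mode. It deliberately ignores the sign-extended 8-bit immediate form, and the function name and register table are just for illustration.

```python
import struct

REG32 = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3,
         "esp": 4, "ebp": 5, "esi": 6, "edi": 7}

def encode_add_imm32(reg, imm, use_short_form=True):
    """Encode ADD reg32, imm32 with a full 4-byte immediate."""
    imm_bytes = struct.pack("<I", imm & 0xFFFFFFFF)
    if reg == "eax" and use_short_form:
        return bytes([0x05]) + imm_bytes      # 05 id: ADD EAX, imm32
    modrm = 0xC0 | (0 << 3) | REG32[reg]      # mod=11 (register direct), /0 selects ADD
    return bytes([0x81, modrm]) + imm_bytes   # 81 /0 id: ADD r/m32, imm32

print(encode_add_imm32("eax", 0x12345678).hex())  # 0578563412
print(encode_add_imm32("ecx", 0x12345678).hex())  # 81c178563412
```

One byte per instruction, which is why the register allocator only needs a mild preference for EAX here, not a hard constraint.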

Similarly, if you're stuck using a specific register with the DIV instruction that can conflict with a compiler optimizer as well because now there's a fixed use register mucking things up.
Again, this is easily handled in the compiler and I did admit moderately-used instructions like this are a bit ugly. So you'll get no disagreement from me. In the end, though, it doesn't really make code generators any more difficult.

Re:Itaniums is **NOT** RISC by loufoque · 2011-09-20 21:16 · Score: 1

A lot of instructions on SSE are not really natural element-wise or reduction operations, but often affect only the low/high elements, or the low/high bits. The operations on integers are not consistent: sometimes they're only available for 8-bit, sometimes only for 16-bit or only for 32-bit. 16-bit multiplication is in SSE2 for example, but 32-bit multiplication is only in SSE4.1 and 8-bit and 64-bit multiplication still aren't available.
Altivec is more consistent: operations on integers are typically available for all integer sizes.
Re:Itaniums is **NOT** RISC by loufoque · 2011-09-20 21:18 · Score: 1

A recent nonsense I ran into is also _mm256_testz_ps. It's not consistent with _mm256_testz_si256, and doesn't even behave like the Intel documentation says (it only checks the high bit, not the whole value)
Re:Itaniums is **NOT** RISC by David+Greene · 2011-09-20 23:41 · Score: 1

A lot of instructions on SSE are not really natural element-wise or reduction operations, but often affect only the low/high elements, or the low/high bits.
To clarify, you're talking about things like HADD and, with AVX, shuffles that only operate within 128-bit clusters? This is certainly driven by implementation challenges. In the old vector machines these were known as "cross-pipe" operations. You basically end up building a crossbar to implement reduction-type operations (pure reductions, compresses, snake shifts, etc.). So while I agree that these types of operations are very useful, they are also very expensive. SSE's lack of reduction-type operations is one of the major reasons I consider it far from a great vector ISA. So we're in agreement here.
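For reference, this is the kind of thing HADD gives you, modelled in Python on 4-lane lists. It is a sketch of the HADDPS semantics with invented function names, not real SIMD code.

```python
def haddps_model(a, b):
    """Model of HADDPS: add adjacent pairs within a, then adjacent pairs within b."""
    return [a[0] + a[1], a[2] + a[3], b[0] + b[1], b[2] + b[3]]

def horizontal_sum(v):
    """Reduce 4 lanes to a single total with two horizontal adds."""
    step1 = haddps_model(v, v)          # [v0+v1, v2+v3, v0+v1, v2+v3]
    step2 = haddps_model(step1, step1)  # every lane now holds the full sum
    return step2[0]

print(horizontal_sum([1.0, 2.0, 3.0, 4.0]))  # 10.0
```

Two pairwise steps to sum four lanes, and every lane has to see data from another lane, which is where the crossbar cost comes from.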

The operations on integers are not consistent: sometimes they're only available for 8-bit, sometimes only for 16-bit or only for 32-bit. 16-bit multiplication is in SSE2 for example, but 32-bit multiplication is only in SSE4.1 and 8-bit and 64-bit multiplication still aren't available.
To be fair, I did say the lack of 64-bit multiply is an almost unforgivable sin. :) But yes, the integer operations are somewhat lacking. That said, how important are they? I am not a graphics expert but I would think the SSE contains the most important operations for graphics. That's what it was originally designed for, after all. In the HPC/scientific codes realm, anything less than 32-bit integers isn't terribly interesting.

Re:Itaniums is **NOT** RISC by David+Greene · 2011-09-20 23:44 · Score: 1

Yep, the test/mask instructions are a mess. Intel botched that big time. The two different mask schemes (sign bit and all-1's elements) are strange. I sort of understand why they did it, as an all-1's mask makes it easier to use bitwise operations to simulate predication, but who actually does it that way? The Larrabee proposal cleaned that up somewhat but it still wasn't quite what I'd want to see.

Re:Itaniums is **NOT** RISC by loufoque · 2011-09-21 00:52 · Score: 1

Some image processing algorithms I've worked with only work with integers because of numerical stability issues with floats (but then, with work, it would probably be possible to adapt them)
Re:Itaniums is **NOT** RISC by loufoque · 2011-09-21 01:00 · Score: 1

We use blend (or bitwise tricks before that) and vectors full of 0's or 1's for pseudo-branching, not those instructions. The test/mask instructions are only used to return whether a vector contains at least a non-zero element or stuff like that, which is rarely useful.
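A minimal Python sketch of the bitwise trick, on lists of unsigned 32-bit lanes. The helper names are made up; a real implementation would use the compare and and/andnot/or (or blend) instructions.

```python
LANE_MASK = 0xFFFFFFFF

def compare_gt(a, b):
    """Per-lane compare: all ones where a > b, all zeros otherwise."""
    return [LANE_MASK if x > y else 0 for x, y in zip(a, b)]

def blend(mask, if_true, if_false):
    """Per-lane select using only bitwise ops: (mask & t) | (~mask & f)."""
    return [(m & t) | (~m & LANE_MASK & f)
            for m, t, f in zip(mask, if_true, if_false)]

a = [5, 1, 7, 2]
b = [3, 4, 6, 9]
print(blend(compare_gt(a, b), a, b))  # [5, 4, 7, 9]
```

Both sides get computed for every lane and the mask picks the survivor, so there is no actual branch anywhere.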

RISCy Business Cycles by Anonymous Coward · 2011-09-19 10:50 · Score: 1

RISC dominates servers and high end workstations
CISC takes desktops and makes steady inroads into workstations
RISC dominates low power devices
CISC takes high end servers
RISC makes inroads into notebooks and desktops

Lather, rinse, repeat, profit? and yawn!

Probably a bullshit story by oldhack · 2011-09-19 10:54 · Score: 1

The summary stinks of spam with content-free verbiage.

--
Fuck systemd. Fuck Redhat. Fuck Soylent, too. Wait, scratch the last one.

Re:Probably a bullshit story by the+linux+geek · 2011-09-19 10:58 · Score: 1

It's just yet another attempt of Intel to make x86 chips take over the high-end server market, as they've been trying to do since the early or mid 90's. x86 is like fusion power in that regard - it's always just a few years from evicting the RISC and mainframe architectures from their niches, no matter when you ask.
Re:Probably a bullshit story by PCM2 · 2011-09-19 11:43 · Score: 1

it's always just a few years from evicting the RISC and mainframe architectures from their niches, no matter when you ask.
I think it's pretty damn close to evicting RISC today -- or at least, putting it into a niche, when I'd hardly have called RISC/Unix a "niche market" ten or more years ago. Mainframes are definitely a niche, but where they exist they are well entrenched.

--
Breakfast served all day!

Re:RISC? by Relic+of+the+Future · 2011-09-19 10:59 · Score: 1

Because it's EPIC. I guess you could argue whether having multiple fixed-length instructions is "different enough" to justify calling it something different, but Intel's marketers (and at least some of their engineers) thought so.

--
Those who fail to understand communication protocols, are doomed to repeat them over port 80.

VLIW != RISC by gman003 · 2011-09-19 10:59 · Score: 1

Itanium is not RISC in any sense of the word. It's pretty much the exact opposite of RISC - instead of using small, simple operations, it uses massive, complex instructions, often ones that produce multiple effects (most words produce three logical instructions).

(Note for the acronym-deficient: RISC == "Reduced Instruction Set Computing", VLIW == "Very Long Instruction Word")

Re:VLIW != RISC by loufoque · 2011-09-19 23:01 · Score: 1

A VLIW does multiple instructions in parallel, but each of these is usually pretty small and simple.
Re:VLIW != RISC by unixisc · 2011-09-20 19:32 · Score: 1

In VLIW, like RISC, the instructions are fixed length. What makes VLIW different is that a lot of dynamic analysis that's done in the silicon for RISC - branch prediction, speculative execution and so on - is done in the compiler. EPIC comes somewhere in between - using flags to indicate the dependency b/w instructions, and executing accordingly. Yeah, RISC does depend on a lot of compiler optimizations, since its software is more often than not written in high level languages, but still, RISC doesn't come even close to VLIW when it comes to dependence on the compiler for certain functionality.
The only way the two are even close is that they are not CISC, and don't require microcode.
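A toy sketch of what "done in the compiler" means here: a greedy, in-order packer that groups independent instructions into fixed-width bundles ahead of time. The 3-slot width, the (dest, sources) instruction format and the name pack_bundles are assumptions for illustration; real VLIW/EPIC schedulers are far more elaborate.

```python
def pack_bundles(instrs, width=3):
    """Pack (dest, sources) instructions into bundles of at most `width` slots.

    An instruction starts a new bundle if the current one is full, or if it
    reads or rewrites a register written earlier in the same bundle.
    Assumes all reads in a bundle happen before any of its writes.
    """
    bundles, current, written = [], [], set()
    for dest, srcs in instrs:
        depends = dest in written or any(s in written for s in srcs)
        if current and (len(current) == width or depends):
            bundles.append(current)
            current, written = [], set()
        current.append((dest, srcs))
        written.add(dest)
    if current:
        bundles.append(current)
    return bundles


if __name__ == "__main__":
    program = [
        ("r1", ("a", "b")),
        ("r2", ("c", "d")),
        ("r3", ("r1", "r2")),  # needs r1 and r2, so it cannot share their bundle
        ("r4", ("e", "f")),
    ]
    for bundle in pack_bundles(program):
        print(bundle)
```

The hardware then issues each bundle as-is, with no dependency checking of its own; that is the work an out-of-order RISC or x86 core does in silicon at run time.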

Re:RISC? by Panaflex · 2011-09-19 11:00 · Score: 2

It's definitely a RISC processor set... the problem with the Itanium was the EPIC instruction set. A complete waste of time, as the compiler is asked to generalize decisions about the thread and multi-core state of the machine during program compilation.

I mean... who the hell thought that was a good idea? It makes for a nice benchmark, but a terrible architecture. Bring us back the Alpha chip... make it a 64 core monster.

--
I said no... but I missed and it came out yes.

Re:RISC? by the+linux+geek · 2011-09-19 11:01 · Score: 1

I'd say it qualifies as both RISC and EPIC/VLIW. It fits both categories. They aren't mutually exclusive.

Intel not going after RISC? by PCM2 · 2011-09-19 11:18 · Score: 2

Ehhh? The summary seems a little cockeyed. Does anyone on /. really believe this is the first time Intel is using "the R-word"? Intel has been positioning its chips against RISC for ages. Yes, in the past it was using Itanium as its "high end" chip, because it was more directly competitive with IBM's and Sun's offerings (and it probably had bigger margins). But here's an article from 2004 which claims "Intel markets the [Itanium] chip as a replacement for RISC processors from companies like Sun and IBM" -- pretty much exactly what the summary is claiming is "a first" here.

If anything, Intel has chosen not to throw around a lot of rhetoric about x86/x64 as a replacement for RISC servers out of deference to its partners. Back in 2007, you will recall, Sun started marketing x86 servers in addition to its RISC product line. How would it look if Intel went around claiming x86 was a replacement for Sparc servers? Intel left it to Sun's marketing to clarify where it saw its x86-based products in comparison to Sparc. Similarly, around the same time HP was putting out x86 and Itanium servers -- Intel wasn't going to muddy the waters there, certainly.

On the other hand, Red Hat and Dell would certainly talk about Linux servers (read: x86) as replacements for proprietary Unix servers (read: RISC). So it's certainly not like this is the first time anyone floated the idea, and it's certainly not like Intel has backed off from competing with RISC at any point in the past, no matter which component gets positioned against RISC chips.

--
Breakfast served all day!

Re:Are they also gonna shut down the gibson? by crankyspice · 2011-09-19 11:28 · Score: 3, Funny

RISC architecture is gonna change everything!

I'm still waiting for the P6 chip. Triple the speed of the Pentium. With a PCI bus, too.

--
geek. lawyer.

Hmmm... by fuzzyfuzzyfungus · 2011-09-19 11:29 · Score: 2

I'd say that Intel is playing pure weasel-words with their "expensive, closed, RISC" line...

Are most of the Big Serious Iron RISC/*NIXes available from only a single vendor, often one with rather predatory pricing philosophies? Yeah, arguably so.

However, x86-with-Serious-RISC-level-RAS-features isn't exactly a vibrant competitive market... It's pretty much Intel and, um, *crickets*...

The low end of x86 actually has a number of weirdo 3rd parties in addition to the big two, and the middle of the market is a duopoly, but a pretty feisty one; but x86 high enough to compete with the classical serious RISC stuff on its own ground (as opposed to on the grounds of architectural changes that favor big clusters of expendable servers) is basically a single-shop thing. AMD has some pretty decent x86 servers; but Intel is the one bringing the Itanium RAS stuff down to their Xeons.

Arguably, the lower end of RISC is substantially more competitive than that of x86: there are a huge number of ARM licensees, a whole bunch of random MIPS stuff floating around, and so forth. Only the middle-performance area, which is an effective duopoly (VIA? right...), but a pretty cutthroat one, where most people find their price/performance sweet spot, really makes x86 look like a competitive market at all...

Re:Hmmm... by bws111 · 2011-09-19 11:58 · Score: 1

Who in this day and age has predatory pricing?
Re:Hmmm... by Shinobi · 2011-09-19 17:46 · Score: 1

IBM... IBM.... Oh and IBM....
Too bad that their top-end equipment is rather nice....
Re:Hmmm... by BBCWatcher · 2011-09-19 21:34 · Score: 1

I'd say no. IBM isn't gaining its server marketshare with predatory pricing. Yes, their top-end equipment is nice, but IBM has also been cutting their prices regularly. (That's very easy to see in their mainframes, for example, where it's quite transparent.) Predatory pricing means less-than-superior stuff that is priced at superior rates. If I'd vote for anyone fitting that description, I'd vote for Oracle/Sun. Oracle has done nothing but squeeze the remaining Sun customers as hard as possible while doing less than the bare minimum to stay in the server business. It's not pretty. :-(
Re:Hmmm... by Shinobi · 2011-09-20 01:02 · Score: 1

I'd define that more as malign parasite pricing.
IBM is happy to price it high enough to make you feel it in your budget, but not high enough to negate the value of their products to your business.
Re:Hmmm... by bws111 · 2011-09-20 05:24 · Score: 1

IBM stuff is EXPENSIVE. That is the exact opposite of predatory pricing, which is selling something for a very low price in order to drive competition out of business.

WTH? Is this an Intel ad? by Anonymous Coward · 2011-09-19 11:30 · Score: 1

This is hardly the first time Intel has used the 'R-word' in marketing of Xeons.... Article brings nothing new to the table, hell this has been the Xeon marketing campaign for a decade...

Re:WTH? Is this an Intel ad? by Ant+P. · 2011-09-20 02:39 · Score: 2

Nah, it's just Intel admitting they lost the mobile market to ARM and the value-for-money market to AMD, so all they have left is the ricer and more-money-than-sense market.

Pay no attention to the man behind the curtain by gstrickler · 2011-09-19 11:30 · Score: 4, Informative

Remember all those slow, complex, cumbersome instructions from the 80x86? They're still around, just moved to microcode while all the simple stuff is implemented using the same techniques pioneered by RISC designers. But since this is a server, you're probably running x64 code, which was designed to be much more RISC like in the first place.

So, I guess the real message is "Replace your non-Intel based RISC systems with Intel based RISC systems. But wait, don't answer yet! As an added bonus, Intel chips have extra hardware added so they can run all your old x86/CISC code too, that way we can pretend they're not RISC systems based on the AMD designed x64 instruction set."

--
make imaginary.friends COUNT=100 VISIBLE=false

Re:Pay no attention to the man behind the curtain by RightSaidFred99 · 2011-09-19 11:53 · Score: 1

probably running x64 code, which was designed to be much more RISC like in the first place.
That doesn't even make sense. You do know that adding more registers doesn't make something "much more RISC like" right?
Re:Pay no attention to the man behind the curtain by gstrickler · 2011-09-19 12:15 · Score: 4, Informative

You do know that x64 has a simplified instruction set, simplified addressing modes, larger registers, a larger logical register file, and a much larger physical register file with register renaming, right?
It still supports the full x86 instruction set when running in "legacy mode", but in "long mode", it only supports a subset of instructions, and supports only 16, 32, and 64 bit registers and operands (no 8 bit support), and standardizes the instruction lengths to provide better memory alignment, and simplified instruction processing. And in either mode, all the instructions are converted to one or more macro/micro-ops before running on the "real" RISC core.
You knew all that, right? Of course you did.

--
make imaginary.friends COUNT=100 VISIBLE=false
Re:Pay no attention to the man behind the curtain by rbarreira · 2011-09-19 12:29 · Score: 1

You do know that x64 has a simplified instruction set (...)
I don't remember hearing about this part... what significant chunk of instructions was removed?

--

The AACS key is NOT 0xF606EEFD628B1CA427BEA93A9CA9773F
Re:Pay no attention to the man behind the curtain by chuckymonkey · 2011-09-19 12:37 · Score: 1

It's not that it was removed; you only use the more complex and crappy legacy stuff when you're running in legacy x86 mode. So yes, the instructions are still there, but if you're running x64 then you're not using them.

--
"Some books contain the machinery required to create and sustain universes."-Tycho
Re:Pay no attention to the man behind the curtain by LWATCDR · 2011-09-19 13:05 · Score: 1

Maybe that should be Intel's "next big thing": a Xeon that just supports the x64 instruction set. Drop real mode, drop segments, drop 286, drop the I/O instructions and make a pure 64-bit ISA.

--
See my blog http://ilovecookes.blogspot.com/ for light hearted technical information.
Re:Pay no attention to the man behind the curtain by bws111 · 2011-09-19 13:08 · Score: 4, Informative

IBM mainframes are z/Architecture machines, and they are certainly not RISC. z/Architecture has about 1000 opcodes, including things like 'Square Root' and 'Perform Cryptographic Operation' and 'Convert Unicode to UTF-8'.
Re:Pay no attention to the man behind the curtain by gstrickler · 2011-09-19 14:34 · Score: 1

Great idea. Make it like a real RISC CPU, without all the x86 backwards compatibility addons. What a concept. Of course, then Intel couldn't claim "it'll run all your legacy software", and they might even have to admit it's a RISC design. And where would that leave them?

--
make imaginary.friends COUNT=100 VISIBLE=false
Re:Pay no attention to the man behind the curtain by rabun_bike · 2011-09-19 14:48 · Score: 1

But, sadly, the logic gates are still taking up space on the chip to support all the "baggage" and anyone who has seen the x86 instruction set knows there is lots of baggage going all the way back to the 8088 with the lovely big-endian data segment implementation. Those historic junk logic gates take up space, create heat, and burn power. Since shrinking chips and increasing MHz isn't cutting it we went to multi-core. Now we are seeing the limitations of multi-core so we bump up the bus speed and add more fast cache. All this juxtaposition eats up power. At some point the path forward will be to break legacy code. I think we are fast moving towards that possibility with the wide adoption of ARM. If consolidated data centers see large energy savings with a true RISC processor the market will move that direction.
Re:Pay no attention to the man behind the curtain by the_humeister · 2011-09-19 15:38 · Score: 1

real mode, I/O instructions, etc. can't possibly take up that much of the transistor budget. Especially not when they can cram several cores + 30 MB of cache on one die.
Re:Pay no attention to the man behind the curtain by RightSaidFred99 · 2011-09-19 16:48 · Score: 1

No, it wouldn't. You don't know what RISC is, it's not about the number of instructions.
Re:Pay no attention to the man behind the curtain by RightSaidFred99 · 2011-09-19 16:55 · Score: 1

Sorry, you're full of it. x64 still has variable length instructions, multiple addressing modes, and complex instructions. The number of instructions is irrelevant and in fact RISC can often have more instructions than CISC.
It's not "much more like RISC" by any reasonable definition, and x86/x64 has been using a "real RISC core" for ages.
Re:Pay no attention to the man behind the curtain by gstrickler · 2011-09-19 17:29 · Score: 2

Actually, while those extra gates do take up die space, they're probably fully power gated, drawing no power and producing no heat when in "long mode". The die space involved is probably small: remember, a 486 only had around 1M transistors, including its cache. Even if there are 10M transistors dedicated to maintaining compatibility in a modern CPU, that's ~1% of a modern CPU.
x64 mode already breaks backwards compatibility with quite a bit of x86 code, particularly x86 code that isn't 32-bit code. Anything written before the 386 was introduced won't run under 64-bit mode, almost nothing written before Windows 95 came out will run, and a whole bunch of stuff written before Windows XP came out won't run. There's some newer stuff that won't run, but by the time XP started shipping most software was moving to a 32-bit model, and so will likely run (some may require some minor tweaks and/or a recompile). So, most software written in the last 8-10 years should be ok, but most software written before '95 won't, and between '95 and 2003 it's hit and miss. They could probably save more power and/or get better performance by removing some more instructions and breaking compatibility even more, but it's probably not worth it to most users to have to replace so much software. Deprecating instructions today and removing them 6-10 years from now might be viable, but only if the customers see the benefits (as they are seeing with the move to 64-bit), and I don't see that happening unless ARM starts taking a lot of the server market from Intel.

--
make imaginary.friends COUNT=100 VISIBLE=false
Re:Pay no attention to the man behind the curtain by unixisc · 2011-09-19 18:36 · Score: 1

All recent IBM computers, from what I understand, are based on Power7. Or am I mistaken?
Re:Pay no attention to the man behind the curtain by Kjella · 2011-09-19 19:33 · Score: 1

From what I've gathered they also have a form of "soft deprecation" where obsolete instructions are implemented in microcode, meaning the code still runs but much slower and a smart compiler wouldn't use those instructions anymore. That's pretty effective without breaking compatibility left and right.

--
Live today, because you never know what tomorrow brings
Re:Pay no attention to the man behind the curtain by serviscope_minor · 2011-09-19 21:30 · Score: 1

All recent IBM computers, from what I understand, are based on Power7. Or am I mistaken?
http://en.wikipedia.org/wiki/IBM_z196_(microprocessor)

--
SJW n. One who posts facts.
Re:Pay no attention to the man behind the curtain by blind+biker · 2011-09-19 22:55 · Score: 1

More exactly, 894 opcodes, of which 3/4 are implemented in hardware. That's a bit less than 700 "classic" CISC opcodes.
Those are the figures for the newest z/Architecture CPU, the z10 microprocessor.

--
"The agriculture ministry is not in charge of Gundam" - Japanese ministry official.
Re:Pay no attention to the man behind the curtain by Renegrade · 2011-09-20 01:54 · Score: 1

As far as I know, all instructions are implemented in microcode... aside from in 6502s.
Re:Pay no attention to the man behind the curtain by LWATCDR · 2011-09-20 03:18 · Score: 1

Wasted space is wasted space. Most of that code has been moved into microcode but why even bother with it all? Yes your Xeon will not run DOS apps but who cares.
Where they really need to do this is on the Atom line.

--
See my blog http://ilovecookes.blogspot.com/ for light hearted technical information.
Re:Pay no attention to the man behind the curtain by tlhIngan · 2011-09-20 03:31 · Score: 1

real mode, I/O instructions, etc. can't possibly take up that much of the transistor budget. Especially not when they can cram several cores + 30 MB of cache on one die.
Transistors, no, but die area, yes. Caches consume a huge number of transistors, but relatively small amount of die area for those transistors - 30MB of cache (180M transistors!) may occupy around 50-75% of the available die space. The rest of the transistors are the general logic, where it's the wiring that determines how dense the transistors are.
And the x86 compatibility stuff is known to take up to half of the available space. x86 is terrible logic-wise since instructions are variable sized (which means the instruction fetcher needs to cross cache line boundaries and instructions may cross cache lines), and since it isn't load/store, instructions that reference memory have to decode into several instructions - one or more to calculate the memory address (depending on addressing mode), and one to do the actual load/store.
So no, the x86 front end doesn't take a lot of transistors, but the ones it does take, do take a lot of space. Space that can be used for more cache or more logic blocks. Or just make a smaller die (which lowers cost when you can shove more onto a wafer).
Re:Pay no attention to the man behind the curtain by Chris+Burke · 2011-09-20 07:31 · Score: 1

Not true. The majority of common instructions are decoded directly by the decoders. Only more complex instructions are implemented in microcode.
Unless you just meant "implemented in microcode" as in "decomposed into micro-ops", which is true; technically even a pure load instruction is decoded into a single load micro-op. But microcode usually means a ROM that is read from as a type of instruction memory to get all the micro-ops that make up one CISC instruction. That's only used for a subset of instructions.
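A toy model of the distinction drawn above, assuming made-up mnemonics, micro-op names and a single hypothetical ROM entry (rep_movs); none of this reflects a real processor's micro-op encoding:

```python
# Hypothetical microcode ROM: only "complex" instructions live here.
MICROCODE_ROM = {
    "rep_movs": [
        "load tmp, [src]",
        "store [dst], tmp",
        "add src, 1",
        "add dst, 1",
        "sub count, 1",
        "branch_if_nonzero count",
    ],
}


def is_mem(operand):
    """Treat bracketed operands such as "[rbx]" as memory references."""
    return operand is not None and operand.startswith("[")


def decode(op, dst=None, src=None):
    """Return (decode path, list of micro-ops) for one instruction."""
    if op in MICROCODE_ROM:
        # Complex instruction: micro-ops are read out of the ROM.
        return "microcode ROM", list(MICROCODE_ROM[op])
    if op == "mov" and is_mem(src) and not is_mem(dst):
        # A pure load becomes a single load micro-op.
        return "hardwired decoder", [f"load {dst}, {src}"]
    uops = []
    if is_mem(src):
        # Memory source operand: split off a separate load.
        uops.append(f"load tmp, {src}")
        src = "tmp"
    if is_mem(dst):
        # Memory destination: load, operate, store back.
        uops += [f"load tmp2, {dst}", f"{op} tmp2, {src}", f"store {dst}, tmp2"]
    else:
        uops.append(f"{op} {dst}, {src}")
    return "hardwired decoder", uops


if __name__ == "__main__":
    print(decode("mov", "rax", "[rbx]"))
    print(decode("add", "rax", "[rbx]"))
    print(decode("rep_movs"))
```

Both paths end in micro-ops; the difference is only whether the sequence is produced directly by decode logic or fetched from a stored table.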

--

The enemies of Democracy are
Re:Pay no attention to the man behind the curtain by badkarmadayaccount · 2011-10-01 00:29 · Score: 1

Actually, it is. It is also about orthogonal semantics - no implicit registers, no perverse addressing modes, not too deep a state tree when executing any single instruction (mostly means keeping memory accesses capped - leads to very neat pipelining).

--
I know tobacco is bad for you, so I smoke weed with crack.
Re:Pay no attention to the man behind the curtain by badkarmadayaccount · 2011-10-01 00:35 · Score: 1

Decoded instructions (micro-ops) are created by the hardware decoder. Microcode is programmable - does nearly the same thing, handles complex instructions well. A procedure ROM is a whole other thing.

--
I know tobacco is bad for you, so I smoke weed with crack.
Re:Pay no attention to the man behind the curtain by badkarmadayaccount · 2011-10-01 00:41 · Score: 1

Not to mention having multiple unprivileged addressing modes - non-orthogonal, almost any number of address spaces - in a single process. In user mode. Oh, and these are the ones in 64-bit mode - you can mix and match with the two older modes - you even have special instructions for it. 5 or 6 instruction formats. Microcoded or hardware implemented or just plain missing instructions (some). And... You get the idea.

--
I know tobacco is bad for you, so I smoke weed with crack.
Re:Pay no attention to the man behind the curtain by Chris+Burke · 2011-10-03 04:30 · Score: 1

Microcode is implemented as a ROM in x86 processors. There is usually a small amount of programmable ucode-patch memory to allow BIOS to update ucode to fix bugs or work around performance issues. Implementing the entire microcode as a programmable memory would be needlessly wasteful.

--

The enemies of Democracy are

It's just spin by msobkow · 2011-09-19 11:32 · Score: 1

The 64-bit x86 machines have been eating away at IBM's, HP's, and Sun's market share for years. Partnered with a good Linux distribution and VMWare, they're more than capable of taking on "the big boys."

Oracle/Sun has been resting on their laurels for far too long. Time will tell whether Oracle manages to plug the holes in that sinking ship.

HP's Itanium boxen have never had significant market share.

That leaves IBM. And IBM doesn't sell you just a POWER based system -- they sell you the whole suite of applications, support, and data center integration. They maintain their market share by making it EASY for business to buy a SOLUTION instead of a computer.

--
I do not fail; I succeed at finding out what does not work.

Re:It's just spin by the+linux+geek · 2011-09-19 13:05 · Score: 1

HP Integrity's been in second place behind IBM Power and ahead of SPARC for a while.
Re:It's just spin by Lawrence_Bird · 2011-09-19 14:46 · Score: 1

and POWER7 does seem to kick ass too, no?

Why we hate x86 by erice · 2011-09-19 11:34 · Score: 3, Insightful

I've always been curious about this kind of statement. I hear it a lot. While I understand the complexities of silicon implementation (finding instruction lengths and decode are a PITA), I've always thought the ISA itself was rather elegant. Yes, there is cruft that could be dropped and AMD did some of that with X86-64, but overall, the day-to-day instruction set is mostly orthogonal and has a fairly regular encoding. GPR shifts, MUL and DIV are a bit quirky and the lack of a packed 64-bit integer multiply is an almost unforgivable sin, but overall, I rather like it.

What are the things you would like to see changed? We need specifics to have an interesting discussion. :)

Limited number of registers
Instructions that require certain registers or a certain subset of the registers
No three register operations. This impacts pipelining because it is not possible to avoid overwriting one of the source registers.
Variable instruction length makes decode a headache

Lots of really bad stuff that isn't used much by modern code but still must be maintained for compatibility: segments, 286 protection, IO instructions, etc.

I've wondered sometimes what attitudes would be if a more likable contemporary instruction set had won. VAX and 68000, for instance, are much more palatable to program but they have performance flaws that are probably worse than x86.

Re:Why we hate x86 by afidel · 2011-09-19 13:46 · Score: 1

Limited number of registers
X86-64: with register renaming, 16 is more than enough. AMD did a lot of research before settling on 16; more added significantly to complexity but only increased average program execution speed by low single digit percentages.

Variable instruction length makes decode a headache
Meh, who cares, the whole decoder stage is a couple percent of the non-cache transistor budget. It mattered more back in the PPro era when it was a significant amount of the budget but today it's peanuts and the more verbose ISA makes better use of cache lines which are a much more limited resource in modern designs.

Lots of really bad stuff that isn't used much by modern code but still must be maintained for compatibility: segments, 286 protection, IO instructions, etc.
Most of it's effectively gone on x86-64 processors even if it's still there for backwards compatibility, if you're writing modern code it has no effect on you.

--
There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
Re:Why we hate x86 by serviscope_minor · 2011-09-19 21:24 · Score: 1

Meh, who cares, the whole decoder stage is a couple percent of the non-cache transistor budget.
On the high end, the processor has a massive slew of very fast FPU and integer execution units, and a whole bunch of hardware dedicated to getting the absolute best use out of them possible (the out of order unit). The compute hardware tends to be very well utilised and the flops per Watt are actually rather good for a general purpose CPU. In that case, the decoder has little effect.
On the low end, it is a very different story. In the Atom/high end ARM world, the decoder is a much larger fraction of the budget, and even worse, it is always in use, making it quite power hungry.

--
SJW n. One who posts facts.
Re:Why we hate x86 by TheRaven64 · 2011-09-19 22:42 · Score: 1

AMD did a lot of research before settling on 16; more added significantly to complexity but only increased average program execution speed by low single digit percentages.
This is not constant, it depends a lot on the language. For a more dynamic language, like Lisp or JavaScript, more registers give you a significant benefit. For C, 16 is usually more than enough.

--
I am TheRaven on Soylent News
Re:Why we hate x86 by Agripa · 2011-09-19 22:52 · Score: 1

No three register operations. This impacts pipelining because it is not possible to avoid overwriting one of the source registers.
I wonder about this one. Adding 3 register instruction support also means adding an additional set of read ports to the register file. Is it better to execute more instructions in parallel at a higher clock rate or have 3 register instructions?
Re:Why we hate x86 by bheading · 2011-09-20 06:24 · Score: 1

I am not sure the backwards compatibility argument completely stands up these days.
Back in the Amiga days, when the 68060 came out (we are going back the guts of 15 years here), the new processor dropped a few rarely-used instructions. To compensate, Motorola shipped a small library which allowed the old instructions to be simulated when they were detected via an illegal instruction trap.
By working with OS and compiler vendors, Intel could very easily deprecate and phase out all the old backwards-compatible instructions and addressing modes ahead of time. The only group of customers who would be affected by this would be folks who run old, unpatchable operating systems or software but yet also want to run the latest hardware. It's very hard for me to believe that this group is a significant %, especially not relative to the number of customers who are ready to patch their system and who want the benefits of the faster CPU.
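A rough sketch of the trap-and-emulate idea, assuming a toy CPU where a hypothetical "mul" instruction has been dropped from hardware. The opcode names, IllegalInstruction and EMULATION_LIBRARY are all made up for illustration and are not the actual 68060 support package:

```python
class IllegalInstruction(Exception):
    """Raised by the toy CPU when it meets an opcode it no longer implements."""


# Instructions the toy "new" CPU still executes in hardware.
HARDWARE_OPS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
}


def emulate_mul(a, b):
    """Software replacement for the dropped 'mul', built only from hardware ops."""
    result = 0
    for _ in range(abs(b)):
        result = HARDWARE_OPS["add"](result, a)
    return result if b >= 0 else HARDWARE_OPS["sub"](0, result)


# Vendor-supplied emulation library, keyed by the dropped opcode.
EMULATION_LIBRARY = {"mul": emulate_mul}


def cpu_execute(op, a, b):
    if op not in HARDWARE_OPS:
        raise IllegalInstruction(op)
    return HARDWARE_OPS[op](a, b)


def run(op, a, b):
    """Execute in hardware; on an illegal-instruction trap, fall back to emulation."""
    try:
        return cpu_execute(op, a, b)
    except IllegalInstruction:
        handler = EMULATION_LIBRARY.get(op)
        if handler is None:
            raise  # genuinely unknown opcode, nothing to emulate
        return handler(a, b)


if __name__ == "__main__":
    print(run("add", 2, 3))
    print(run("mul", 6, 7))
```

Old binaries keep working, just more slowly on the emulated path, which is the trade being proposed.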
Re:Why we hate x86 by erice · 2011-09-20 12:07 · Score: 1

No three register operations. This impacts pipelining because it is not possible to avoid overwriting one of the source registers.
I wonder about this one. Adding 3 register instruction support also means adding an additional set of read ports to the register file. Is it better to execute more instructions in parallel at a higher clock rate or have 3 register instructions?
Actually, no. The number of read ports is the same. The third register is the destination. The logic required to mitigate contention to the overwritten source register is much greater than simply decoding a third address. Three register operations easily fit into a 32 bit instruction.
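To make the two-operand cost concrete, here is a small sketch that lowers three-operand instructions into two-operand form, where the destination doubles as a source. The tuple format, the "tmp" scratch register and the function name are assumptions for illustration only:

```python
COMMUTATIVE = {"add", "mul", "and", "or", "xor"}


def lower_to_two_operand(three_addr):
    """Lower (op, dst, src1, src2) into two-operand (op, dst, src) instructions.

    In two-operand form, `op dst, src` computes dst = dst op src, so a source
    must be copied into dst first whenever dst is not already src1.
    """
    out = []
    for op, dst, src1, src2 in three_addr:
        if dst == src1:
            out.append((op, dst, src2))
        elif dst == src2 and op in COMMUTATIVE:
            out.append((op, dst, src1))
        elif dst == src2:
            # Non-commutative and dst aliases src2: save src2 before clobbering it.
            out.append(("mov", "tmp", src2))
            out.append(("mov", dst, src1))
            out.append((op, dst, "tmp"))
        else:
            out.append(("mov", dst, src1))
            out.append((op, dst, src2))
    return out


if __name__ == "__main__":
    for instr in lower_to_two_operand([("add", "r3", "r1", "r2")]):
        print(instr)
```

The extra mov is what a three-operand encoding avoids; both forms still read two sources and write one destination per operation.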
Re:Why we hate x86 by David+Greene · 2011-09-20 12:49 · Score: 1

You make some good points. Let us remember that this is all tradeoffs. Maybe better choices could have been made, but they weren't "dumb" choices, which is what I hear a lot of people say.

Limited number of registers
Given the memory limits at the time, this was a reasonable tradeoff to gain text space. Solved somewhat with x86-64 but I agree it's not enough.

Instructions that require certain registers or a certain subset of the registers
Certainly. Thus my references to shift/DIV/MUL.

No three register operations. This impacts pipelining because it is not possible to avoid overwriting one of the source registers.
Software pipelining? Yes. Fixed with AVX at least for the FP side. Integer instructions are still two-operand but it is less problematic there.

Variable instruction length makes decode a headache
It's also a great way to make the icache efficient. I think this was a good choice.

Lots of really bad stuff that isn't used much by modern code but still must be maintained for compatibility: segments, 286 protection, IO instructions, etc.
Yep. And no one uses it anymore. AMD eliminated a good deal of it in x86-64.

--

Wow, what a terrible article by Sebastopol · 2011-09-19 11:35 · Score: 1

First off, Intel went RISC in 1995 with the PentiumPro, the ISA is CISC, but the uISA is RISC. (Semantics. Bite me.)

Second, Itanium is VLIW, not RISC.

Third, who cares? Sun and IBM are phoning-it-in with this market, just look at the ISSCC proceedings for the past decade.

I'm surprised Intel is even bothering. Is the market that big? Will it grow their bottom line? Anyone?

--
https://www.accountkiller.com/removal-requested

Re:Wow, what a terrible article by the_humeister · 2011-09-19 15:33 · Score: 1

There was an article over at arstechnica looking into why Itanium is still around. Apparently the Itanium market is worth $4 billion. Not exactly chump change.
Re:Wow, what a terrible article by unixisc · 2011-09-19 18:42 · Score: 1

No, Pentium Pro was very much CISC. As an above poster noted, just having a RISC core doesn't make the overall CPU a RISC CPU. The instructions have to be of fixed length so that microcode doesn't have to decode it into smaller RISCy instructions.

Hard to take the story seriously by sl3xd · 2011-09-19 11:44 · Score: 2, Insightful

We live in a post-RISC world. Nearly every modern processor's "core" uses the major innovations of a RISC chip. The size of the instruction set is of little importance; many so-called "RISC" architectures (such as Power) have a larger instruction set than the "CISC" x86_64.

The main issue that spawned the development of RISC (that instruction sets were getting so large and unwieldy that instruction decode would take the lion's share of a die's transistors) turned out to be less of a problem than anticipated. At the time, many CISC chips (VAX in particular) were implementing high-level programming features in the architecture's assembly language.

Nearly all of us have decided that efficient compilers have made a high-level, expressive assembly language unnecessary.

Another factor is that modern processors are superscalar, with multiple execution pipelines per core - one instruction decoder then feeds several pipelines, which further reduces the relative size of the instruction decode.

However, modern chips do implement (at least internally), other "core" ideals of the RISC processor:
- Numerous registers
- Load/Store memory access
- Multi-stage Pipelines
- One instruction per clock tick (ie. keep the complexity of an instruction down to what can execute in one tick - if something takes more than one tick, break it down into smaller pieces).
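As a back-of-the-envelope illustration of the last two points, assuming an ideal pipeline with no stalls, hazards or branch mispredictions (the function names and the 5-stage figure are illustrative only):

```python
def cycles_unpipelined(n_instructions, n_stages):
    """Each instruction runs through every stage before the next one starts."""
    return n_instructions * n_stages


def cycles_pipelined(n_instructions, n_stages):
    """Ideal pipeline: fill it once, then retire one instruction per tick."""
    if n_instructions == 0:
        return 0
    return n_stages + n_instructions - 1


if __name__ == "__main__":
    n, k = 100, 5
    print(cycles_unpipelined(n, k), cycles_pipelined(n, k))
```

For long instruction streams the pipelined count approaches one tick per instruction, which is why single-tick stages matter so much.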

The one thing that the so-called "RISC" chips have historically been known for is dependability: The machines that use them don't crash. This requires more than just a good CPU: It requires good hardware in general, and a good operating system. The "RISC" vendors - such as Sun (now Oracle), IBM, HP and SGI, control the quality of the entire system - from the electrical components, to the chassis, to the airflow in the chassis. Even the datacenter's abilities (power, cooling capacity, airflow) are specified.

There are a lot of things that go into making a system that's mission-critical, and the CPU is a small part of the equation (and usually is the least troublesome). Putting a CPU on a motherboard doesn't give me guarantees about airflow, power reliability, I/O stability and speed, vibration tolerance, nonblocking I/O, and reliability - to say nothing about core OS stability.

Intel isn't interested in doing anything other than selling chips. Unless Intel is willing to take upon themselves a whole-system approach - covering everything from the chassis, cooling and airflow, power supply, motherboard, and core operating system - they'll never play in the league.

Making a mission-critical system is left to others who use Intel's chips, such as HP's high-end Itanium line, and SGI's Altix and Altix UV systems (using Itanium and x86_64).

--
-- Sometimes you have to turn the lights off in order to see.

Re:Hard to take the story seriously by evilviper · 2011-09-19 14:51 · Score: 2

There are a lot of things that go into making a system that's mission-critical, and the CPU is a small part of the equation (and usually is the least troublesome).
That's not really true. The lack of high-end features in x86 CPUs was the weak link in getting reliable servers for some time. And when those features started being added, they appeared in servers almost immediately. Even now Xeons lag significantly behind proprietary CPUs, and Intel is just once again on a marketing push to claim every incremental improvement suddenly makes them ultra-reliable.
Also, the main place all these features need to be is in the chipsets, which Intel also manufactures.

--
Slashdot gets worse every day... Pipedot: News for nerds, without the corporate slant

Re:RISC? by i.am.delf · 2011-09-19 11:58 · Score: 1

My god, could you imagine the heat dissipation of a 64-core Alpha processor? I had a desktop with an EV7 in it. That thing was a space heater. I just looked it up: the spec was 125W for that thing.

Re:RISC? by mevets · 2011-09-19 12:17 · Score: 1

not to defend itanium, but by not foisting it on the compiler, you foist it onto an interpreter running on the CPU. Although the interpreter was wasteful enough, it had no opportunity to usefully work around the kind of dependence shown by:
mov xyz, %eax
add %eax, %ebx
sub %ebx, %ecx
or %ecx, %edx
It could only insert bubbles until each op finished.
That was the crazy solution to the CPU:Memory speed imbalance. Multi core has won the day, but modern high speed processing (eg. GPUs) often use this architecture.
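
A rough way to see why that four-instruction sequence can't go any faster: model each instruction as a destination plus the registers it reads, and compute the earliest cycle each one could start. This is only a sketch; it assumes one-cycle latency, unlimited execution units and perfect register renaming, and `earliest_cycles` is a made-up helper.

```python
def earliest_cycles(instrs, latency=1):
    """instrs: list of (dest, [sources]) in program order.

    Returns the earliest start cycle of each instruction, counting only
    true (read-after-write) dependences.
    """
    ready = {}      # register -> cycle at which its newest value is available
    cycles = []
    for dest, srcs in instrs:
        start = max((ready.get(s, 0) for s in srcs), default=0)
        cycles.append(start)
        ready[dest] = start + latency
    return cycles

# The chain from the comment: every op reads the previous op's result.
chain = [
    ("eax", []),               # mov xyz, %eax  (memory source ignored here)
    ("ebx", ["eax", "ebx"]),   # add %eax, %ebx
    ("ecx", ["ebx", "ecx"]),   # sub %ebx, %ecx
    ("edx", ["ecx", "edx"]),   # or  %ecx, %edx
]
# Four ops that don't depend on each other at all.
independent = [("eax", []), ("ebx", []), ("ecx", []), ("edx", [])]

print(earliest_cycles(chain))        # [0, 1, 2, 3]
print(earliest_cycles(independent))  # [0, 0, 0, 0]
```

No amount of hardware width helps the first case; the second could in principle issue all at once.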

x86 are RISC since P6 by maitas · 2011-09-19 12:29 · Score: 4, Informative

When the Pentium Pro came along (the first P6 processor) it used an internal RISC-like architecture, and all Intel x86 cores from that time to today still decode the x86 instructions into what Intel calls micro-ops (uops, RISC-like operations) and then process those.

Nevertheless, the part where Intel says "The days of IT organizations being forced to deploy expensive, closed RISC architectures" is a lie. You can get the UltraSPARC-T2 Verilog code to make those chips yourself, and the code is GPL. You can't do that with any Intel processor. So Intel processors are the really "closed" processors. It is true that RISC processors are more expensive, but that has nothing to do with "closed".

Re:RISC? by Waffle+Iron · 2011-09-19 12:41 · Score: 1

Why does it need bubbles? Can't an X86 keep its other ALUs busy simultaneously doing other instructions nearby that sequence using standard register renaming and opcode reordering techniques?

At any rate, from what I've read it's the branch prediction that really bottlenecks performance with today's deep pipelines. The advanced runtime branch prediction in the latest CPUs (which can see and react to the actual data at hand) just plain outperforms static compile-time branch analysis.

Re:Are they also gonna shut down the gibson? by hedwards · 2011-09-19 13:05 · Score: 2

What are you up to with all that power? I hope you're not planning to hack a Gibson...

It's not the CPU, it's the whole product. by HockeyPuck · 2011-09-19 13:08 · Score: 1

Sometimes I need to scale vertically and not horizontally. There are times when you need a single chassis with 200+ cores and 8TB of ram and hundreds of PCIe slots for IO. You can take my pSeries from my cold dead hands.

Intel solutions are getting there with 80 cores and 2TB of RAM.

However, when it comes to moving IO, nothing beats big iron.

Re:It's not the CPU, it's the whole product. by afidel · 2011-09-19 14:19 · Score: 1

Unisys offers 6TB of ram, though still "only" 80 cores. Personally I think you probably need to seriously consider a redesign if you need to go bigger than that, but in the enterprise space that kind of development effort normally costs more than buying a couple million dollar box and the couple hundred thousand a year support contract to go along with it. I guess I'm fortunate in that my biggest workload runs well on a 16 core box with a couple SSD's for the main tables.

--
There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
Re:It's not the CPU, it's the whole product. by aztracker1 · 2011-09-19 15:19 · Score: 1

Agreed, I can't think of very many instances where a given type of workload can't be distributed for less outlay of cost over big iron servers. It does depend, but then again, full ACID in database servers isn't usually necessary either.

--
Michael J. Ryan - tracker1.info
Re:It's not the CPU, it's the whole product. by BBCWatcher · 2011-09-19 21:44 · Score: 1

So if one woman can produce one baby in 9 months, does that mean if you assign 9 women to the job you'll get one baby delivered in one month?
There are lots of workloads that are inherently single threaded (and probably always will be). If you've got a bigger, faster, more powerful CPU (or vertically scalable server, with fast shared memory and super fast I/O), that'll be a better fit for those sorts of workloads. IBM zEnterprise mainframes are the preeminent examples of the type, and they're selling extremely well. Different servers for different missions.
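
The nine-women point is usually stated as Amdahl's law: if only a fraction of the work can be spread across n workers, the serial remainder caps the speedup. A minimal sketch (the function name and the example fractions are illustrative, not measurements of any real workload):

```python
def amdahl_speedup(parallel_fraction, n):
    """Speedup on n workers when only parallel_fraction of the work scales."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / n)

for p in (0.0, 0.5, 0.9, 1.0):
    print(f"parallel={p:.1f}  speedup on 9 workers = {amdahl_speedup(p, 9):.2f}")
```

With a fully serial job (fraction 0.0) nine workers give a speedup of exactly 1, which is the baby; only a perfectly parallel job reaches 9.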
Re:It's not the CPU, it's the whole product. by Shinobi · 2011-09-20 01:11 · Score: 1

Another reason the z10 sells well is native BCD calculations, meaning that in some tasks coupled with their massive I/O, they are so much faster than Intel/AMD offerings that you'd need AT LEAST 10-15 times more Intel/AMD hardware, with the requisite floorspace, networking, power cabling, cooling and UPS's for all that to compare merely on the theoretical side. In practice, it can get even worse, since the tasks don't parallelize well.

Re:RISC? by Anarke_Incarnate · 2011-09-19 13:39 · Score: 2

125W is a gaming CPU nowadays.

Re:RISC? by mevets · 2011-09-19 13:45 · Score: 1

The mini example was a set of interlocked instructions, where the source operand of each is dependent upon the previous insn; thus everything is forced to be in-order. Compilers are smart enough not to do this, and the real difference in a 'wide' architecture is that it doesn't insert an interpreter (renaming, stalling, bubbling, etc..). The program ( compiler ) has to know that copying R1 to R2 has an N instruction latency before R2 is valid. If it tries to use R2 earlier, it gets junk.

The x86 trend, since Prescott, has been shorter pipelines + more cores to break the bottleneck.

Re:LEA? by tibit · 2011-09-19 13:49 · Score: 1

Because multiplications by a constant that's but an entry in a list having a couple of powers of two are all the rage these days.
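
For anyone who missed the reference: x86 LEA computes base + index*scale + displacement, with scale limited to 1, 2, 4 or 8, so a single LEA can multiply a register only by a handful of constants. A small sketch of that arithmetic (the `lea` helper just models the address calculation, it is not an emulator):

```python
LEA_SCALES = (1, 2, 4, 8)   # the only scale factors x86 addressing allows

def lea(base, index, scale, disp=0, bits=32):
    """Model the address arithmetic: base + index*scale + disp, wrapped to `bits`."""
    if scale not in LEA_SCALES:
        raise ValueError("scale must be 1, 2, 4 or 8")
    return (base + index * scale + disp) & ((1 << bits) - 1)

def single_lea_multipliers():
    """Constants k such that x*k fits in one LEA (index*scale, or x + x*scale)."""
    return sorted(set(LEA_SCALES) | {s + 1 for s in LEA_SCALES})

print(single_lea_multipliers())   # [1, 2, 3, 4, 5, 8, 9]
print(lea(7, 7, 4))               # 35, i.e. 7*5 computed as 7 + 7*4
```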

--
A successful API design takes a mixture of software design and pedagogy.

Re:RISC? by 0123456 · 2011-09-19 13:54 · Score: 1

125W is a gaming CPU nowadays.

An i5-2500 at stock speeds takes about 60W at full load.

But yeah, if you buy AMD then all bets are off.

All Intel chips have been RISC for a while now by boddhisatva · 2011-09-19 14:29 · Score: 1

Intel's chips have been running on a RISC core for quite a while. The rest of the CISC instruction set is converted by microcode into RISC instructions. Just noticed the person before me said the same thing.

Yawn.. by bored · 2011-09-19 14:38 · Score: 1

Anyone buying POWER or SPARC is a lost cause anyway. Sure Intel might gain a few sales, but frankly the RISC volumes are pretty small and a huge number of them are "stuck" because they have existing applications that they are unwilling/unable to port to an alternative. Or the IT guys are religious zealots. This is the same reason you find AS400s/i5, Nonstops, OpenVMS, zos, etc machines running in data centers the world over. It's not because those OSes or the hardware actually provide some huge benefit that outweighs the 5x (or more, the sky is the limit in some cases) price difference between them and a basic Intel system. It's because companies have 8 and 9 figure investments in software running on them. They will probably still be in datacenters for decades into the future if IBM/Oracle/HP/etc don't decide to kill them off. They zombie on, as long as the original manufacturer supports them and the perceived/actual cost to port the application outweighs the cost of buying a new machine/os every 5 years or so.

Re:Yawn.. by Relayman · 2011-09-19 14:55 · Score: 1

Nothing runs Linux like PowerPC. Nothing can handle virtualization like PowerPC. Intel only dreams of doing what IBM does every day.

--
If I used a sig over again, would anyone notice?
Re:Yawn.. by styrotech · 2011-09-19 20:03 · Score: 1

POWER and PowerPC are two different things.
Maybe you meant "nothing runs Mac OS 9 like PowerPC"? Or "nothing can hold up Steve Jobs plans like PowerPC"?
Re:Yawn.. by Shinobi · 2011-09-20 01:35 · Score: 1

You're just showing how little you know.
When it comes to for example IBM's mainframes, for the jobs where they are used, they massively outperform any Intel/AMD cluster both in raw performance and in operational costs over the years.
Re:Yawn.. by bored · 2011-09-20 02:22 · Score: 1

That is what IBM tells you, try generating your own numbers for once instead of spouting the ones the IBM sales guy tells you.
Sure, some of those machines have very high raw performance numbers... But a very large percentage of the installs actually partition that expensive machine up into a dozen or so smaller system images. Which of course negates a lot of the argument about operational costs because the majority of long term operational costs is related to the number of system images you are maintaining. Sure there are hardware support costs etc, but lots of companies can't even identify the performance bottlenecks in their system. Instead they just buy the latest pitch from $LARGEVENDOR, take their slight performance improvement, then repeat the process in a couple years.
That's not to say there aren't customers where the numbers for POWER or whatnot work out in their favor; it's simply saying that it's a smaller portion of the market every year. I have a POWER system sitting less than 10 feet from me right now. But I also have a quad socket westmere, and both the CPU and IO performance on the westmere are frankly astonishing with our application when compared with the POWER. That said, the sweet spot is actually the dual socket setups as they are significantly less expensive, and our application scales well in a cluster.
Re:Yawn.. by Shinobi · 2011-09-20 02:51 · Score: 2

Actually, it is the numbers we generated on our own that I'm running. For the project I worked on, a single loaded mainframe outperformed the Altix, off-the-shelf Dell cluster and a couple of other solutions the client looked at. Hardware support for BCD and the massive external I/O.
As for partitioning, in secure environments, the low overhead and the ease with which you can do it on IBM's mainframe reduces the operational costs.
The biggest operational cost over the years is floorspace+cooling+power, and that's where the real gain is, and that's where my clients really learned the difference. The primary and the backup system, complete with their storage arrays, cost just slightly more than just the primary off-the-shelf Dell system when factoring in the number of spares that have to be running just to keep the primary system operational in case of failures. Add to that the state of immaturity of reliable failover systems in the Linux world and the operational costs skyrocket.
As for Westmere, it has nice performance for FP math or non-BCD integer math, and it has nice I/O to RAM/local devices, but external I/O is.. lackluster compared to what a z10 can do.
My personal workstation is a dual quad-core Xeon with a crapload of RAM, because it fits the tasks I personally work with better than a z10 would, but if I were to actually work fulltime with the sort of stuff my last client uses their systems for, it'd be mainframes all over, because the performance and reliability for those tasks is just unparalleled by anything x86-based.
Re:Yawn.. by bored · 2011-09-20 05:10 · Score: 1

but external I/O is.. lackluster compared to what a z10 can do.
Hardware support for BCD or decimal FP? Because x86 has had hardware BCD support since the 8086, and now you can do BCD with SSE. How many digits are your BCD values?
I'm also curious what your cumulative IOP/GB/sec numbers are..
We are pushing a little over 12GB/sec (yes bytes, and fully 1/4 of that is disk IO) through the PCIe buses on a dual socket westmere (including a fairly large amount of data transformation in memory), and that is the limit of the 4 adapters we have in the machine. There are slots for more, so it might do more. But once the new PCIe 3.0 sandy bridge machines come out we will probably upgrade the adapters, and put more of them in the machine.
This on a machine that costs about 1/2 the cheapest P710 Express configuration. At those prices you can't even begin to touch the big iron even if we have a dozen or so nodes.
Frankly, I've seen it in a lot of data centers time and time again: some guy who is talking about the IO requirements on his machine discovers, when we drop an analyzer in the path, that it's only doing a few hundred MB/sec aggregate IO. They are transaction limited to disk, or latency limited between cluster nodes, etc.
Re:Yawn.. by greed · 2011-09-20 07:24 · Score: 1

POWER and PowerPC haven't been different since POWER3. POWER2--circa 1993--was the last "true" POWER CPU. All subsequent POWER CPUs have been based on the PowerPC ISA.
http://en.wikipedia.org/wiki/IBM_POWER#POWER3
Re:Yawn.. by styrotech · 2011-09-20 10:09 · Score: 1

Using the ISA is a strange way of defining "not different" in the context of actual hardware that IBM ships and what they're capable of compared to what Intel ships.
That would be kinda like saying an x86_64 Atom is not different from a Xeon 7xxx.
Re:Yawn.. by Relayman · 2011-09-20 13:29 · Score: 1

Let's try this again: Nothing runs Linux like POWER. Nothing can handle virtualization like POWER. Intel only dreams of doing what IBM does every day.

--
If I used a sig over again, would anyone notice?
Re:Yawn.. by Shinobi · 2011-09-20 23:13 · Score: 1

"Hardware support for BCD or decimal FP? Because x86 has had hardware BCD support since the 8086, and now you can do BCD with SSE. How many digits are your BCD values?"
Use of both. And the "hardware support" on x86 for BCD is... slow, takes way more cycles than should be needed. And they are using 8-byte Packed BCD.
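
For readers who haven't met the format: packed BCD stores two decimal digits per byte, so 8 bytes hold 16 digits. The sketch below shows what a decimal add has to do without hardware help, one nibble at a time. It assumes an unsigned layout with no sign nibble (mainframe packed decimal normally carries one), and the helper names are made up.

```python
def to_packed_bcd(value, nbytes=8):
    """Encode a non-negative integer as big-endian packed BCD, two digits per byte."""
    if value < 0:
        raise ValueError("this sketch handles unsigned values only")
    digits = str(value).rjust(nbytes * 2, "0")
    if len(digits) > nbytes * 2:
        raise ValueError("value does not fit")
    return bytes((int(digits[i]) << 4) | int(digits[i + 1])
                 for i in range(0, len(digits), 2))

def from_packed_bcd(data):
    return int("".join(f"{b >> 4}{b & 0xF}" for b in data))

def bcd_add(a, b):
    """Add two equal-length packed-BCD values digit by digit with decimal carry."""
    out = bytearray(len(a))
    carry = 0
    for i in range(len(a) - 1, -1, -1):          # least significant byte first
        carry, lo = divmod((a[i] & 0xF) + (b[i] & 0xF) + carry, 10)
        carry, hi = divmod((a[i] >> 4) + (b[i] >> 4) + carry, 10)
        out[i] = (hi << 4) | lo
    if carry:
        raise OverflowError("result needs more digits")
    return bytes(out)

total = bcd_add(to_packed_bcd(1999), to_packed_bcd(1))
print(total.hex())             # 0000000000002000
print(from_packed_bcd(total))  # 2000
```

Sixteen dependent nibble steps per add is the kind of work a native decimal unit does in one go, which is roughly the gap being argued about here.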
Note, I was brought on for a specific niche here, tweaking and tuning the Infiniband setup.
As for GiB/s numbers, depends on the time of day/time of year, 25-30GiB/s to and from the storage array is not unusual. When the project was deployed about halfway, we managed to saturate 8 of the 12 Infiniband links to the storage array during a peak demand, though that was with some of the most intense users having been connected already. The storage array has a pair of RAMSAN 630 devices as a buffer for recent/frequently requested data.
More interesting to mention is the fact that the whole setup serves about 15000 concurrent "terminals"(read, workstations/desktops) nationwide, spread over hundreds of offices, some with gigabit access, some with 100 megabit access, working with statistical data, payroll/budget processing, analysis, forecasting etc, with strict separation of users/privileges, audit trails etc. And of course everything is encrypted by default.
What I mean with lackluster on x86 etc is that I/O is still sequential bus limited, and even with DMA etc, the CPU STILL has to do some of the I/O shuffling gruntwork. On the mainframe, you have channels that can be individual or bonded as per your needs. The mainframe processor just tells a channel processor "here, job to do" and then proceeds with the next bit of processing it has to do.
That also has benefits if you move onto virtualization
Re:Yawn.. by bored · 2011-09-21 04:21 · Score: 1

What I mean with lackluster on x86 etc is that I/O is still sequential bus limited, and even with DMA etc, the CPU STILL has to do some of the I/O shuffling gruntwork.
This discussion has come up in numerous places over the last few years and is basically false. The majority of modern x86 peripherals have as much intelligence as channel processors, if not more. For example, Fibre Channel and SAS boards from qlogic/emulex/etc have full blown processors on them running firmware that handles all of the Fibre Channel protocol and a large part of the FCP portions, leaving the CPUs to do little more than specify via SCSI CDBs and target ids which data blocks get moved where. Once the operation(s) are complete the board interrupts a CPU. These boards maintain all the connections, and keep track of tens of thousands of simultaneous IOs. The CPU usage to transfer 3GB/sec to/from disk in our setup is less than 1% and a large portion of that is our application sending messages to/from the OS. It's the same with InfiniBand, as the protocol is handled by the adapter, leaving the CPU to do little more than trigger the remote operations.
Combined with the fact that PCIe now includes peer to peer as part of the standard means that you can actually do IO between devices without even the memory subsystem getting involved. This is how GPUs are doing SLI.
Anyway, I think the original discussion was more about how Intel was intending to displace the RISC vendors, aka the Power systems, not the mainframes. Either way, I think my original point stands, as I'm betting the system you're talking about is well into the 7 figure range, or roughly two orders of magnitude more expensive for what is probably only one order of magnitude faster than a single node in our cluster. As our application has nearly linear scaling for node counts in the few dozen range, we are an example of an application that probably gets similar (if not greater) IO and processing performance out of cheap Intel hardware.
BTW: Texas Memory Systems makes some cool stuff, and systems like http://www.fusionio.com/products/iodrive-octal/ do a lot to move cheap intel hardware into places that traditionally required big iron.
Re:Yawn.. by badkarmadayaccount · 2011-10-01 01:48 · Score: 1

Shared memory access speed is still a mainframe stronghold. Though the logical structuring of the channel procs is... more logical, as well. PCIe latency for issuing IO commands cuts into IOPS, throughput is how much you put in, no matter where the die is, and hell, there are just a few standard protocols - integrate in the damn CPU already - or the motherboard. Oh, and FC is expensive, brittle, and doesn't give you anything high-density Ethernet+MPLS won't. IB is nice - but could be replaced with a (large) handful of IEEE1394 links (possibly with an iWARP implementation), in most cases, IMHO. Well, it does have a lead on latency... Otherwise, I agree completely. Mainframes were never priced competitively.

--
I know tobacco is bad for you, so I smoke weed with crack.

Re:real RISC by the_humeister · 2011-09-19 15:36 · Score: 1

x86 isn't RISC if they decode microcode into smaller RISC-like operations; an internal RISC. The outside instructions must be RISC; how they pull those off internally is not really part of it. It's a black box.

You do realize that even IBM's POWER chips (the final bastion of "RISC") decode instructions into uops too, right? So, are you willing to concede that POWER isn't RISC?

Re:CLOSED? by staalmannen · 2011-09-19 15:59 · Score: 1

You mean openSPARC ( http://www.opensparc.net/ ) and openRISC ( http://openrisc.net/ ). I thought there was a MIPS and a Power-based open hardware project too but I could not find it right now.

Re:RISC? by Chris+Burke · 2011-09-19 17:07 · Score: 1

The mini example was a set of interlocked instructions, where the source operand of each is dependent upon the previous insn; thus everything is forced to be in-order. Compilers are smart enough not to do this, and the real difference in a 'wide' architecture is that it doesn't insert an interpreter (renaming, stalling, bubbling, etc..). The program ( compiler ) has to know that copying R1 to R2 has an N instruction latency before R2 is valid. If it tries to use R2 earlier, it gets junk.

Yes, that sequence of instructions would have to be executed sequentially whether for EPIC, Power, or x86, and compilers for any architecture know that they need to expose the maximum amount of ILP to the processor.

However only compilers for EPIC need to know the latency of every operation, the number of each type of functional unit, and any slot restrictions that may apply, so that the VLIW instructions can be assembled optimally. Because only by doing so can the ILP be exploited. Otherwise, like in the example given, bubbles will occur.

With a re-namer and out-of-order scheduler, as much ILP as the compiler can expose in however many instructions will fit in the processor's window can be exploited, automatically scheduled according to the availability and latency of each functional unit on that particular machine.
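
A toy model of that difference, for the curious. It issues up to `width` instructions per cycle, either strictly in program order or picking any ready instruction inside a window. Everything here is an assumption made for illustration: perfect renaming, made-up latencies, and a `run` function that resembles no real core.

```python
def run(instrs, width=2, out_of_order=True, window=8):
    """instrs: list of (dest, [sources], latency). Returns the cycle the last result is ready."""
    n = len(instrs)
    last_writer, deps = {}, []
    for i, (dest, srcs, _) in enumerate(instrs):
        deps.append([last_writer[s] for s in srcs if s in last_writer])
        last_writer[dest] = i              # renaming: later readers see this writer
    done_at = [None] * n                   # cycle each result becomes available
    issued = [False] * n
    cycle, head = 0, 0                     # head = oldest instruction not yet issued
    while head < n:
        slots = width
        limit = min(n, head + window) if out_of_order else n
        for i in range(head, limit):
            if slots == 0:
                break
            if issued[i]:
                continue
            ready = all(done_at[d] is not None and done_at[d] <= cycle for d in deps[i])
            if ready:
                issued[i] = True
                done_at[i] = cycle + instrs[i][2]
                slots -= 1
            elif not out_of_order:
                break                      # in-order: everything waits behind the stall
        while head < n and issued[head]:
            head += 1
        cycle += 1
    return max(done_at)

prog = [
    ("r1", [], 3),        # slow op, e.g. a load
    ("r2", ["r1"], 1),    # depends on the load
    ("r3", [], 1),        # independent
    ("r4", [], 1),        # independent
]
print(run(prog, out_of_order=True))    # 4
print(run(prog, out_of_order=False))   # 5
```

A compiler that knows the 3-cycle latency could hoist the two independent ops itself and get the in-order machine down to 4 as well; the point of the scheduler is that the hardware finds that order on its own, whatever the latencies turn out to be on a given chip.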

The upshot is that the EPIC compiler has to do a lot more work to reach the same level of realised ILP as a non-EPIC compiler. It also has to know much more about the intimate details of the specific CPU being targeted. Meaning the binary will be distinctly sub-optimal for any other CPUs -- as opposed to marginally sub-optimal in the case of non-EPIC compilers. For the example given, if there were earlier or subsequent instructions visible in the window that were independent, then there may not be any bubbles at all.

Those things which were supposed to make the compiler-centric world of EPIC better than other compilers and OoO schedulers, like branch predication which was one of the major touted features of the ISA, ended up not being worth much. Intel's own research showed that this feature was a modest positive gain on finely hand-tuned code, neutral with a very good compiler, and negative with a 'typical' compiler.

Having the compiler have to manually do the work of an OoO scheduler in order to avoid bubbles is not a feature. But I mean it almost sounds like you think stalls are only a consequence of the 'interpreter', and don't occur on an EPIC machine.

--

The enemies of Democracy are

"Expensive, closed" != RISC by Snorbert+Xangox · 2011-09-19 17:14 · Score: 1

What I find weird here is that this is being construed as "woo, Intel takes on RISC", whereas the actual situation is "woo, commodity microprocessors can now take on the low-volume, high-margin, high-availability big business end of the computer market". RISC has nothing to do with it - in an alternate universe*, it could have been VAXes running Ultrix that Intel was going up against, and the language would be completely identical. The big deal is that Intel Xeons can now go into systems that compete on high-end features with large, enterprise SPARC and Power systems, and just as importantly, that you can run workloads on the Xeons that you used to run on SPARC or Power systems. This is as much about the fact that Xeons can run Linux or Solaris about as well as SPARC or Power can run their respective Unices, and that the software is available across all three platforms. Not to mention, Xeons can now supplant Itaniums, but let's just dance around that subject thanks very much. :-)

What has happened though, is that in the lazy shorthand of business computing journalism, RISC has become equated with "large SMP machines with lots of HA features produced by vertically integrated companies like IBM, Oracle, HP and Fujitsu." It's a bit like equating V8 with "heavy car with terrible handling and fuel economy" because you happened to be writing about the American car market in the 1950s.

* a universe in which DEC managed to make VAXes actually go fast somehow

--
-Snorbert, somewhere in the antipodes

x86 compatible? by unixisc · 2011-09-19 17:47 · Score: 2

That, as well as the i860 too (which was even earlier than i960, but used in the Intel Paragon supercomputer). And this new CPU - is it x86 compatible? Or are we about to see a new instruction set?

Even aside from those, Intel had rights to the DEC Alpha once it made its settlement w/ DEC. That was still #1 in performance when Compaq/HP killed it. If this new CPU is going to be incompatible w/ x86, I don't think it has any more of a future than the Itanic, much less EM64.

HP was out of its mind to kill PA-RISC for Itanium. Compaq was out of its mind not to aggressively push Alpha in the NT market, and extend OVMS. All RISC vendors - IBM and Oracle - should learn the lessons from Itanium and not let Intel shoot down superior and/or well established RISC platforms like Power or Sparc in favor of something totally new. And does this make Itanium an HP-only CPU, dropping even the Intel backing?

Also, what exactly are the closed RISC architectures today? OpenSparc is available, OpenPower is available, and even MIPS, as much as I understand it, is so freely licensed that there are many organizations using it. With 3 open RISC architectures, why does anyone need another?

Re:x86 compatible? by jwilso91 · 2011-09-20 00:58 · Score: 1

Fighting Intel is definitely fighting The Man. Back in the 90s I worked with a computer company that developed and marketed their own RISC architecture chip. Intel spent more in R&D annually than our entire company revenues and, shall we say, has a well-funded in-house legal department. Needless to say, their software now runs exclusively on Wintel.
Re:x86 compatible? by unixisc · 2011-09-20 17:27 · Score: 1

Intergraph?

EPIC RISC game over for Intel by unixisc · 2011-09-19 18:06 · Score: 1

Both Sparc and Power now have open specifications that anyone can use to implement their own microprocessor and sell it in the market for any targeted applications. Which is pretty much the goal of open standards. The closed RISC standards that were there - Clipper, PA-RISC and Alpha (Alpha actually less so) are all dead, as are i860 and i960.

Incidentally, the later Alpha and Power architectures, as well as the MAJC processors all borrowed some VLIW concepts such as concatenating multiple instructions into a single word to enhance their SIMD capabilities, so it's not like VLIW is a complete failure. Itanium managed to, on a PR front, knock down PA-RISC, MIPS and later (after HP bought Compaq) Alpha, but ironically, failed to do much against AMD64, w/ the result that it's not made a dent in the marketplace, and Microsoft, Oracle, RedHat and Canonical have all dropped support for it. Even Intel's latest C++ & Fortran compilers don't support Itanium: support is referred to earlier versions. Given that factoid, Intel's announcement reiterating support for Itanium sounds hollow. And w/ the Itanium's list price of $700-$4000, one can't support that CPU even if one wants to.

The game is over - the only CPUs that matter are x64, Power, Sparc and MIPS (I'm not counting ARM here, since it's so far unsuitable for server apps). Intel can forget about dethroning either IBM or Oracle in that arena.

Re:RISC? by unixisc · 2011-09-19 18:16 · Score: 1

EPIC is something b/w VLIW and RISC. RISC does all the dynamic analysis (branch predictions, speculative executions) in hardware, VLIW does it all in software (aided by the ultimate compiler), but EPIC is somewhere in between. In EPIC, a number of techniques are implemented @ chip level to get around the shortcomings of VLIW. Particularly, register-renaming and rotating register files are RISC features: in VLIW, a compiler would eliminate the need for register renaming.

On paper, VLIW is mutually exclusive from RISC, given that the former does all possible optimizations @ compiler level, allowing (in theory at least) for the simplest possible architecture. In practice, the dynamic analysis hardware that RISC uses has been found to be only a small fraction of the chip area, thereby virtually eliminating the VLIW advantage.

CISC == variable length instructions by unixisc · 2011-09-19 18:22 · Score: 1

Variable length instructions are what force the CPU to have microcode, to determine the length of each instruction. That's what makes CISC CISC. Note that RISC doesn't exactly mean reduced #instructions: the instruction set of Power, for instance, is huge, while that of the PDP-11 was very small. What makes a CPU CISC is variable length instructions.
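
Whatever one makes of the microcode claim, the decode problem behind it is easy to show: with fixed-length instructions every start offset is known up front, while with variable-length ones you can't tell where instruction N+1 begins until you've looked at instruction N. A sketch with a made-up toy encoding (the low two bits of the first byte give length minus one; nothing like real x86):

```python
def fixed_boundaries(code, width=4):
    """Fixed-length ISA: all start offsets are known without looking at the bytes."""
    return list(range(0, len(code), width))

def toy_length(code, pc):
    """Hypothetical encoding: low 2 bits of the first byte hold (length - 1)."""
    return (code[pc] & 0b11) + 1

def variable_boundaries(code, length_of=toy_length):
    """Variable-length ISA: each start depends on decoding the previous instruction."""
    offsets, pc = [], 0
    while pc < len(code):
        offsets.append(pc)
        pc += length_of(code, pc)
    return offsets

stream = bytes([0x00, 0x02, 0xAA, 0xBB, 0x01, 0xCC, 0x03, 0x01, 0x02, 0x03])
print(fixed_boundaries(bytes(8)))     # [0, 4]
print(variable_boundaries(stream))    # [0, 1, 4, 6]
```

The first function is trivially parallel; the second is a serial walk, which is why wide decoders for variable-length ISAs need extra hardware to find boundaries ahead of time.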

Hauppauge 486 + 860 by johu · 2011-09-19 18:32 · Score: 1

Don't forget the Hauppauge i486 motherboard that had an i860 on it. Not quite i960, but still RISC. Pretty much the only thing you could do with the i860 side was run the sample application included on floppies, which rotated some characters in the upper right corner of the screen - and that rotation persisted over a reboot with ctrl+alt+delete. Whoo, multi-processing! I think the i860 processor on that motherboard was intended to be used together with the bundled non-standard display adapter for some sort of CAD use.

I actually had one of those, got it from some bankrupt company with full manuals, compiler for i860 etc. Shame I've lost it over the years, as I doubt there are many of those left today. Even had that custom display adapter and a bunch of technical information from the factory, as it was some sort of pre-production sample sent to the company importing Hauppauge products.

http://www.geekdot.com/index.php?page=hauppauge-4860

Re:Hauppauge 486 + 860 by Jeremy+Erwin · 2011-09-20 05:50 · Score: 1

The i860 and the i960 were entirely different chips.
Famously, the i860 was described as a Cray on a chip

Is the 64-bit mode RISC? by unixisc · 2011-09-19 18:34 · Score: 1

Talking about just the 64-bit mode, where only the instructions that deal w/ 64-bit arithmetic are involved, are all those instructions of fixed length? In other words, would microcode be needed if one were to run a program that just used 64-bit instructions?

If it is, then x64 can be called a 64-bit RISC CPU (even while being a 32-bit CISC CPU), at least the 64-bit part of it. But if the ALU instructions that deal w/ 64-bits are variable as well, then AMD64 is a 64-bit CISC CPU.

So which is it?

Re:Is the 64-bit mode RISC? by badkarmadayaccount · 2011-10-01 00:36 · Score: 1

VLE does not a CISC make - check out PPC Embedded profile.

--
I know tobacco is bad for you, so I smoke weed with crack.

128 bit CPUs? by unixisc · 2011-09-19 18:49 · Score: 1

Somewhat unrelated, I have a different question from the topic of this thread.

Are there any 128-bit CPUs? By this, I specifically mean a CPU where the ALU is 128 bits, and one can do 128-bit arithmetic or logical operations? I'm not talking about CPUs w/ 128-bit FPUs either - even that is a totally different animal. I'm specifically asking about 128-bit ALUs in the integer operations part of it.

I know that the upper limit of a 64-bit CPU makes it unlikely that a 128-bit CPU would be needed for any memory limits. What I do want to know is whether any CPU would work w/ 128-bit numbers in a single instruction cycle.
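
For context on what a 64-bit ALU does today with wider numbers: a 128-bit add is done as two 64-bit adds, with the carry out of the low half fed into the high half (ADD followed by ADC on x86-64). A sketch of that, using Python ints masked to 64 bits to stand in for registers; the `add128` name and the (hi, lo) pair layout are just for illustration.

```python
MASK64 = (1 << 64) - 1

def add128(a_hi, a_lo, b_hi, b_lo):
    """Add two 128-bit values held as (hi, lo) 64-bit halves; wraps at 2**128."""
    lo = a_lo + b_lo
    carry = lo >> 64                      # 1 if the low halves overflowed
    hi = (a_hi + b_hi + carry) & MASK64   # second add consumes the carry
    return hi, lo & MASK64

print(add128(0, MASK64, 0, 1))   # (1, 0): the carry ripples into the high half
```

So the question really is whether any CPU does this in one integer instruction instead of two.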

Re:128 bit CPUs? by Sique · 2011-09-19 22:40 · Score: 1

GPUs are routinely 256 bit for both integer and floating point instructions. The Cell processor of Sony PS3 fame has seven 128 bit SPEs (Synergistic Processing Elements), which are controlled by a 64 bit PPE (PowerPC Processing Element).

--
.sig: Sique *sigh*

Re:CLOSED? by unixisc · 2011-09-19 19:17 · Score: 1

I couldn't find any references to any open MIPS projects, but there is a Power.org that has opened up the Power spec.

...and z/Architecture by BBCWatcher · 2011-09-19 20:58 · Score: 1

The IBM System z mainframe CPU is most definitely a "CPU that matters." You just have to respect 5.2 GHz clocked (continuous) cores, and mainframe growth has been huge in recent years. IDC says IBM z now has 9% of the total server hardware market, making it bigger than Sparc, MIPS, and ARM servers combined. I tend to think of IBM z as the Apple Macintosh of servers: once written off prematurely but now widely admired for its innovation/quality and (more importantly) for its rapid marketshare gains.

Actually, I wouldn't put Sparc and MIPS on that list. ARM is only just starting to get interesting (for servers).

Re:...and z/Architecture by TheRaven64 · 2011-09-19 22:33 · Score: 1

IBM has cut costs on both the POWER and System/z lines a lot in the last few years by combining the chip development. The POWER6 and z/10 are different chips, but they share a lot of the same functional units (including things like BCD). This means that the System/z hardware people only need to develop things that are specific to the large mainframes, not worry about the complete system design.

--
I am TheRaven on Soylent News

Re:CISC by ThePhilips · 2011-09-19 21:44 · Score: 1

Or more to the point: why are organizations picking RISC at all?

Either Intel or the author of TFA is missing the point. Most organizations use RISC-based systems that come as part of business-critical solutions. Hardware rarely accounts for 10% of the deal. Software licenses, deployment, testing and long-term support are where the real money is.

Unless Intel introduces an architecture which it commits to support for at least a decade, I do not see a thing changing on the corporate landscape. The problem with Intel boxes is that by the time you need a replacement part, the CPU/etc. generation has already changed and one needs to replace the whole box. That obviously leads to the problem that you can't install the same tested old version of the OS and of the 3rd party crap, meaning that the whole solution has to be tested from the ground up. It is not uncommon for such complete tests to cost more than 1000 person-days. Suddenly, replacing a single $4000 server becomes an affair that is orders of magnitude more expensive.

P.S. Needless to say, at least part of the RISC stronghold has already been dismantled: DB hosting, for which Linux/x64 is used more and more.

--
All hope abandon ye who enter here.

Re:Yay for abject lies by unixisc · 2011-09-19 22:57 · Score: 1

I just hope that none of the 64-bit extensions of AMD64 are CISC: if that's the case, then future processors that drop 32-bit support can be pure RISC. And that time will come - how many of us today worry about whether win16 apps are supported or not?

Good news everyone! by Maury+Markowitz · 2011-09-19 23:58 · Score: 1

"The days of IT organizations being forced to deploy expensive, closed RISC architectures for mission-critical applications are nearing an end"

Indeed, the days of IT organizations being forced to deploy expensive, closed, sorta-RISC are upon us! Happy days!

Bypass the older instruction set? by ResidentSourcerer · 2011-09-20 01:49 · Score: 1

So can you get better performance with Intel chips by bypassing the old crufty instruction set? If so, then just redoing the system libraries of the OS might make a major difference in overall performance.

Can a compiler be set to produce 'universal' binaries that can fall back to CISC instructions, but detect and execute faster instructions when available?
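
The second idea is usually done with runtime dispatch: ship both code paths in one binary, check the CPU's feature flags once at startup, and route calls to the best version the CPU supports. A minimal sketch of that selection logic follows, in Python for brevity. The feature name `"avx2"` is just an example flag, and `sum_baseline`, `sum_fast`, `CANDIDATES` and `select_impl` are hypothetical names; a real program would get the feature set from CPUID rather than from a hand-written set.

```python
def sum_baseline(values):
    """Portable fallback that runs on any CPU."""
    total = 0
    for v in values:
        total += v
    return total


def sum_fast(values):
    """Stand-in for a build that uses a newer instruction set extension."""
    return sum(values)


# Ordered best-first. None means "no special feature required".
CANDIDATES = [("avx2", sum_fast), (None, sum_baseline)]


def select_impl(cpu_features):
    """Pick the first implementation whose required feature is present."""
    for required, impl in CANDIDATES:
        if required is None or required in cpu_features:
            return impl
    raise RuntimeError("no usable implementation")


if __name__ == "__main__":
    impl = select_impl({"sse2"})   # pretend this CPU lacks avx2
    print(impl.__name__, impl([1, 2, 3]))
```

The cost is one indirect call per dispatch plus a larger binary, which is why this tends to be done for hot library routines rather than for whole programs.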

--
Third Career: Tree Farmer Second Career: Computer Geek First Career: Teacher, Outdoor Instructor, Photographer.

Re:RISC? by f8l_0e · 2011-09-20 02:06 · Score: 1

To be fair, that EV7 you had was fabricated on either a 180 or a 130 nanometer process. Made on a 32nm process, it would be a whole different story.

hackers called it in 1995 by Vorpix · 2011-09-20 02:19 · Score: 1

KATE: RISC architecture is gonna change everything.

DADE: Yeah. RISC is good.

--
frog blast the vent core

RISC was born as RITC by epine · 2011-09-20 02:28 · Score: 1

"RISC" and "freedom" are two of the most bent-out-of-shape words in the computer science lexicon. When RMS designed a new API for "freedom", he fired off a scripting command to his global botnet s/freedom/free_as_in_beer/gggggggggggg/! but he missed the last "g" and it's been confusion ever since.

RISC actually meant Reduced Implementation Team Computing. In practice it meant "this is very cool, but we are way behind the big boys, but maybe we can catch up through a policy of extreme simplification clothed in FUD". Hardly anyone names a sexy new technology after a budgetary constraint, so it became known as RISC instead.

There was about a ten year period where you could do a CPU design on RISC principles for much less than a CISC design, while bragging about superior performance. This was always a bit disingenuous, since CISC chips were designed for the largest (and cheapest) mass production processes, while RISC chips were produced in much smaller lots with entirely different binning triage. Was it really ever the architecture?

The dirty secret here is that by 1996 the complexity of the execution core was only a small driver in project design cost. Cache architecture, cache coherency, bus protocol were equally or more important, and everyone had an equally complex design: there's no such thing as a RITC cache hierarchy in the performance space. The Pentium Pro was the first Intel chip which really nailed the caching subsystem. You see this when benchmarks hold up really well under load. On a lightly loaded system the Pentium Pro and the Pentium Pah weren't that different. Many were disappointed. But when you started to run a heavily loaded Windows NT, you really noticed a difference.

Some of the RISC people said about the Pentium Pro split-transaction bus "that's not a real man's bus!" What they meant was "if Intel makes that bus any better, we're doomed!" They all knew their real edge had been won by hard work rather than dumb lingo, despite the mass indirection in the marketing space.

Much of the performance of Alpha had less to do with architecture and more to do with some very expensive metalization layers which made the architecture possible. Bike frames filled with pressurized helium have not yet made it to Walmart (I'm brave enough to conjecture without clicking through).

This article is doing its level best to resurrect RISC as a badge of distinction purely as a market agenda. What a crock. I'd rather click through 38 pages of Phoronix.

Someone could do one of those sarcastic motivational posters titled "RISC" over a picture of a man with elephant balls on a trolley, and the caption underneath: "This is your compiler on Itanium".

Re:Yay for abject lies by m50d · 2011-09-20 03:09 · Score: 1

Why do you want pure RISC? I'd rather have a more efficient processor than theoretical purity. Even ARM has moved away from pure RISC with Thumb.

--
I am trolling

It's all about I/O, stupid by ebunga · 2011-09-20 03:32 · Score: 1

For most server workloads, I/O is more important than raw computing horsepower. Ask anyone that has actually virtualized a few dozen machines, or really, anybody that has been in the field for more than "I JUST DROPPED OUT OF COLLEGE AFTER FAILING DATA MANAGEMENT 101 TIME TO MAKE A STARTUP CENTERED AROUND NEW IMPLEMENTATIONS OF TECHNOLOGIES EVERYONE FOUND TO BE BAD IDEAS IN THE SIXTIES SEVENTIES AND EIGHTIES."

Note: all caps because eliminating lower case and using a limited character set means the nosql database can store 30% more data in the same amount of memory.

How about software compiled as RISC microcode? by PhunkySchtuff · 2011-09-20 11:22 · Score: 1

With the P6 onwards, Intel's x86 chips have been pretty well a RISC core wrapped with a powerful fetching and decoding engine that transforms "native" x86 instructions into CPU specific microcode. This decode engine makes some pretty good assumptions about being able to reorder instructions for greater throughput and the like, but it's got me wondering - would it be possible for the CPU's low-level microcode to be exposed as an instruction set and software compiled directly to the low-level RISC-like microcode?

Would this provide any tangible benefit to execution speeds (being able to skip part of the decode process) or would it allow a compiler to make more educated decisions about instruction reordering and general program flow if it had access to generate microcode instead of x86 instructions?

Would it be possible to have fat binaries that have x86 instructions and microcode instructions in the same file? (Fat binaries are possible on many systems, such as OS X, where you can have PPC and x86 executable code in the one binary.)
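
For reference, the OS X fat format is little more than a table of contents in front of the per-architecture images, so mechanically another slice is easy to imagine. The sketch below builds and parses such a header using the layout as I understand it from Apple's `fat.h`: a big-endian magic and count, then one 20-byte entry per slice. The offsets and sizes in the demo are invented, no CPU type for "microcode" exists, and `build_fat_header`, `parse_fat_header` and `pick_slice` are names made up for the example.

```python
import struct

FAT_MAGIC = 0xCAFEBABE
CPU_TYPE_X86 = 7
CPU_TYPE_POWERPC = 18

HEADER_FMT = ">II"      # magic, number of slices
ARCH_FMT = ">iiIII"     # cputype, cpusubtype, offset, size, align


def build_fat_header(slices):
    """Pack a header for a list of (cputype, cpusubtype, offset, size, align)."""
    out = struct.pack(HEADER_FMT, FAT_MAGIC, len(slices))
    for entry in slices:
        out += struct.pack(ARCH_FMT, *entry)
    return out


def parse_fat_header(data):
    """Return the list of slice entries described by a fat header."""
    magic, count = struct.unpack_from(HEADER_FMT, data, 0)
    if magic != FAT_MAGIC:
        raise ValueError("not a fat binary")
    base = struct.calcsize(HEADER_FMT)
    step = struct.calcsize(ARCH_FMT)
    return [struct.unpack_from(ARCH_FMT, data, base + step * i)
            for i in range(count)]


def pick_slice(archs, host_cputype):
    """Return (offset, size) of the slice for this CPU, or None."""
    for cputype, _subtype, offset, size, _align in archs:
        if cputype == host_cputype:
            return offset, size
    return None


if __name__ == "__main__":
    header = build_fat_header([
        (CPU_TYPE_POWERPC, 0, 4096, 1000, 12),   # made-up values
        (CPU_TYPE_X86, 3, 8192, 2000, 12),       # made-up values
    ])
    print(pick_slice(parse_fat_header(header), CPU_TYPE_X86))
```

The container is the easy part; the open question is whether the chip could ever accept the contents of a microcode slice.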

--
Specialist Mac support for creative pros, Melbourne

Re:Yay for abject lies by unixisc · 2011-09-20 17:25 · Score: 1

RISC is more efficient, and the top performers in RISC like Alpha 21364 adopted some VLIW principles, like long instruction words, to enhance performance. Once win64 is well entrenched, i.e. most 32-bit apps have moved to 64-bit, they could simply run on a RISC CPU, which would need a lot less circuitry than one that has to support legacy x86.

Re:LEA? by badkarmadayaccount · 2011-09-30 23:11 · Score: 1

Isn't that bitshifting?

--
I know tobacco is bad for you, so I smoke weed with crack.

FX!32 by badkarmadayaccount · 2011-10-01 00:45 · Score: 1

I think Intel will be looking around for that Transmeta IP any day now. And getting into reverse engineering FX!32. Maybe call up their buddy IBM for some source code of a certain bought-out z/Arch-emulating start-up that Apple licensed at a certain moment in time.

--
I know tobacco is bad for you, so I smoke weed with crack.

Re:LEA? by tibit · 2011-10-01 02:24 · Score: 1

Yeah, that and addition are usually bundled up in a LEA. Some architectures, like DSPs, also support modulo addressing in a LEA, I'm sure. But it's not a general-purpose multiplication operation. The AC was just confused or trolling.
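
To make that concrete for x86: LEA computes base + index*scale + displacement, where the scale can only be 1, 2, 4 or 8 (i.e. a shift by 0 to 3 bits). A small model of that calculation, with `lea` as a made-up helper name and a 32-bit wraparound assumed by default:

```python
def lea(base, index, scale, disp=0, bits=32):
    """Model an x86-style effective address: base + index*scale + disp."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4, or 8")
    return (base + index * scale + disp) & ((1 << bits) - 1)


if __name__ == "__main__":
    x = 7
    # lea eax, [eax + eax*4] gives x*5 using one shift and one add
    print(lea(x, x, 4))
```

Multiplying by 3, 5 or 9 falls out of that form for free, but an arbitrary multiplier like 7 does not fit in a single LEA.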

--
A successful API design takes a mixture of software design and pedagogy.
