Slashdot Mirror


Revolutionizing x86 CPU Performance

NickSD writes "ChipGeek has an interesting article on increasing x86 CPU performance without having to redesign or throw out the x86 instruction set. Check it out at geek.com."

6 of 296 comments (clear)

  1. RISC by e8johan · · Score: 5, Interesting

    Ok, he realizes that the x86 architecture is flawed. One of the most limiting problems is the lack of general purpose registers (GPR), so he adds more complexity to an allready over-complex solution to solve this problem. All I have to say to this is: when will you see that the solution is as simple as switching architecture!

    As most code today is written in higher level languages (C/C++, Java, etc.) all it takes is a recompile and perhaps some patching and adaptations to small peculiarities. The Linux kernel is a proof of this concept, a highly complex piece of code portable to several platforms with a huge part of the code folly portable and shareable. This means that it is not hard to change architecture!

    If the main competition and its money would move from the x86 to a RISC architecure (why not Alpha, MIPS, SPARC or PPC) I'm sure that the gap in performance per penny would go away pretty soon. RISCs have several advantages, but the biggest (IMHO) is the simplicity: no akward rules (non-GP registers), no special case instructions, easy to pipeline, easy to understand and easy to optimize code for (since the instruction set is smaller).

    And to return to the original article. Please do not introduce more complexity. What we need is simple, beautiful designs, those are the ones that one can make go *really* fast.

    1. Re:RISC by Zathrus · · Score: 5, Interesting

      Ok, when you get to the Real World, let us know.

      Switching architectures is not that trivial. You seem to think that every company has the source code available for every piece of software they run. That isn't true. You seem to think that programs can easily be compiled between programs if written in C/C++ - also untrue. You think that the bug fixes for compiling between platforms are "small peculiarities" -- well, they may be small, but that doesn't make them easy. In fact, it makes it fucking hard because the differences are so buried in libraries, case-specific, and undocumented that it's a nightmare to find them. Yes, I've done this kind of thing. It's godawful.

      Changing architecture is difficult. This is not a closed vendor market - anyone can put together an x86 box and you have at least 3 different CPU vendors to chose from, 3 - 5 motherboard chipsets, and a virtually infinite variety of other hardware. If Dell computer suddenly decides to move to a PPC architecture what's going to happen? They're going to lose all their customers and fast. Because the very limited benefits of a different architecture do not make up for the costs of going to one.

      Yes, I said limited benefits. Yeah, when I was in college taking CompE, EE, and CS courses on CPU and system design I also found the x86 ISA to be the most demonic thing this side of Hell. Well, I'm older and wiser now and while x86 isn't perfect, it's not that bad either. It's price/performance ratio is utterly insane and getting better yearly. Contrary to the RISC architecture doom and gloomers, x86 didn't die under it's own backwards compatibility problems. It's actually grown far more than anyone expected and is now eating those same manufacturers for lunch.

      You know, back in the early 90s when RISC was first starting to make noise the jibe was that Intel's x86 architecture was so bad because it couldn't ramp up in clock speeds. Intel was sitting at 66 MHz for their fastest chip while MIPS, Sparc, etc. were doing 300 MHz. Of course, now Intel has the MHz crown, with demonstrations exceeding 4 GHz, and the RISC crowd is saying that MHz isn't everything and they do more work/cycle than Intel (which is true, but the point remains).

      All that said, go look at the SPEC CInt2000 and FP2000 results. Would you care to state what system has the highest integer performance? And whose chip has the highest floating point?

      Oh, and let's not forget that I can buy roughly 50 server-class x86 systems for the price of one mid-level Sun/IBM/HP/etc. server.

      Note - server performance isn't all about CPU, but since the OP wanted to make that argument, I just thought I'd point out how wrong he is. There is still quite a bit of need for high end servers with improved bus and memory architectures, but don't even try to argue that the CPU is more powerful. It isn't.

  2. The Problems of Obsolete design by Alien54 · · Score: 5, Interesting
    This is what I call the big problem. That design is utterly abominable. We live in a world where it's nothing to have 1 gigabyte of RAM in a computer. We have 80 GB hard drive platters now, allowing even greater-sized drives. And yet at the heart of every single one of your x86 computers out there, a mere 6 GP registers are doing nearly all of the processing. It's amazing. And it's something I've personally wrestled with every day of my assembly programming career.

    This sort of reminds me of what happened with IRQs. Ultimately Intel "solved this" via the PCI bus, but performace has occasionally been problematic. Of course, that problem goes back to the original IBM design for original IBM PC. Intel is also very aware, I imagine, of what happened when IBM tried a total redesign woth the EISA bus, etc. It got rejected, I think, primarily because it was propriatary. In any case, enough companies have been nailed on backward compatibility issues that Intel may be nervous about making a total break.

    The upside is being able to run old software on new hardware. You don't want to break too many things.

    --
    "It is a greater offense to steal men's labor, than their clothes"
  3. I suspect this would be a rather expensive chip by shimmin · · Score: 5, Interesting
    While the base idea is interesting (add instructions that support using the multimedia registers as GP registers), I suspect that actually implementing the functionality of the GP registers in the multimedia ones could result in a prohibitively expensive CPU.

    Anyone who's ever tried to use the MMX or XMMX registers for non-multimedia applications knows what I'm talking about. The instruction sets for them are nicely tweaked to let you do "sloppy" parallel operations on large blocks of data, and not really suited for general computing. You can't move data into them the way you would like to. You can't perform the operations you would like to. You can't extract data from them the way you would like you. They were meant to be good at one thing, and they are.

    I once tried to use the multimedia registers to speed up my implementation of a cryptographic hash function whose evaluation required more intermediate data than could nicely fit in GP registers, and had enough parallelism that I thought it might benefit from the multimedia instructions. No such luck. The effort involved in packing and unpacking the multimedia registers undid any gains in actually performing the computation faster -- and the computation itself wasn't that much faster. I was using an Athlon at the time, and AMD has so optimized the function of the GP registers and ALU that most common GP operations execute in a single clock if they don't have to access memory, while all the multimedia instructions (including the multiple move instructions to load the registers) require at least 3 clocks apiece.

    Now this leads me to suspect that the multimedia registers have limited functionality and slow response for a single reason: economics. The lack of instructions useful for non-multimedia applications could be explained via history, but what chip manufacturer wouldn't want to boast of the superior speed of their multimedia instructions? And yet they remain slower than the GP part of the chip.

    So I conclude that merely making a faster MMX X/MMX processor is prohibitively expensive in today's market. And this proposal would definitely require that, even if actually adding the additional wiring to support the GP instructions for these registers was feasible. Because what would be the point of using these registers for GP instructions if they executed them slower than the same instructions actually executed on GP registers?

  4. Re:Why? by DustMagnet · · Score: 5, Interesting
    I read the article. From what I can see this guy writes lots of assembly, but knows very little about how processors are designed. The huge gains you all see have already been made by register renaming and caches. There might be some gain left by giving the compiler direct control over these, but at the cost of much complexity in the register renaming hardware. The P4 has a very deep pipeline. Looking for register conflicts is hard enough without adding another layer of redirection.

    The fact that the article never mentions register renaming shows the author never did any research into this topic before writing.

    --
    'SBEMAIL!' is better than a goat!!
  5. Re:Why? by p3d0 · · Score: 5, Interesting
    Have you ever looked at the function entry and exit for for processors like MIPS or PowerPC? There can easily be 20-40 instructions (at 4 bytes per instruction) to save and restore registers. Sometimes fewer registers is a win.
    Ridiculous. You're saying that architectures with lots of regs are inferior because they make you save lots of registers at certain times, but reg-starved architectures make you save them all the time, all over the place, in any code that feels the slightest register pressure.

    At best, the problem you describe indicates those architectures use too many callee-save registers in their calling conventions. Having more caller-save registers are a pure win from this perspective.

    --
    Patrick Doyle
    I mod down every jackass who puts his moderation policy in his sig. Oh, wait a sec....