Slashdot Mirror


Linux On Another New Architecture: PowerPC 64-bit

An unnamed correspondent writes: "This one rather silently whizzed by on the kernel mailing list. IBM reports that they have ported Linux to PowerPC hardware running in 64-bit mode. This no doubt applies only to the larger processors but it's pretty cool all the same." I don't see this processor yet listed on the NetBSD page, even on the mind-bending list of not-yet-integrated ports; is this a first? :)

3 of 131 comments

  1. Re:OK, dumb 32/64 bit question by Trepalium · · Score: 5

    eh? The 'bittiness' of the CPU rarely has anything to do with floating point capabilities. The Intel x86 line all have the ability to use 80-bit floating point numbers (10 bytes). In fact, it was because of this that the [in]famous FPU memory move was created for the Pentium processors -- it was faster to move memory into the FPU registers and then back out to memory than it was to use the usual movsd instructions to do the same, because via the FPU you moved 8 bytes (64 bits) at a time, whereas with movsd you were only moving 4 bytes at a time. On the Pentium Pro and Pentium II, they finally fixed this by the use of write combining, so that movsd'ing a block of memory was as fast as or faster than doing it via the FPU. The number of bits generally refers to one of two features of the CPU -- either its bus, or the size of the general purpose registers and address space. The Intel Pentium, for example, had a 64-bit bus, but still only 32-bit registers and memory space. The Intel 80386SX had a 16-bit bus and 32-bit registers.
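
    To make the 4-byte vs. 8-byte point concrete, here is a rough Python sketch. It only counts how many moves a block copy needs at each transfer width; it says nothing about real Pentium timing, and the function name copy_in_chunks is made up for illustration.

```python
from typing import Tuple


def copy_in_chunks(src: bytes, chunk_size: int) -> Tuple[bytes, int]:
    """Copy src in fixed-size chunks and count the moves needed.

    chunk_size=4 stands in for a movsd-style 32-bit move and
    chunk_size=8 for a 64-bit move through the FPU registers.
    This is an illustrative model, not a timing measurement.
    """
    dst = bytearray(len(src))
    moves = 0
    for offset in range(0, len(src), chunk_size):
        # One "move" transfers up to chunk_size bytes.
        dst[offset:offset + chunk_size] = src[offset:offset + chunk_size]
        moves += 1
    return bytes(dst), moves


if __name__ == "__main__":
    block = bytes(range(64))
    _, moves_32bit = copy_in_chunks(block, 4)
    _, moves_64bit = copy_in_chunks(block, 8)
    print(moves_32bit, moves_64bit)
```

    Same data either way, but the wider transfer needs half as many moves for the same block, which is the whole appeal of the trick.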

    --
    I used up all my sick days, so I'm calling in dead.
  2. Execution units rapidly reach diminishing returns. by Christopher+Thomas · · Score: 5

    "Unlike a typical PC microprocessor, the chip features eight execution units fed by a 6.4 gigabyte-per-second memory subsystem, allowing the POWER3 to outperform competitors' processors running at two to three times the clock speed"

    Eight execution units! I recall that the x86 line has half of that. And 6.4 GB/s memory is not to be laughed at either!


    Memory bandwidth is a good thing. Low latency cache hits are a great thing, if you can get them (no idea if PPC does this or not).

    However, adding more execution units won't buy you much beyond a fairly small number. The reason: you just don't have that much extractable parallelism in the serial instruction stream.

    I had the good fortune to be playing with this recently via simulation. If you give the processor a *huge* instruction window (256 instructions) and the ability to execute *any* number of instructions of *any* type in parallel (except for memory accesses - see below), you still get an average Instructions Per Clock of about 2.1-2.2. 95% of the time, you're getting four instructions or fewer issued (and most of the time, far fewer than that).
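
    For anyone who wants to try this kind of experiment, below is a minimal Python sketch of the idea. It is not the simulator I used. It assumes unit latency for every instruction, perfect register renaming (only true read-after-write dependences count), unlimited execution units, and in-order retirement out of a fixed-size window. The name ideal_ipc and the (dest, sources) instruction format are invented for the example.

```python
from typing import Dict, List, Sequence, Tuple

Instr = Tuple[str, Sequence[str]]  # (destination register, source registers)


def ideal_ipc(instrs: Sequence[Instr], window: int = 256) -> float:
    """Upper-bound IPC for a serial instruction stream.

    Assumptions (all simplifications): every instruction takes one
    cycle, any number may issue per cycle, and an instruction may
    enter the window only after the one `window` slots ahead of it
    has retired in program order.
    """
    if not instrs:
        return 0.0
    ready: Dict[str, int] = {}  # register -> cycle its value is available
    retire: List[int] = []      # in-order retirement cycle per instruction
    for i, (dest, srcs) in enumerate(instrs):
        # Wait for all source operands (true dependences only).
        cycle = max((ready.get(s, 0) for s in srcs), default=0)
        # Wait for a free window slot.
        if i >= window:
            cycle = max(cycle, retire[i - window])
        done = cycle + 1
        ready[dest] = done
        retire.append(max(done, retire[-1]) if retire else done)
    return len(instrs) / retire[-1]


if __name__ == "__main__":
    chain = [("r1", [])] + [("r1", ["r1"])] * 7
    print(ideal_ipc(chain))
```

    A fully dependent chain like the one above never gets past one instruction per clock no matter how many execution units you imagine, which is the effect that drags the average down on real code.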

    When SMT is put in silicon, wider issue will become practical (due to increased parallelism in the instruction stream), but as it is, you're better off spending the silicon on other improvements.

    Re. memory accesses; the reason why it's extremely difficult to do memory accesses out-of-order with each other is that you have to check to see if any given two memory accesses refer to the same location (indicating a dependence). You often don't know what the target address is until late in the pipeline, and you'll still need to do a TLB translation to get the physical address, and compare two large bit vectors (the addresses).

    Remember, to be useful for scheduling, you have to be able to do all of this very quickly and very early in the pipeline.

    All of this makes out-of-order memory accesses very difficult to implement theoretically, and a nightmare to implement in real silicon. It's still sometimes done in a limited manner, but this doesn't affect the IPC very much.
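
    The dependence check itself is easy to write down in software; the hard part is doing it early and fast in hardware. A minimal Python sketch of a conservative rule follows, assuming the addresses are already physical (i.e. past the TLB) and that an address not yet computed is represented as None. The names overlaps and load_may_bypass are hypothetical.

```python
from typing import Optional, Sequence, Tuple

# (physical address, or None if not yet computed; access size in bytes)
Access = Tuple[Optional[int], int]


def overlaps(a_addr: int, a_size: int, b_addr: int, b_size: int) -> bool:
    """True if the byte ranges [a, a+size) and [b, b+size) intersect."""
    return a_addr < b_addr + b_size and b_addr < a_addr + a_size


def load_may_bypass(load: Access, earlier_stores: Sequence[Access]) -> bool:
    """Conservative check: may this load run ahead of earlier stores?

    The load is held back if its own address is unknown, if any
    earlier store's address is still unknown, or if any earlier
    store touches the same bytes.
    """
    load_addr, load_size = load
    if load_addr is None:
        return False
    for store_addr, store_size in earlier_stores:
        if store_addr is None:
            return False  # cannot prove independence yet
        if overlaps(load_addr, load_size, store_addr, store_size):
            return False  # true dependence on this store
    return True


if __name__ == "__main__":
    print(load_may_bypass((0x1000, 4), [(0x2000, 4)]))
    print(load_may_bypass((0x1000, 4), [(None, 4)]))
```

    Note how a single store with an unresolved address blocks every later load under this rule. That is why the limited hardware versions don't buy much IPC.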

  3. Power3 (and Power4) have some really cool features. by zensonic · · Score: 5

    As in almost any area where there's money to be earned, Big Blue is in there with some really cool hardware.

    Taken from:

    http://www.rs6000.ibm.com/resource/pressreleases/1998/Oct/power3.html:

    "Unlike a typical PC microprocessor, the chip features eight execution units fed by a 6.4 gigabyte-per-second memory subsystem, allowing the POWER3 to outperform competitors' processors running at two to three times the clock speed"

    Eight execution units! I recall that the x86 line has half of that. And 6.4 GB/s memory is not to be laughed at either!

    --
    Thomas S. Iversen