A History of PowerPC

IBM also says Screw you to intel by Erect Horsecock · 2004-03-31 07:47 · Score: 4, Informative

IBM also announced a ton of new PPC information and tech today at an event in New York, opening up the ISA to third parties including Sony.

--
I hope you die painfully and alone.

Re:IBM also says Screw you to intel by Anonymous Coward · 2004-03-31 07:54 · Score: 4, Interesting

Sony?

Does this mean that ALL next-generation consoles (next GameCube, PS3 and Xbox2) will use an IBM chip?

Re:IBM also says Screw you to intel by bhtooefr · 2004-03-31 08:09 · Score: 4, Informative

If it's what I think it is, then Intel has been doing this since the 80386 (try VMware, which uses your box's CPU in this way, then Bochs, which emulates an x86 CPU), Motorola (and therefore IBM, because of the AIM alliance) has been doing this since the PPC 601 (Mac-on-Linux only runs on PPCs, which makes it pretty damn obvious), and it just goes on and on.

Big Endian by nycsubway · 2004-03-31 07:54 · Score: 4, Funny

I'm not a fan of big endian... or is it little endian... I don't remember, but I do know, if it's backwards, it's backwards because it's the reverse of what I'm used to.

--
http://github.com/gbook/nidb

Re:Big Endian by Mattintosh · 2004-03-31 08:00 · Score: 5, Informative

PPC is big endian, which is normal.

x86 is little endian, which is chunked-up and backwards.

Example:
Consider the stored number 0x12345678, listed byte by byte from the lowest address up.

Big endian: 12 34 56 78
Little endian: 78 56 34 12

Clear as mud?
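For anyone who wants to see the two layouts above for themselves, here is a minimal Python sketch. It assumes nothing beyond the standard `int.to_bytes` method; the variable names are made up for illustration.

```python
# Byte layout of the 32-bit value 0x12345678 under each byte order,
# listed from the lowest address up.
value = 0x12345678

big = value.to_bytes(4, byteorder="big")        # most significant byte first
little = value.to_bytes(4, byteorder="little")  # least significant byte first

def show(raw):
    """Format bytes as space-separated hex pairs."""
    return " ".join(f"{b:02x}" for b in raw)

print("Big endian:   ", show(big))
print("Little endian:", show(little))
```

The same four bytes come out in both cases; only the order in memory differs.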
Re:Big Endian by Anonymous Coward · 2004-03-31 08:20 · Score: 5, Informative

Big-endian appeals to people because they learned to do their base-10 arithmetic in big-endian fashion. The most significant digit is the first one encountered. It's habit.

Little-endian has some nice hardware properties, because the address of a value doesn't change with the size of the operand being read.

Big Endian:
uint32_t src = 0x00001234; // at address 1000, say; bytes in memory: 00 00 12 34
uint32_t dst1 = src; // fetch 4 bytes from 1000 to get 00001234
uint16_t dst2 = (uint16_t)src; // fetch 2 bytes from 1000 + 2 to get 1234

Little Endian:
uint32_t src = 0x00001234; // at address 1000, say; bytes in memory: 34 12 00 00
uint32_t dst1 = src; // fetch 4 bytes from 1000
uint16_t dst2 = (uint16_t)src; // fetch 2 bytes from 1000

The processor doesn't have to modify register values and funk around with shifting the data bus to perform different read and write sizes with a little-endian design. Expanding the data to 64 bits has no effect on existing code, whereas the big-endian case will have to change all the pointer values.
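The address arithmetic can be checked without any particular hardware. This is a rough Python sketch that models memory as a byte string with the 32-bit value at offset 0; `read_u16` is a hypothetical helper, not a real API.

```python
import struct

# One 32-bit value stored at offset 0 of a tiny memory image, in each byte order.
value = 0x00001234
mem_big = struct.pack(">I", value)     # bytes: 00 00 12 34
mem_little = struct.pack("<I", value)  # bytes: 34 12 00 00

def read_u16(mem, offset, order):
    """Read an unsigned 16-bit value starting at the given byte offset."""
    return int.from_bytes(mem[offset:offset + 2], order)

# Little endian: the low 16 bits sit at the same offset as the full word.
low_le = read_u16(mem_little, 0, "little")
# Big endian: the low 16 bits sit two bytes further on.
low_be = read_u16(mem_big, 2, "big")
# A 16-bit read at offset 0 of the big-endian image returns the high half instead.
high_be = read_u16(mem_big, 0, "big")

print(hex(low_le), hex(low_be), hex(high_be))
```

Whether this matters in practice depends on who computes the offset; the sketch only shows where the bytes land.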

To me, this seems less "chunked up" than big endian storage, where you have to jump back and forth to pick out pieces.

In any event, it seems unnecessary to use prejudicial language like "normal" and "chunked up". It's just another way of writing digits in an integer. Any competent programmer should be able to deal with both representations with equal facility.

Being unable to deal with little-endian representation is like being unable to read hexadecimal and insisting all numbers be in base-10 only. (Dotted-decimal IP numbers, anyone?)

Big-endian has one big practical advantage other than casual programmer convenience. Many major network protocols (TCP/IP, Ethernet) define the network byte order as big-endian.
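As a small illustration of network byte order, Python's `struct` module uses the `!` prefix for it, and documents that as big-endian. The port and address below are made-up values, chosen only for the example.

```python
import struct

port = 8080           # hypothetical port number
address = 0xC0A80001  # 192.168.0.1 as a 32-bit integer (hypothetical)

# "!" selects network byte order, which is big-endian with no padding.
wire = struct.pack("!HI", port, address)

# The same bytes come out of an explicit big-endian format string.
same_as_big = struct.pack(">HI", port, address)

# Reading the address bytes in order gives the familiar dotted-decimal form.
dotted = ".".join(str(b) for b in wire[2:])

print(wire.hex(), dotted)
```

On a little-endian host the bytes have to be swapped on the way in and out; on a big-endian host the conversion is a no-op.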
Re:Big Endian by karlm · 2004-03-31 12:04 · Score: 4, Informative

What kind of strange CPU implementation modifies register values when addressing sub-word values? This is done most commonly by the programmer at write-time (or maybe by some strange compiler or assembler at compile-time). This is not a hardware advantage in any architecture I'm aware of. Are you perhaps talking about the extra hardware burden associated with unaligned memory access? Unaligned memory access is not a consequence of byte ordering.

One more big advantage of the big-endian byte order is that 64-bit big-endian CPUs can do string comparisons 8 bytes at a time. This is a big advantage where the length of the strings is known (Java strings, Pascal strings, the Burrows-Wheeler transform for data compression) and still an advantage for null-terminated strings.

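The string-comparison point can be sanity-checked in a few lines. This is only a sketch for equal-length 8-byte chunks, with made-up sample strings and a hypothetical `as_word` helper standing in for a 64-bit register load.

```python
def as_word(chunk, order):
    """Interpret an 8-byte chunk as one unsigned 64-bit integer."""
    if len(chunk) != 8:
        raise ValueError("expected exactly 8 bytes")
    return int.from_bytes(chunk, order)

x = b"az000000"  # sample strings, chosen so the two byte orders disagree
y = b"ba000000"

# Lexicographic (memcmp-style) order: x sorts before y.
lexicographic = x < y

# Loaded big-endian, a single integer comparison gives the same answer.
big_agrees = (as_word(x, "big") < as_word(y, "big")) == lexicographic

# Loaded little-endian, the last byte becomes the most significant one,
# so the integer comparison can come out the other way round.
little_agrees = (as_word(x, "little") < as_word(y, "little")) == lexicographic

print(lexicographic, big_agrees, little_agrees)
```

A little-endian machine can still compare 8 bytes at a time, but it needs a byte swap first (or only an equality test).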
I'm not aware of any such performance advantages for the little-endian byte order.

The main advantage of little-endian byte order is ease of modifying code written in assembly or raw opcodes if you later decide to change your design and go with larger or smaller data fields. The main uses for assembly programming are very low-level kernel programming (generally the most stable part of the kernel code base) and performance enhancement of small snippets of code that have been well tested and profiled and are unlikely to change a lot.

I agree that any decent programmer should be able to deal with either endianness, but the advantages of the little-endian byte order seem to be becoming less and less relevant.

--
Copyright Violation:"theft, piracy"::Anti-Trust Violation:"thermonuclear price terrorism"<-Overly dramatic language.

Guide to the PowerPC architecture by Anonymous Coward · 2004-03-31 07:56 · Score: 5, Informative

They also have a very good article about the PowerPC's three instruction levels and how to use implementation-specific deviations while keeping code compatible. This introduction to the PowerPC application-level programming model will give you an overview of the instruction set, important registers, and other details necessary for developing reliable, high-performing PowerPC applications and maintaining code compatibility among processors.

Nice 42 year backward compatibility by JohnGrahamCumming · 2004-03-31 07:58 · Score: 4, Insightful

From TFA:

"Today's IBM mainframes still maintain backwards-compatibility with that revolutionary 1962 instruction set."

Good plan then, Intel, on that whole Itanium mess.

John.

Interesting quote from the article by alispguru · 2004-03-31 07:58 · Score: 4, Funny

Buried in the middle of a section talking about CMOS, we find this:

Thus, in the days when computing was still so primitive that people thought that digital watches were a neat idea, it was CMOS chips that powered them.

You find Douglas Adams fans all over, don't you?

--
To a Lisp hacker, XML is S-expressions in drag.

Obligatory Quote of the Day by crumbz · 2004-03-31 07:58 · Score: 5, Funny

"Finally, the Fishkill operation is so hip that the server room runs exclusively on Linux."

I didn't think it was possible to use the words "Fishkill" and "hip" in the same sentence with a straight face.

Yeah, I remember by Anonymous Coward · 2004-03-31 08:01 · Score: 4, Interesting

back in '94 or so, when the AIM alliance was predicting that they were going to completely obliterate the x86 in a few years. Anyone still have those neat graphs that showed exactly where Intel would hopelessly fall behind while PPC would accelerate exponentially into the atmosphere?

Re:Yeah, I remember by Billly Gates · 2004-03-31 08:44 · Score: 4, Informative

Yes

What Intel did was put a RISC architecture underneath the x86 instruction set to create the Pentium Pro, Pentium II, III, etc. Otherwise they would have been killed.

In fact IBM was correct. CISC was dying. The Pentium 1 could not compete against the PowerPC unless it had a very high clock speed. All chips today are either pure RISC or a hybrid CISC/RISC like today's Athlons/Pentiums. The exception is the nasty Itanium, which is not doing too well.

--
http://saveie6.com/

Nice PowerPC Roadmap by bcolflesh · 2004-03-31 08:01 · Score: 4, Informative

Motorola has a nice overview graphic - you can also check out a more generalized article at The Star Online.

One of the coolest things about PowerPC chips by Anonymous Coward · 2004-03-31 08:05 · Score: 5, Funny

Is its revolutionary three level cache architecture, utilising a 3-way 7 set-transitive cache structure, which gives performance equivalent to a 2-level traditional x86 style cache for more content addressable memory. Each processor has a direct triple-beat burstless fly-by cache gate interface capable of fourteen sequential memory write cycles, including read/write-back and speculative write-thru on both the instruction and data caches. Instruction post-fetch, get-post, roll-forward and cipher3 registers further enhance instruction cache design, and integrated bus snooping guarantees cache coherency on all PowerPC devices with software intervention. Special cache control and instructions were necessary to control this revolutionary design, such as 'sync', which flushes the cache, and the ever-popular 'eieio' memory fence-case instruction, named after the line in the popular nursery rhyme.

Computer history IS IBM-centric by Random BedHead Ed · 2004-03-31 08:07 · Score: 4, Insightful

I don't see how computer history that goes back to the 1960s can fail to be "IBM-centric." Remember, these were the big guys Microsoft was afraid of pissing off in the 1970s and 1980s. No one ever got fired for buying IBM, because they pretty much wrote the book on chip design before Intel hit it big.

Re:"Chips May Physically Reconfigure Themselves" by Snocone · 2004-03-31 08:10 · Score: 4, Informative

P.S. Does anyone know why Windows has never been adapted to run under PPC?

Errm, actually, it WAS. See for instance

http://home1.gte.net/res008nh/nt/ppc/default.htm

Re:For those who want PPC970 without getting a Mac by Lord Kano · 2004-03-31 08:14 · Score: 4, Funny

Geez, I can't believe I'm saying this, but it would be cheaper to just buy a Mac.

LK

--
"Hi. This is my friend, Jack Shit, and you don't know him." - Lord Kano

200 instructions at once? by Phs2501 · 2004-03-31 08:14 · Score: 4, Insightful

I think it's quite imprecise writing for the article to state (several times, for POWER4 and the PowerPC 970) that they "can process 200 instructions at once at speeds of up to 2 GHz." That makes it sound like they can finish 200 instructions at once, which is silly. I imagine what they really mean is that there can be up to 200 instructions in flight in the pipeline at a time.

(Which is great until you mispredict a branch, of course. :-)

Re:200 instructions at once? by Abcd1234 · 2004-03-31 09:05 · Score: 4, Informative

Yeah. It's a good thing that the processors in the POWER line have unbelievable branch prediction logic. For example, the branch prediction rate for the POWER4 is in the mid to high 90s percent for most workloads (as high as 98%, IIRC). In fact, quite a large number of transistors are dedicated to this very topic, which allows the processor to do a pretty good job of achieving something close to its theoretical IPC.

Although, it should be noted that the pipeline depth for the POWER4 is just 15 stages (as opposed to the P4 which has, IIRC, 28 stages), so while a branch misprediction is quite bad, it's not as bad as on some architectures. My understanding is that, in order to reach that 200-instruction number, the POWER4 is just a very wide superscalar architecture, so it simply reorders and executes a lot of instructions at once. Plus, that number may in fact be 200 micro-ops in flight, as opposed to real "instructions" (although that's just speculation on my part... it's been quite a while since I read up on the POWER4), as the POWER4 has what they term a "cracking" stage, similar to most Intel processors, where the opcodes are broken down into smaller micro-ops for execution.
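To put rough numbers on the pipeline-depth point, here is a back-of-the-envelope Python sketch. The 98% accuracy and the 15 and 28 stage counts are the figures quoted above; the 20% branch frequency, the base CPI of 1, and the idea that a mispredict costs one full pipeline depth are simplifying assumptions, and `effective_cpi` is a made-up helper.

```python
def effective_cpi(base_cpi, branch_fraction, accuracy, penalty_cycles):
    """Average cycles per instruction once misprediction flushes are counted."""
    mispredicts_per_instruction = branch_fraction * (1.0 - accuracy)
    return base_cpi + mispredicts_per_instruction * penalty_cycles

BRANCH_FRACTION = 0.20  # assumed: one instruction in five is a branch
ACCURACY = 0.98         # prediction rate quoted above

# Assumed: a mispredict costs roughly one full pipeline depth.
shallow = effective_cpi(1.0, BRANCH_FRACTION, ACCURACY, 15)
deep = effective_cpi(1.0, BRANCH_FRACTION, ACCURACY, 28)

print(f"15-stage pipeline: {shallow:.3f} cycles/instruction")
print(f"28-stage pipeline: {deep:.3f} cycles/instruction")
```

Under these assumptions the deeper pipeline loses roughly twice as many cycles to mispredicts, which is the gist of the comparison; real cores are far messier than this model.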

Well sort of by Erect Horsecock · 2004-03-31 08:16 · Score: 4, Informative

It's actually closer to Intel's Vanderpool technology that allows you to partition the CPU through firmware.

Example: Windows is running on slice 1, BSD on slice 2, and Linux on slice 3.

If BSD gets a kernel panic and crashes, its slice is restarted without affecting the other running OSes. It's, for lack of a better term, Hyper-Threading for the whole computer.

--
I hope you die painfully and alone.

Re:For those who want PPC970 without getting a Mac by Morologous · 2004-03-31 08:17 · Score: 5, Informative

Or, you could always settle for an RS/6000.

Or a Power-based IBM workstation.

Re:So what HDL do they use? by sam_van · 2004-03-31 08:20 · Score: 4, Informative

When I was working on the embedded IBM PowerPCs (400 series), we used Verilog primarily...though there were a few VHDL hold-outs.

--
Thinking of starting a business in Minnesota? Me too! mnsmall.biz

Sounds fishy to me... by geoswan · 2004-03-31 08:33 · Score: 4, Interesting

...Even x86 chip manufacturers, which continued for quite a time to produce CISC chips, have based their 5th- and 6th-generation chips on RISC architectures and translate x86 opcodes into RISC operations to make them backwards-compatible...

Maybe this is a sign that it has been too long since I learned about computer architecture, but is it really fair to call a CPU that has a deep pipeline a crypto-RISC CPU?

When my buddy first told me about this exciting new RISC idea one of the design goals was each instruction was to take a single instruction cycle to execute. Isn't this completely contrary to a deep pipeline? The Pentium 4 has a 20-stage pipeline IIRC.

Was I wrong to laugh when I heard hardware manufacturers claim, "sure, we make a CISC, but it has RISC-like elements"?

What I am reminded of is the change in how musicians are classified. When I grew up rock music was just about all that young people listened to. Rap and punk music had never been heard of. And country music was considered incredibly uncool. Now country music's coolness factor has grown considerably. And a strange thing has happened. Lots of artists who were unquestionably considered in the Rock camp back then, like Neil Young, or Creedence Clearwater, are now classified as Country music, as if they had never been anything else.

It has been a long time, but I remember learning in my computer architecture course about wide microcode instruction words, and narrow microcode instruction words. Wide microcode instruction words allowed the CPU to do more operations in parallel. I.e., the opposite of a RISC. So, I ask in perfect ignorance -- how wide are the Pentium 4 and Athlon microcode?

If I am not mistaken the Transmeta was a very wide instruction word. And if I am not mistaken, doesn't that make it the opposite of a RISC?

Re:Sounds fishy to me... by Zathrus · 2004-03-31 09:28 · Score: 4, Informative

When my buddy first told me about this exciting new RISC idea one of the design goals was each instruction was to take a single instruction cycle to execute. Isn't this completely contrary to a deep pipeline?

No, in fact pipelining is central to the entire concept of RISC.

In traditional CISC there was no pipelining and operations could take anywhere from 2-n cycles to complete -- at the very least you would have to fetch the instruction (1 cycle) and decode the instruction (1 cycle; no, you can't decode it at the same time you fetch it -- you must wait 1 cycle for the address lines to settle, otherwise you cannot be sure of what you're actually reading). If it's a NOOP, there's no operation, but otherwise it takes 1+ cycles to actually execute -- not all operators ran in the same amount of time. If it needs data then you'd need to decode the address (1 cycle) and fetch (1 cycle -- if you're lucky). Given that some operators took multiple operands you can rinse and repeat the decode/fetch several times. Oh, and don't forget about the decode/store for the result. So, add all that up and you could expect an average instruction to run in no less than 7-9 cycles (fetch, decode, fetch, decode, execute, decode, store). And that's all presuming that you have a memory architecture that can actually produce instructions or data in a single clock cycle.

In RISC you pipeline all of that stuff and reduce the complexity of the instructions so that (optimally) you are executing 1 instruction/cycle as long as the pipelines are full. You have separate modules doing the decodes, fetches, stores, etc. (and in deep-pipeline architectures, like the P4, these steps are broken up even more). This lets you pump the hell out of the clockrate since there's less for each stage of the pipeline to actually do.
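The cycle counting above is easy to sketch. The following Python snippet assumes an idealized pipeline with no stalls, no branches, and every stage taking exactly one cycle; the 7-stage figure just mirrors the seven steps listed earlier, and both function names are hypothetical.

```python
def cycles_unpipelined(n_instructions, n_stages):
    """Each instruction runs all of its stages before the next one starts."""
    return n_instructions * n_stages

def cycles_pipelined(n_instructions, n_stages):
    """Ideal pipeline: fill it once, then retire one instruction per cycle."""
    if n_instructions == 0:
        return 0
    return n_stages + (n_instructions - 1)

N = 1000    # instructions in a straight-line run (assumed)
STAGES = 7  # fetch, decode, fetch, decode, execute, decode, store

slow = cycles_unpipelined(N, STAGES)
fast = cycles_pipelined(N, STAGES)

print(slow, fast, round(slow / fast, 2))
```

As the run gets longer, the speedup approaches the number of stages, which is the "1 instruction/cycle as long as the pipelines are full" case.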

Modern CPUs have multiple everything -- multiple decoders, fetchers, execution units, etc. so it's actually possible to execute >1 instruction/cycle. Of course, the danger to the pipelining is that if you branch (like when a loop runs out or an if-then-else case) then all those instructions you've been decoding go out the window and you have to start all over from wherever the program is now executing (this is called a pipeline stall and is very costly; once you consider the memory delays it can cost hundreds of cycles). Branch prediction is used to try and mitigate this risk -- generally by guessing which way the branch will go and speculatively executing that path, throwing the work away only if the guess turns out to be wrong.

Was I wrong to laugh when I heard hardware manufacturers claim, "sure, we make a CISC, but it has RISC-like elements"?

Yes, because neither one exists anymore. CISC absorbed useful bits from RISC (like cache and pipelining) and RISC realized there was more to life than ADD/MUL/SHIFT/ROTATE (oversimplification of course). The PowerPC is allegedly a RISC chip, but go check on how many operators it actually has. And note that not all of them execute in one cycle. x86 is allegedly CISC, but, well... read on.

how wide are the Pentium 4 and Athlon microcode?

The x86 ISA has varying width. It's one of the many black marks against it. Of course, in reality, the word "microcode" isn't really applicable to most CPUs nowadays -- at least not for commonly used instructions. And to further muddy the picture both AMD and Intel don't actually execute x86 ISA. Instead there's a translation layer that converts x86 into a much more RISC-y internal ISA that's conducive to running at more than a few megahertz. AFAIK, the internal language is highly guarded by both companies.

If I am not mistaken the Transmeta was a very wide instruction word. And if I am not mistaken, doesn't that make it the opposite of a RISC?

Transmeta and Intel's Itanium use VLIW (very long instruction word) computing, which is supposed to make the hardware capable of executing multiple dependent or independent operations in one cycle. It does so by putting the onus on the compiler

I like this quote by Zo0ok · 2004-03-31 08:35 · Score: 4, Insightful

The 64-bit PowerPC 970, a single-core version of the POWER4, can process 200 instructions at once at speeds of up to 2 GHz and beyond -- all while consuming just tens of watts of power. Its low power consumption makes it a favorite with notebooks and other portable applications on the one hand, and with large server and storage farms on the other.

Can anyone tell me where I can buy a G5 laptop?

Re:Motorola by Gizzmonic · 2004-03-31 08:35 · Score: 4, Interesting

I've seen this myth repeated again and again, usually in conjunction with conspiracy theories like "Motorola quit developing the G4 to hurt Apple".

1) 80% of all G4s sold have gone to Apple. So targeting the larger embedded market is a marketing excuse, a failure, or both.

2) Motorola's fabrication facilities have been in horrendous shape for at least 4 years, with high failure rates. In one location, they even quit running the fans to "save energy."

3) Motorola has failed to advance in the embedded world as well. TiVo and many others are switching from PPC to MIPS because Motorola's stuff is not moving forward.

4) Brain-drain and 'Dilbert syndrome' have plagued Motorola's CPU division since Apple killed the clones in 1997. They are spinning off that part of their business, but there's no indication that the situation has improved.

--
(-1, Raw and Uncut is the only way to read)

Don't ignore integer sizes! by Dog135 · 2004-03-31 08:59 · Score: 5, Insightful

Expanding the data to 64 bits has no effect on existing code, whereas the big-endian case will have to change all the pointer values

So, you're reading in an array of integers, which are now 64 bit vs 32 bit and no code change is needed?

Programs NEED to know the size of the data they're working with. Simply pulling data from an address without caring for its size is a recipe for disaster!
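Here is a quick Python sketch of what goes wrong, using the standard `struct` module. The array contents are made up; the point is only that the same 16 bytes decode differently depending on the element size the reader assumes.

```python
import struct

values = [1, 2, 3, 4]              # sample data (hypothetical)
raw = struct.pack("<4I", *values)  # 16 bytes: four little-endian 32-bit ints

# A reader that knows the element size gets the original numbers back.
as_32 = struct.unpack("<4I", raw)

# A reader that assumes 64-bit elements silently fuses neighbours together.
as_64 = struct.unpack("<2Q", raw)

print(as_32)
print(as_64)
```

Little-endian keeps the low half of each value at the same address, but it does nothing to tell the code how many bytes belong to one element.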

--
"That's so plausible, I can't believe it!" - Leela
