What Makes Apple's Power Mac G5 Processor So Hot
An anonymous reader writes "58 million transistors can drive a lot of power. Apparently, Apple appreciated the choices IBM processor architects made when designing the 970 family. This article provides the 64-bit architecture big picture for the 970 family (A.K.A. the Power Mac G5) and the critical issues in IBM's 64-bit POWER designs, covering 32-bit compatibility, power management, and processor bus design."
No kidding. Recently we moved two of our computers from our home office into a real office. The average temperature of the upstairs floor dropped 3 degrees F. It was amazing how much heat the CPUs and hard drives in those machines were generating.
Most importantly, it can address up to 2^64 bytes of memory. And yes, that generally implies 64-bit integer GPRs. BTW, vector operations on x86 (MMX) also operate with 64-bit registers, but x86 can only address 4G (64G if you use the extended bits "hack").
The Raven
A linked list is just a container, so yes, if you're using a linked list, the pointers will eat a bunch of space if you're storing small bits of data.
But then again, you don't have to use linked lists. The C++ standard template library has all sorts of wonderful containers that may be better suited than simple linked-lists. Java has some neat containers as well.
Moneyed corporations, non-working 'poor' and criminal prisoners are turning productive citizens into tax-slaves.
It wouldn't be safe to assume a pointer can be stored in 32 bits in 64-bit mode. The OS can map your data anywhere in the 64-bit space [which maps physically to a 40-bit bus, at least on my AMD64]. It's also not portable to do that. Even if your OS maps stuff within 4GB [e.g. top 32 bits are zero] another 64-bit box [sparc, ppc, etc.] may not [and likely would not] do that.
If you build in 32-bit mode [e.g. -m32] you lose the major benefit of 64-bit mode, namely the extra registers.
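To make the truncation risk concrete, here is a small Python sketch. The address is made up for illustration, and `truncate_to_32_bits` and `fits_in_32_bits` are hypothetical helpers, not anything from a real OS or compiler.

```python
# Hypothetical address above the 4 GB line. Real addresses depend on
# the OS and are not guaranteed to look like this one.
ADDR = 0x00007F3A1C2B4000


def truncate_to_32_bits(addr: int) -> int:
    """Keep only the low 32 bits, as storing a pointer in a 32-bit int would."""
    return addr & 0xFFFFFFFF


def fits_in_32_bits(addr: int) -> bool:
    """True only if the top 32 bits are zero, so nothing is lost."""
    return truncate_to_32_bits(addr) == addr


if __name__ == "__main__":
    print(hex(truncate_to_32_bits(ADDR)))   # low half only
    print(fits_in_32_bits(ADDR))            # high bits were dropped
    print(fits_in_32_bits(0x08048000))      # a low address survives
```

The low address happens to survive, which is exactly why code that makes this assumption can appear to work on one box and break on another.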
Tom
Someday, I'll have a real sig.
Apart from the 2.6GHz FX55, but that is a top of the line 130nm part, the best of a process generation. And some people are overclocking these on air to 2.8, 2.9GHz. I expect that the FX55 uses around 100W though, because it isn't a 90nm part and it is the fastest
AMD's 90nm 3500+ uses under 67W of power however at 2.2GHz.
More importantly, it doesn't have to break the 64 bit operations into many successive 32 bit operations. 64 bit operations are not simply 2x32 bit operations, but can be several dozen operations.
An 8-bit microcontroller can perform 64 bit floating point operations correctly. It just takes a long time.
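A rough sketch of what "it just takes a long time" means, using integer addition rather than floating point (which would need far more steps). The function name and the step counter are my own illustration, not taken from any particular microcontroller toolchain.

```python
from typing import Tuple


def add64_with_8bit_alu(a: int, b: int) -> Tuple[int, int]:
    """Add two unsigned 64-bit values one byte at a time.

    Returns (sum modulo 2**64, number of 8-bit add steps used).
    """
    result = 0
    carry = 0
    steps = 0
    for i in range(8):                       # 8 bytes = 64 bits
        byte_a = (a >> (8 * i)) & 0xFF
        byte_b = (b >> (8 * i)) & 0xFF
        total = byte_a + byte_b + carry      # at most 0x1FF
        result |= (total & 0xFF) << (8 * i)  # keep the low 8 bits
        carry = total >> 8                   # carry into the next byte
        steps += 1
    return result, steps


if __name__ == "__main__":
    value, steps = add64_with_8bit_alu(0xFFFFFFFFFFFFFFFF, 1)
    print(hex(value), steps)
```

The answer is correct, but it costs eight dependent add-with-carry steps where a 64-bit ALU needs one.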
If you had read the sidebar you'd see the article defined the issues with base-2 and base-10 number names, and introduced the prefixes "mebi-" and "gibi-", which should be familiar to /.ers.
When they say 18 exabytes, they're talking base-10, otherwise they would have used the binary equivalent (exbibytes).
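For anyone who wants to check the arithmetic, a quick Python sketch. The constant names are mine; the prefix values are the standard decimal "exa-" (10^18) and binary "exbi-" (2^60).

```python
ADDRESSABLE_BYTES = 2 ** 64

EXABYTE = 10 ** 18    # decimal (SI) prefix "exa-"
EXBIBYTE = 2 ** 60    # binary (IEC) prefix "exbi-"

decimal_eb = ADDRESSABLE_BYTES / EXABYTE
binary_eib = ADDRESSABLE_BYTES // EXBIBYTE

if __name__ == "__main__":
    print(f"{decimal_eb:.1f} EB (decimal), {binary_eib} EiB (binary)")
```

So the "18" figure only makes sense in base-10; in binary units the same address space is 16 EiB.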
- sig? who is this sig of which you speak?
Sun machines with UltraSPARC processors do this too. They run 64 bit kernels, and applications are 32 bits. Unless you actually need 64 bits, in which case you feed the compiler some different options and it makes a 64 bit executable for you.
Both Solaris and Linux do it the same way. When you build a kernel for Linux on an UltraSPARC machine the part about kernel support for different kinds of executables offers you (among other options) 32 bit ELF (which you need), 64 bit ELF (optional), and Solaris emulation (never tried it...).
...laura
It's hard to make a comparison because for some reason IBM/Apple don't want to release official measurements for power usage, which is strange because they should do really well in that measurement compared to AMD and Intel. Here are the official AMD and Intel numbers:
2.4 GHz A64- 89 W
3.4 GHz P4(Northwood)- 89 W
3.4 GHz P4(Prescott)- 103 W
Best guess on the 2.5 GHz G5 is around 65 W.
IA64 (Itanium) has 128-bit instruction bundles, *but* each holds 3 instructions (41 bits apiece plus a 5-bit template).
Does it hurt to hear them lying? Was this the only world you had?
This page makes a fairly convincing argument that 256 bit CPUs should be enough (basically, there would be no way to exhaust the amount of memory a 256 bit CPU could access, because the number of memory locations is about the same as the number of atoms in the universe).
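A back-of-the-envelope check in Python. The atom count is an assumption on my part: roughly 10^80 is a commonly quoted figure for the observable universe, and estimates vary by a couple of orders of magnitude either way.

```python
ADDRESSES_256 = 2 ** 256

# Assumption: about 10**80 atoms in the observable universe.
ATOMS_ESTIMATE = 10 ** 80


def order_of_magnitude(n: int) -> int:
    """floor(log10(n)) for a positive integer, via its decimal digit count."""
    return len(str(n)) - 1


if __name__ == "__main__":
    print("2^256 is about 10^%d" % order_of_magnitude(ADDRESSES_256))
    print("atoms are about 10^%d" % order_of_magnitude(ATOMS_ESTIMATE))
```

Under that assumption the two numbers land within a few orders of magnitude of each other, which is close enough for "about the same" at this scale.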
--Mark
"It is nice to know that the computer understands the problem. But I would like to understand it too." --Eugene Wigner
In the case of PowerPC (and SPARC, MIPS, anything else I can think of), the opcodes are still 32 bits. The number of registers stays the same, the number of instructions stays (more or less) the same. There is no reason to extend the size of opcodes.
I rarely criticize things I don't care about.
You live in the Kalahari desert?
... 14 hours a day :-)
He might live in Tucson or Phoenix, Arizona; the Sonora-Arizona desert is quite a hot place in summer. Around Hermosillo, Sonora, there's a permanent high-pressure system which pushes off any clouds coming our way, so there's uninterrupted sunlight some
All through July and August, the normal, everyday noon-time temperature is over 45 C, 47 C is not really all that surprising. Once I was out in the street at 50 C.
Being a desert, it cools down quickly at nighttime; but when there's no breeze to take away the heat, it sucks evilly to be at 10 PM and still at thirty-something C. So, being at 43 at night is awful. I know how you feel though.
The best part is the faces others make when they find out about summer in Hermosillo: "But, that's impossible!" Naa, just a bit warm.
-gus
Actually, since most modern CPUs are x86 variants, the floating point registers are usually 80 bits wide (and have been since the 1981 introduction of the 8087).
As far as "complex mathematical calculations" go, 64-bit integers aren't really that big a deal. It's pretty rare to need integers bigger than 2^32 but no bigger than 2^64; floating point usually handles big numbers more flexibly.
The big deal with 64-bit CPUs is 64-bit address pointers and operations on them (which usually aren't more complex than adding and shifting).
..and when you have 12 bytes of data and a 4..8 byte pointer?
(Of course efficient linked lists don't malloc every node, nullifying both your point and my counterpoint here)
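Putting numbers on the 12-byte case, as a sketch only. It assumes a singly linked node with one "next" pointer, alignment equal to the pointer size, and no per-allocation bookkeeping; `node_size` and `overhead_fraction` are hypothetical helpers.

```python
def node_size(payload: int, pointer: int, align: int) -> int:
    """Payload plus one 'next' pointer, rounded up to the alignment."""
    raw = payload + pointer
    return (raw + align - 1) // align * align


def overhead_fraction(payload: int, pointer: int, align: int) -> float:
    """Share of each node that is not payload (pointer plus padding)."""
    size = node_size(payload, pointer, align)
    return (size - payload) / size


if __name__ == "__main__":
    for bits, ptr in ((32, 4), (64, 8)):
        size = node_size(12, ptr, ptr)
        frac = overhead_fraction(12, ptr, ptr)
        print(f"{bits}-bit: {size} bytes per node, {frac:.0%} overhead")
```

With those assumptions the node grows from 16 to 24 bytes, because the 8-byte pointer also drags in 4 bytes of padding.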
Unless "complex mathematical calculations" means exact bignum arithmetic where 64x64->128 multiply is likely to provide 3-4 times speedup compared to 32x32->64..
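A sketch of where a figure like 3-4x could come from: building one 64x64->128 product out of 32x32->64 multiplies takes four of them plus the adds to combine the pieces. Python integers are arbitrary precision, so this only models the limb arithmetic; the function name is made up for the example.

```python
from typing import Tuple

MASK32 = 0xFFFFFFFF


def mul64_via_32bit_limbs(a: int, b: int) -> Tuple[int, int]:
    """Multiply two unsigned 64-bit values using only 32x32->64 products.

    Returns (128-bit product, number of 32x32 multiplies used).
    """
    a_lo, a_hi = a & MASK32, a >> 32
    b_lo, b_hi = b & MASK32, b >> 32

    # Four partial products, each of which fits in 64 bits.
    lo_lo = a_lo * b_lo
    lo_hi = a_lo * b_hi
    hi_lo = a_hi * b_lo
    hi_hi = a_hi * b_hi

    # Shift each partial product into place and sum.
    product = lo_lo + ((lo_hi + hi_lo) << 32) + (hi_hi << 64)
    return product, 4


if __name__ == "__main__":
    x = 0xFFFFFFFFFFFFFFFF
    product, mults = mul64_via_32bit_limbs(x, x)
    print(hex(product), mults)
```

A CPU with a native 64x64->128 multiply does the same job in one instruction, and bignum code is mostly this inner step repeated.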
Introduction to 64-bit computing
/wishes he had exa-bytes of memory right now... VS.NET on WinXP is a PIG!
There's an informative link at the bottom of the article for those requiring a bit more insight into the effect of 64-bit computing.
-- All views expressed in this post are mine and do not
-- reflect those of my employer or their clients
Sorry, but you are completely wrong here.
A lot of 32 bit processors already have a 64 bit data bus right now. This is for example the case for the Intel 32 bit processors. So loading an entity bigger than 32 bit is not an issue. The bus width between the processor core and the L1 cache is even wider allowing even bigger chunks to be loaded in one cycle.
As for the floating point unit, it is also designed to do the operations in double precision (64bit) or even more. Once again, full 64 bit processors have no advantage here.
So for floating point numbers, having a full 64 bit processor compared to current 32 bit processors does not give you any speed advantage. The only advantage that remains is the bigger address range.
Marcel
is there another reason to use liquid cooling, other than excessive temperature?
Not sure if this is the reason, but it's not just a matter of the amount of heat generated. It's the amount of heat generated in a given area. So if you generate the same, or even less, heat in a smaller area, you may need to resort to something more efficient than air cooling to do the job.
A lot of 32 bit processors already have a 64 bit data bus right now. This is for example the case for the Intel 32 bit processors. So loading an entity bigger than 32 bit is not an issue.
This is not relevant to the instruction stream - you still need two load instructions. Actual bus widths are not visible to executing code - it's simply there to improve bandwidth.
"We returned the General to El Salvador, or maybe Guatemala, it's difficult to tell from 10,000 feet"
Apple acknowledged this when they released the G5 - Panther is a 32bit OS - meaning the G5 runs in 32bit mode under Panther. Tiger, on the other hand, will be a full 64bit OS and is slated to be released in March 2005 (and Apple seems to be pretty good at making OS release dates).
I have mod points, but I figured I'd answer your question instead.
When you do a die shrink, you can lower the power required at a particular clock rate, or you can run at a higher clock speed with the same power dissipated. So when IBM went from 130nm for the 970 to 90nm for the 970FX, the top clock speed went up from 2GHz to 2.5GHz. Other than the process change, I believe there were very few changes to the chip.
Now, when you go from 130nm to 90nm, the linear dimension across the chip is ~70% of what it was, and the area of the chip is (70%)^2 or about 50% of the previous chip.
Let's use some numbers. These may not be 100% accurate, but they'll explain the basic concept. The 2GHz 970 had a die size of about 121mm^2 and put out a maximum of 42W. That is about 350mW/mm^2. If we assume that the 2.5GHz 970FX has that same power consumption, but has a die size of 60mm^2, then the 970FX will produce 700mW/mm^2. So you have the same amount of power, but you are trying to suck it out of a smaller piece of silicon. So you need much more efficient cooling to keep the chip temperature the same. Hence, the liquid cooling system in the dual 2.5GHz G5.
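The same arithmetic as a short Python sketch, using only the approximate figures quoted above (42W, 121mm^2, 60mm^2). The helper names are made up for the example.

```python
def area_scale(old_nm: float, new_nm: float) -> float:
    """Ideal area ratio after a shrink: the square of the linear ratio."""
    return (new_nm / old_nm) ** 2


def power_density_mw_per_mm2(watts: float, die_mm2: float) -> float:
    """Average power density across the die, in mW per mm^2."""
    return watts * 1000 / die_mm2


if __name__ == "__main__":
    print(f"area ratio 130nm -> 90nm: {area_scale(130, 90):.2f}")
    print(f"970   (42W, 121mm^2): {power_density_mw_per_mm2(42, 121):.0f} mW/mm^2")
    print(f"970FX (42W,  60mm^2): {power_density_mw_per_mm2(42, 60):.0f} mW/mm^2")
```

Same watts, about half the area, so roughly double the power density the cooler has to deal with.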
--
The internet is the greatest source of biased information in the history of mankind.
Dude, that's insane. You'd have to build a ram stick with more mass than the entire universe to exhaust the memory addressing capability of a 256 bit chip.
All I can say is "Whoa".
You may forget that a bitblit, or a bulk memory copy operation can be accomplished in half of the time using the same number of 64 bit registers as 32 bit registers. How do you think common operations like scaling and color transformation will be affected by the increased register size and memory IO path? In my experience (Ultrasparc real world apps like GIMP and OpenSSL) most bulk integer compute operations complete in 10-20% less time when run in 64 bit mode vs 32 bit mode on the same computer (probably potentiated by L1-L2 cache performance differences in each mode), and they consistently consume about 1/2 the userland CPU cycles during that time. The biggest payout in 64 bit computing I have found is using OpenSSL with the 64 bit assembly code for encryption routines and having the 'bn' (big number) math library in 64 bit mode: I could scp database dumps across the network at full speed without dipping into enough cpu cycles to affect normal operation.
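The "half of the time" part for bulk copies is easiest to see by counting register-sized moves. This sketch only counts loads (a copy needs the same number of stores); it says nothing about memory bandwidth or caches, which is presumably why the measured gains quoted above are smaller than 2x. `word_moves` is a hypothetical helper.

```python
def word_moves(n_bytes: int, word_bytes: int) -> int:
    """Loads needed to read n_bytes in word_bytes-sized chunks (rounded up)."""
    return -(-n_bytes // word_bytes)   # ceiling division


if __name__ == "__main__":
    size = 1 << 20                     # copy 1 MiB
    print("32-bit registers:", word_moves(size, 4), "loads")
    print("64-bit registers:", word_moves(size, 8), "loads")
```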
--- Nothing clever here: move along now...
You're right that P4 has a lot of rename registers, but those aren't directly accessible. Thus, their usefulness is limited by the CPU's look-ahead. In theory, a compiler can always do better if it has access to the same number of registers because it can look arbitrarily far ahead.
Oh, but you were trolling. You didn't actually expect an answer, did you? Well, you got one.
Check out my sci-fi/humor trilogy at PatriotsBooks.