Ars Dissects POWER5, UltraSparc IV, and Efficeon
Burton Max writes "There's an interesting article here at Ars about the POWER5, UltraSparc IV, and Efficeon CPUs. It's a self-styled "overview of three specific upcoming processors: IBM's POWER5, Sun's UltraSparc IV, and Transmeta's Efficeon." I found the insights into Efficeon (successor to Crusoe) to be particularly good (although it paints a sad picture of Transmeta, methinks)."
Too bad they focused too much on POWER5 and Transmeta while spending little time on UltraSparc IV and V and ignoring Itanium. A little more balance and it would have been a great read.
I don't drink because I have to, I drink to stop the voices in my head!
Why the heck did Sun's offering get thrown in there? For variety? The Efficeons look awful nice to people who want less power-hunger from their computing devices. If all you do is word processing and such, why the heck even use an Intel/AMD chip? Less heat, less power, what is not to love? Now the IBM chips have really piqued my interest, I am a huge fan of IBM's chips, especially in Apple computers (I am a proud owner of a 12" Powerbook).
I hate sigs.
Will show up as _4_ processors to the OS! (2 cores both doing SMT.)
:o)
This means that in a (say) 512 processor box the OS will have to handle 2048 processors efficiently. That's placing a lot of control in the hands of the software designers, and a lot of money in the hands of the companies that license per processor.
On the other hand, UNIX is getting pretty efficient at scaling to large systems; perhaps it (and by extension Linux, thanks to SGI and IBM) will be able to handle it with no problems. One thread per processor on a desktop system might prove to be quite efficient.
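The arithmetic behind those numbers, as a quick sketch. The function name and its defaults are made up for illustration; the only figures taken from the comment are 2 cores per chip, 2 SMT threads per core, and a 512-chip box.

```python
# Hypothetical helper: names and defaults are illustrative, not from the article.
def logical_processors(chips: int, cores_per_chip: int = 2, threads_per_core: int = 2) -> int:
    """Number of processors the OS sees when every hardware thread is exposed."""
    return chips * cores_per_chip * threads_per_core

print(logical_processors(1))    # one dual-core, two-way SMT chip
print(logical_processors(512))  # the 512-chip box from the comment
```

Whether a per-processor license counts chips, cores, or hardware threads is a vendor decision; nothing in the sketch assumes one way or the other.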
Beep beep.
It's amusing seeing this. It reflects mostly that Microsoft has finally managed to ship in volume OSs that can do more than one thing at a time. (Bear in mind that most of Microsoft's installed base is still Windows 95/98/ME. Transitioning the customer base to NT/Win2K/XP has gone much more slowly than planned.)
But Microsoft takes the position that if you have multiple CPUs, you have to pay more to run their software. So these strange beasts with multiple decoders sharing ALU resources emerge.
Wasn't low power consumption the number 1 benefit that Transmeta was looking to provide, so that you could get twice the battery life (or something like that) without sacrificing too much performance? Did Transmeta shoot itself in the foot by letting people think that it was going to provide higher-performance chips than the competition?
The main selling point of Transmeta was always power consumption, so have they lost their edge in that area? If so, then that would be serious for them, but the article doesn't answer that question.
Seems like the POWER5 will be able to run only two threads per core, like the Pentium 4. For the P4 it is understandable that they want to reduce cost as much as possible, but why be so frugal on a high-end CPU like the POWER5?
I mean, the MTA supercomputer, which pioneered the entire SMT concept, was able to run 128 threads per CPU. OK, so they had different design constraints as well. Basically, the idea was that the CPUs didn't have any cache at all, thus making them simpler and cheaper. To avoid the performance hit usually associated with this, they simply switched to another thread when one thread became blocked waiting for memory access.
Anyway, is there any specific reason why IBM didn't put more than 2, say 8 or 16, threads per CPU on the POWER5?
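A toy model of the switch-on-stall idea described above, to show why a cacheless design wants so many threads. Everything here is an assumption for illustration: the 4 cycles of work per memory request, the 60-cycle memory latency, and the free thread switch are invented numbers, not MTA or POWER5 figures. It also models one thread issuing per cycle, which is simpler than SMT proper, where a core can issue from several threads in the same cycle.

```python
def utilization(threads: int, compute: int = 4, latency: int = 60,
                cycles: int = 100_000) -> float:
    """Fraction of cycles in which a cacheless core does useful work.

    Each thread runs `compute` cycles, then waits `latency` cycles on memory.
    The core issues from one ready thread per cycle and switches at no cost.
    All parameters are illustrative assumptions.
    """
    ready_at = [0] * threads          # cycle at which each thread may run again
    work_left = [compute] * threads   # cycles of work before the next stall
    busy = 0
    start = 0                         # where the round-robin search begins
    for cycle in range(cycles):
        for i in range(threads):
            t = (start + i) % threads
            if ready_at[t] <= cycle:
                busy += 1
                work_left[t] -= 1
                if work_left[t] == 0:
                    # Thread stalls on memory; move on to the next thread.
                    work_left[t] = compute
                    ready_at[t] = cycle + 1 + latency
                    start = (t + 1) % threads
                else:
                    start = t         # keep running this thread until it stalls
                break
    return busy / cycles

for n in (1, 2, 8, 16, 32):
    print(n, round(utilization(n), 3))
```

With these made-up numbers the core only stays fully busy once there are enough threads to cover the whole memory wait, roughly (compute + latency) / compute of them. A core with caches stalls far less often, so it needs far fewer threads to reach the same point, which is one plausible reading of why two threads per core can be enough.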
The author suggests that it's not worth "pissing off Intel" to go with Transmeta. Give me a break. Transmeta is the only thing pushing Intel to make Centrino and other lower-wattage chips. They recognize that anybody in the mobile computing/devices world will seriously consider anything that gives their customers increased battery life and less toasty pockets.
There exists no way of exchanging information without making judgments. --Bene Gesserit Axiom
Multiple times while reviewing the Efficeon architecture, the article's author suggests that the additional storage required for Transmeta's code-morphing approach will easily offset the power savings from making a simpler CPU. This betrays a deep misunderstanding of power consumption in digital systems, as readily evidenced by the fact that modern non-Transmeta processors dissipate multiple tens of watts (often nearly 100 W), while a full complement of memory (4 GB, in modern machines) dissipates a few watts at most.
Also in the article, the author suggests that processors spend most of their time waiting on loads, and then argues that since the code-morphing approach means more instruction fetches, the Efficeon processor will be spending disproportionately more time on loads. Then, after this assertion, he admits that he does not know *where* the translated Efficeon code is held. Might it be in one-cycle-accessible L1 cache? That point is conveniently sidestepped. He does not understand under what circumstances the profiling takes place, although he regurgitates the sales pitch nicely. He argues that transistors hold the translated code (trying to argue against the transistors-for-software tradeoff) but then does not realize that transistors in memory do not equate to transistors in logic (neither in power, as they are not cycled as frequently, nor in speed characteristics).
In all, I find the author's treatment of the Transmeta architecture sophomoric, and, after finding that section lacking, I left the rest of the article unread. Your mileage may vary.
Put my fist through my alarm clock with its ding-dong death inside my ear. - The Blackjacks.
Interesting article indeed, yet there is a thing I don't quite understand about ILP (instruction-level parallelism):
If the number of decoded instructions is higher, then - the CPU being superscalar - the probability of having all pipelines working grows, which means that ILP's also going up.
Of course the ILP depends on the compiler quality and the program code itself, but having a good parallelism capacity in the CPU is also a key factor.
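The intuition above can be put into a crude probability sketch. The big assumption, which is mine and not the article's, is that each decoded instruction is independently ready to issue with some fixed probability; real code has dependency chains, so treat this as an upper-bound style illustration. The function name and parameters are hypothetical.

```python
from math import comb

def p_all_pipes_busy(decoded: int, pipes: int, p_ready: float) -> float:
    """P(at least `pipes` of `decoded` instructions are ready to issue).

    Assumes every decoded instruction is ready independently with
    probability `p_ready` (a simplification: real dependencies are not
    independent).
    """
    return sum(
        comb(decoded, k) * p_ready**k * (1 - p_ready) ** (decoded - k)
        for k in range(pipes, decoded + 1)
    )

# Four pipelines, each instruction ready half the time (made-up value).
for window in (4, 8, 16):
    print(window, round(p_all_pipes_busy(window, 4, 0.5), 3))
```

Under that assumption, widening the pool of decoded instructions does raise the chance of filling every pipeline, but the ceiling is still set by how often instructions are actually independent, which is where the compiler and the program itself come in.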
In fact, you could tell the story of the past 15 years of computer evolution -- from the rise of the PC to the rise of the Internet -- in terms of the effects of the amount of time it takes various components -- from a processor all the way out to a networked computer -- to load data.
I like this assessment. Forget about Moore's Law as a measure of our progress; latency and throughput are far more important than processing power.
Computers used to be for processing information; these days, most people use them more for accessing and delivering information. Every new computer I've gotten before my current one has only satisfied me by being faster than the ones that went before, not by actually being fast enough. However, my current machine (dual-1.25GHz Power Mac G4) leaves me with no complaints about speed--while I certainly wouldn't complain if it were a little faster, I never feel like I'm waiting for the computer for an unreasonable amount of time; most of the time, it's waiting for me.
However, when it's not waiting for me, it's waiting for one of its hard drives to spin up and feed it with data, or for some slow server to send it something. I would trade one of my processors for a 2x improvement in either disk or network latency. While these aren't the types of latency directly addressed in the article, I would wager that on the rare occasions when I actually have to wait for some processing to take place, most of that time is spent loading data from memory, not actually processing it.
It's not that processors are fast enough for everybody and we should forget about making them any faster; I'm sure graphics and video professionals, among others, will always have a need for more raw speed. But for most computer users, the continued emphasis on speed is misplaced. If computer manufacturers could transfer just a little bit of their R&D spending from increasing speed to decreasing latency, we'd all be better off.
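The trade described above is essentially Amdahl's law applied to waiting versus computing. A rough sketch follows; the 80% waiting figure is purely a made-up number for illustration, not a measurement, and the function name is hypothetical.

```python
def speedup(wait_fraction: float, cpu_factor: float = 1.0,
            wait_factor: float = 1.0) -> float:
    """Overall speedup when compute time and waiting time shrink by separate factors.

    `wait_fraction` is the share of total time spent waiting on disk, network,
    or memory before any improvement.
    """
    compute_fraction = 1.0 - wait_fraction
    return 1.0 / (compute_fraction / cpu_factor + wait_fraction / wait_factor)

# Assume (illustratively) that 80% of the time is spent waiting on data.
print(round(speedup(0.8, cpu_factor=2.0), 2))   # twice the processing speed
print(round(speedup(0.8, wait_factor=2.0), 2))  # half the latency
```

The more of the time that goes to waiting, the less a faster processor buys and the more a latency improvement does; if the waiting share is small, the conclusion flips.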
I found the meaning of life the other day, but I had write-only access.