Ars Dissects POWER5, UltraSparc IV, and Efficeon
Burton Max writes "There's an interesting article here at Ars about the POWER5, UltraSparc IV, and Efficeon CPUs. It's a self-styled "overview of three specific upcoming processors: IBM's POWER5, Sun's UltraSparc IV, and Transmeta's Efficeon." I found the insights as to Efficeon (successor to Crusoe) to be particularly good (although it paints a sad picture of Transmeta, methinks)."
Too bad they focused too much on Power and Transmeta while spending little time on UltraSparc IV and V and ignoring Itanium. A little more balance and it would have been a great read.
I don't drink because I have to, I drink to stop the voices in my head!
I had a brain fart for a second while reading the article:
"This is why the advances that have the most striking impact on the nature and function of the computer are the ones that move data closer to the functional units. A list of such advances might look something like: DRAM, PCI, on-die caches, DDR signaling, and even the Internet"
For a second there, I thought that the list of advances started with DRM, not DRAM, and I almost had a heart attack.
Since day 1 they have skirted the benchmark issue, always trying to deflect the question.
Just like that article yesterday on their new chip. Did they ever cite a single benchmark? NO.
The basic performance of your CPU product, as measured by industry standard benchmarks, is essential knowledge.
I was under NDA on the previous gen Transmeta stuff. It was amusing how the other OEMs reacted - it was crap, but nobody could say anything in public.
Why the heck did Sun's offering get thrown in there? For variety? The Efficeons look awful nice to people who want less power-hunger from their computing devices. If all you do is word processing and such, why the heck even use an Intel/AMD chip? Less heat, less power, what is not to love? Now the IBM chips have really piqued my interest, I am a huge fan of IBM's chips, especially in Apple computers (I am a proud owner of a 12" Powerbook).
I hate sigs.
LOL, I about spilled my soda.
What's this?
The owls are not what they seem
Will show up as _4_ processors to the OS! (2 cores both doing SMT.)
:o)
This means that in a (say) 512 processor box the OS will have to handle 2048 processors efficiently. That's placing a lot of control in the hands of the software designers, and a lot of money in the hands of the companies that license per processor.
On the other hand, UNIX is getting pretty efficient at scaling to large systems; perhaps it (and by extension Linux, thanks to SGI and IBM) will be able to handle it with no problems. One thread per processor on a desktop system might prove to be quite efficient.
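To make the arithmetic above concrete, here is a minimal Python sketch. The function name and its parameters are made up for illustration; the only figures taken from the comment are 2 cores per chip, 2 SMT threads per core, and the 512-processor box.

```python
def logical_processors(chips, cores_per_chip=2, threads_per_core=2):
    """Processors the OS sees: one per hardware thread (assumed model)."""
    return chips * cores_per_chip * threads_per_core

print(logical_processors(1))    # one POWER5 chip -> 4
print(logical_processors(512))  # the 512-processor box -> 2048
```

Per-processor licensing would scale with whichever of these numbers the vendor chooses to count, which is the worry raised above.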
Beep beep.
The history of Wintel suggests that top-rated raw CPU performance is not the best predictor of adoption. Compatibility with market-dominating software platforms is a greater determinant of CPU sales. We might hope that advances in compiler design and flexible cores can help any CPU run x86 code, but there are always the little nits that prevent true compatibility and drive computer buyers toward the dominant platform.
Two wrongs don't make a right, but three lefts do.
It's amusing seeing this. It reflects mostly that Microsoft has finally managed to ship in volume OSs that can do more than one thing at a time. (Bear in mind that most of Microsoft's installed base is still Windows 95/98/ME. Transitioning the customer base to NT/Win2K/XP has gone much more slowly than planned.)
But Microsoft takes the position that if you have multiple CPUs, you have to pay more to run their software. So these strange beasts with multiple decoders sharing ALU resources emerge.
Wasn't low power consumption the number 1 benefit that Transmeta was looking to provide, so that you could get twice the battery life (or something like that) without sacrificing too much performance? Did Transmeta shoot itself in the foot by letting people think that it was going to provide higher performance chips than the competition?
The main selling point of Transmeta was always power consumption, so have they lost their edge in that area? If so, then that would be serious for them, but the article doesn't answer that question.
Seems like the POWER5 will be able to run only two threads per core, like the Pentium 4. For the P4 it is understandable that they want to reduce cost as much as possible, but why be so frugal on a high-end CPU like the POWER5?
I mean, the MTA supercomputer, which pioneered the entire SMT concept, was able to run 128 threads per CPU. OK, so they had different design constraints as well. Basically, the idea was that the CPUs didn't have any cache at all, thus making them simpler and cheaper. To avoid the performance hit usually associated with this, they simply switched to another thread when one thread became blocked waiting for memory access.
Anyway, is there any specific reason why IBM didn't put more than 2, say 8 or 16, threads per CPU on the POWER5?
The author suggests that it's not worth "pissing off Intel" to go with Transmeta. Give me a break. Transmeta is the only thing pushing Intel to make Centrino and other lower-wattage chips. They recognize that anybody in the mobile computing/devices world will seriously consider anything that gives their customers increased battery life and less toasty pockets.
There exists no way of exchanging information without making judgments. --Bene Gesserit Axiom
Multiple times while reviewing the Efficeon architecture the article's author suggests that the tradeoff of additional storage required for Transmeta's code-morphing approach will easily balance out the power savings from making a simpler CPU. This betrays a deep misunderstanding of power consumption in digital systems, as readily evidenced by the fact that modern non-Transmeta processors dissipate multiple tens of watts of power (often nearly 100W) and a full complement of memory (4G, in modern machines) dissipates a few watts at most.
Also in the article, the author suggests that processors spend most of their time waiting on loads, and then argues that since the code-morphing approach means more instruction fetches, the Efficeon processor will be spending disproportionately more time on loads. Then, after this assertion, he admits that he does not know *where* the translated Efficeon code is held. Might it be in one-cycle-accessible L1 cache? That point is conveniently sidestepped. He does not understand under what circumstances the profiling takes place, although he regurgitates the sales pitch nicely. He argues that transistors hold the translated code (trying to argue against the transistors-for-software tradeoff) but then does not realize that transistors in memory do not equate to transistors in logic (neither in power, as they are not cycled as frequently, nor in speed characteristics).
In all, I find the author's treatment of the Transmeta architecture sophomoric, and, after finding that section lacking, I left the rest of the article unread. Your mileage may vary.
Put my fist through my alarm clock with its ding-dong death inside my ear. - The Blackjacks.
Well, I tried to read the FA before making a comment, but it was futile, there was some enormous flashing strobe-y advert to it that was just painful to have on the screen.
So, I gave up. I have no clue what the advert was for, it had a sort of minimalist man icon in it, and lots of flashing colours - that's all I know. I do however know a lot more about advertising than the idiots who thought that one up.
Simon.
Physicists get Hadrons!
first, you don't just automatically get a linear increase with the width of the multiple-threading capabilities. it's not like it's free to increase the RF size and/or FUs, etc.
you're also confusing contexts with active threads. the Tera^WCray MTA had 128 contexts available -- so that thread switching is more light-weight, more or less -- but only one could be active at one time.
SMT in the various forms have more than one active thread, which introduces the problem(s) of competing for resources in the issue and retire stages, etc et al.
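The contexts-versus-active-threads distinction can be sketched with a toy model in Python. This is not the real MTA pipeline: the function name, the "every instruction is a load" workload, and the 99-cycle memory latency are all assumptions picked for illustration. The only thing it tries to show is that many contexts can keep a single-issue core busy even though only one thread issues per cycle.

```python
def utilization(contexts, mem_latency, cycles=10_000):
    """Fraction of cycles in which a single-issue core finds a ready thread.

    Toy assumption: every instruction is a load, so a thread that issues
    at cycle t is ready again at t + mem_latency + 1.
    """
    ready_at = [0] * contexts        # cycle at which each context may issue again
    busy = 0
    for cycle in range(cycles):
        for ctx in range(contexts):
            if ready_at[ctx] <= cycle:
                ready_at[ctx] = cycle + mem_latency + 1
                busy += 1
                break                # only one thread is active per cycle
    return busy / cycles

print(utilization(1, 99))     # a single context mostly waits -> 0.01
print(utilization(128, 99))   # 128 contexts cover the latency -> 1.0
```

An SMT core differs in that several threads issue in the same cycle, so they compete for issue and retire bandwidth, which this model deliberately leaves out.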
For one thing, there's a lot of interconnects inside that cement block, so it's not just like exposed pins for all the chips on the other side. For another... how often do chips die? If you can afford a machine with one of those chips, you can probably afford to replace a whole brick.
autopr0n is like, down and stuff.
Interesting article indeed, yet there is a thing I don't quite understand about ILP (Instruction Level Parallelism):
If the number of decoded instructions is higher, then - the CPU being superscalar - the probability of having all pipelines working grows, which means that ILP's also going up.
Of course the ILP depends on the compiler quality and the program code itself, but having a good parallelism capacity in the CPU is also a key factor.
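A rough way to see how much the program itself limits ILP is to divide the instruction count by the longest dependency chain. The Python sketch below does that under idealized assumptions (unit latency, unlimited functional units, perfect decode); the function name and the two tiny instruction streams are hypothetical.

```python
def ilp(deps):
    """Average ILP = instruction count / length of the longest dependency chain.

    deps maps each instruction name to the names of the instructions
    whose results it needs. Idealized: unit latency, unlimited units.
    """
    depth = {}

    def chain(instr):
        # Length of the longest dependency chain ending at instr.
        if instr not in depth:
            depth[instr] = 1 + max((chain(d) for d in deps[instr]), default=0)
        return depth[instr]

    critical_path = max(chain(i) for i in deps)
    return len(deps) / critical_path

independent = {"a": [], "b": [], "c": [], "d": []}
serial = {"a": [], "b": ["a"], "c": ["b"], "d": ["c"]}
print(ilp(independent))  # 4.0: a wide core can issue all four at once
print(ilp(serial))       # 1.0: extra decoders and pipelines sit idle
```

So a wider decoder raises the ceiling, but the dependency structure of the code decides how close a program gets to it.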
It read much like a financial review of a company. Take the buzzwords, guess wildly, base predictions on your guesses. Granted, the author was intelligent and understood the basics, but without a deeper understanding of the specifics he cannot really give reasons for performance or lack thereof.
Well.. maybe. Or Maybe not. But Definitely not sort of.
But, it's a chip PLATFORM that depends on certain Pentium-M chips. Naturally, systems built around Transmeta "chips" will also require Transmeta-compatible support devices (e.g. the "Transmeta PLATFORM").
My point is, this low-voltage thing was a non-issue before Transmeta came along. Intel just told everyone to "put bigger fans" in their laptops and shut up. I've got this Dell with seriously huge fans, and it gets HOT (but it's pretty durn fast, has a big screen and built in DVD/CD-RW). I don't need low-voltage because it mostly sits on my desk.
There exists no way of exchanging information without making judgments. --Bene Gesserit Axiom
..and why is he taking UltraSparcs apart anyway?
We suffer more in our imagination than in reality. - Seneca
One detail that they didn't mention was the integrated AGP and DDR memory controller on Efficeon. Blades don't use graphics, so I'm thinking that Efficeon was designed primarily for Japanese laptops.
Efficeon allows for a low chip count design. That could mean a smaller and more reliable laptop design.
I actually read the article!!!!!
All my questions were answered so I have nothing to say.
Ok, so you are worried that your parts are no longer accessible.
One of the first computers I built had individual TTL parts (74xx type things) to make the CPU. If I fried one of those, I would just replace that single part and be going again. No need to replace the whole CPU.
I, for one, would never go back to that. Not just the size but the performance and the cost.
It used to be that I would buy 4K-bit RAM chips. Buy 8 of those to make an 8x4K RAM array (4K bytes) and then add a simple address decoder and put it on an S100 bus and you have more RAM for your system. Now I buy 512Meg DIMM modules where you can't (and don't want to) replace the individual chips. (Ok, you could if you have the fancy tools and you could get the chips in question, but the cost factors just don't make that worthwhile.)
Systems are getting faster because of the higher integration levels. Taking the off-chip caches (like systems built with 386 and 486 CPUs) and putting that onto the CPU (first in the P-Pro as multi-chip modules and then later, onto the CPU itself) has significantly improved performance. Yes, it has removed the ability to replace the cache separately from the CPU, but then who really wants to or needs to do that? And at what cost? (Go back to 100MHz cache memory interfaces? I like the 3GHz clock in my on-chip cache, thank you very much.)
The same is true of multi-CPU systems. As you increase the performance, the communications performance becomes a major bottleneck. First IBM put two CPU cores on one chip. Then Intel did a thing called Hyper-Threading (after they said that dual cores have no value :-) The next step is to somehow connect multiple (4/8/16/whatever) CPUs along with large (multi-meg) caches together using these specialized interconnect technologies.
Imagine the performance gain of having 1GHz+ clocked cache of, say, 256Meg connected to 8 really fast CPU cores. It would be as much of a step forward as going from my TTL based 8-bit CPU to the 6502 single-chip CPU.
I know I would not want to go back... So let's investigate how to move forward.
I don't understand why Transmeta still comes up in conversation. Besides the fact that they hired Linus, what exactly have they done to merit this inclusion alongside IBM, Sun, and Intel? There are plenty of other CPU manufacturers that sell x86 clones now... I think Cyrix was bought by some Taiwanese fab plant company, weren't they?
Until Transmeta becomes a real contender, let's just keep out of the Linux biases and concentrate on the real contenders.
My prediction is that if they don't produce a real hit soon, they will be out of business in 2 years.
MS has shipped preemptive multitasking and multithreading for a long time. You are confusing that with multiprocessing (which is different).
Win95/98/ME are not multiprocessor but are preemptive multitasking and multithreading. They can certainly do "more than one thing at a time". Unlike Apple who first shipped this capability only recently, MS first shipped this in Windows 386 back in the late 80's.
There isn't any significant adoption of Transmeta processors because they suck. You find them primarily in the smallest Japanese machines. Even there the Pentium-M is pushing them out.
Claiming MIPS/watt supremacy for Transmeta is questionable as well.
we run a large FoxPro cluster
Is that not the saddest form of life you've ever heard of?
How do you know I'm not black?
Senior CPU Editor | Ars Technica | http://arstechnica.com/
Since the author of this article is lurking here, I thought I'd ask:
You make a rather big deal about Transmeta needing to run all x86 code through a "code morpher" (dynamic recompiler, actually), and come up with a decently large set of conclusions based on it.
What's the big deal? No processor executes raw x86 anymore. Everything translates into an internal microcode that bears little resemblance to the original asm. Of course, normal chips have hardware accelerated microcode translators, whereas Transmeta must recode in software -- but Transmeta's entire architecture was designed from day one to do that, and conceivably they have more context available to do recoding by involving main memory in the process.
And what is it with you neglecting the equivalence of main memory? Yes, transistors are necessary to store the translated program. They're also necessary to store the original one -- the Mozilla client I'm presently tapping away inside sure as hell doesn't fit in L1 on my C3! Outside of a small static penalty on load, and a smaller dynamic penalty from ongoing profiling, you can't blame performance on the fact that software needs to be in RAM. Software always needs to be in RAM.
Don't get me wrong -- Transmeta's a performance dog, and everyone's known that since day one. But I think it's reasonable to say the cause is mostly one of attention -- every man hour they threw into allowing the system to emulate x86 took away from adding pipelines, increasing clock rates, tweaking caches, etc. In other words, yes it's a feat that they got the code to work, but you don't need to blame the feat for the quality of work -- they simply did a lot of work nobody else had to waste time on, and fell behind because of it.
Much easier explanation. Might even be true.
Yours Truly,
Dan Kaminsky
DoxPara Research
http://www.doxpara.com
My understanding of 95/98/ME is that it wasn't truly preemptive either, at least not at all levels. Perhaps this is why this branch got progressively more stable? More code was either made to cooperate or shifted to runtime environments where it truly was preempted?
OS/2, NT, 2000 and XP are truly preemptive as are all versions of Unix that I've ever heard of.
You're right about Apple though, it took them way too long to get to this.
Unlike the other x86 knockoff manufacturers they have actually attempted something somewhat new and different in their designs. They may not have met with a roaring success marketwise but they certainly did try to attack things from a different angle. The point of the article seems to be comparing the somewhat different approaches the various CPU makers took in their designs, not how many millions of chips they have sold or billions of dollars they have in the bank.
In fact, you could tell the story of the past 15 years of computer evolution -- from the rise of the PC to the rise of the Internet -- in terms of the effects of the amount of time it takes various components -- from a processor all the way out to a networked computer -- to load data.
I like this assessment. Forget about Moore's Law as a measure of our progress; latency and throughput are far more important than processing power.
Computers used to be for processing information; these days, most people use them more for accessing and delivering information. Every new computer I've gotten before my current one has only satisfied me by being faster than the ones that went before, not by actually being fast enough. However, my current machine (dual-1.25GHz Power Mac G4) leaves me with no complaints about speed--while I certainly wouldn't complain if it were a little faster, I never feel like I'm waiting for the computer for an unreasonable amount of time; most of the time, it's waiting for me.
However, when it's not waiting for me, it's waiting for one of its hard drives to spin up and feed it with data, or for some slow server to send it something. I would trade one of my processors for a 2x improvement in either disk or network latency. While these aren't the types of latency directly addressed in the article, I would wager that on the rare occasions when I actually have to wait for some processing to take place, most of that time is spent loading data from memory, not actually processing it.
It's not that processors are fast enough for everybody and we should forget about making them any faster; I'm sure graphics and video professionals, among others, will always have a need for more raw speed. But for most computer users, the continued emphasis on speed is misplaced. If computer manufacturers could transfer just a little bit of their R&D spending from increasing speed to decreasing latency, we'd all be better off.
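Amdahl's law puts a number on that trade. The Python sketch below is only an illustration: the function and parameter names are invented here, and the 80% wait fraction is an assumed workload, not a measurement from the article or from my machine.

```python
def overall_speedup(wait_fraction, cpu_speedup=1.0, wait_speedup=1.0):
    """Amdahl's law with two parts: time spent waiting and time spent computing."""
    compute_fraction = 1.0 - wait_fraction
    new_time = wait_fraction / wait_speedup + compute_fraction / cpu_speedup
    return 1.0 / new_time

# Assumed workload: 80% of wall-clock time is waiting on memory, disk, or network.
print(round(overall_speedup(0.8, cpu_speedup=2.0), 2))   # CPU twice as fast -> 1.11
print(round(overall_speedup(0.8, wait_speedup=2.0), 2))  # latency halved -> 1.67
```

Under that assumption, halving latency buys far more than doubling processor speed, and the gap widens as the wait fraction grows.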
I found the meaning of life the other day, but I had write-only access.
Cooperative multitasking means nothing. PC's running DOS had that for disk IO. All versions of Windows prior to 95 had that as did alternative multitasking systems for PC's in the mid 80's.
PC's gained multithreading with OS/2 v 1.0. Windows first got it with NT 3.1 then later with Win95.
Using street slang makes you sound juvenile. Black doesn't mean uneducated but blacks that say "hater" sound just as stupid as other races.
Seriously. Every other day there's an Ars Technica this, an Ars Technica that. Let's make an icon for it.
As far as Windows 9x being "less preemptive" than NT, I was referring to the clearer distinction between user and kernel in NT. Obviously the kernel can't be preempted during certain times, but with 9x, my understanding was that user programs sometimes ran pretty much like parts of the kernel, coming from the DOS legacy, where there was no distinction between user and kernel.
I also don't know if I'd agree that a critical section means that a process can't be preempted. The definition I'd use has to do with guarding a resource. In fact, a recipe for a deadlock is when one process enters a critical section for a resource and is preempted before it releases it, probably as it tries to get yet another resource. When another process tries to get the two resources in the opposite order, you have a deadlock. Maybe you mean something different by critical section.
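The deadlock recipe described above can be reproduced in a few lines of Python. This is a sketch, not how any particular OS implements critical sections: the function name and the `disk`/`printer` locks are hypothetical, a barrier stands in for the unlucky preemption, and a timed-out acquire stands in for the deadlock so the demo terminates.

```python
import threading

def both_succeed(order_a, order_b, timeout=0.2):
    """Run two threads that each take two locks in the given order.

    Returns True only if both threads obtained both of their locks.
    """
    # If the threads start with different locks, force the bad interleaving:
    # each one holds its first lock before asking for the second.
    opposite = order_a[0] is not order_b[0]
    barrier = threading.Barrier(2) if opposite else None
    results = []

    def worker(first, second):
        with first:
            if barrier:
                barrier.wait()       # "preempted" while holding the first lock
            got = second.acquire(timeout=timeout)
            results.append(got)
            if barrier:
                barrier.wait()       # keep the first lock until the other has tried
            if got:
                second.release()

    threads = [threading.Thread(target=worker, args=order)
               for order in (order_a, order_b)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return all(results)

disk, printer = threading.Lock(), threading.Lock()       # hypothetical resources
print(both_succeed((disk, printer), (printer, disk)))    # opposite order -> False
print(both_succeed((disk, printer), (disk, printer)))    # same order -> True
```

Note that preemption inside the guarded region is allowed in both runs; what prevents the deadlock in the second run is only that both threads ask for the locks in the same order.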
Wouldn't it just be better if we had computers with lots of tiny CPU cores, instead of such big mammoths like the POWER5 or the UltraSparc V? For example, an array of 256 32-bit CPUs would make life simpler and more efficient at the hardware level as well as the software level, wouldn't it?
By the way, I would like to have a computer that has SRAM only and a bandwidth of 100 GB/sec... Is it possible with current technology?
You jive turkey, you got to sass it.