Sun makes money selling systems. They "add-value" to their hardware by providing good software a.k.a. Solaris. Their hardware, by itself, is not very impressive.
Now, if Sun made Solaris on x86 just as good or better than Solaris on SPARC, then that would seriously de-value their hardware-software package. It would be death.
Sun is running out of options. Commoditization is moving up the enterprise stack on both the hardware (x86) and software (Linux) sides. Big enterprise apps such as Oracle run on Linux and IBM is adding more and more enterprise features to Linux. On the hardware side, Intel and AMD are utterly destroying SPARC with huge economies of scale advantages.
It will be extremely difficult for Sun to survive in this harsh competitive environment. The squeeze is on from the cheap x86 boxes on the low end and the killer IBM boxes on the high end. Their software is becoming less and less relevant as Linux matures in the enterprise thanks to IBM and others.
Not true. A multithreaded app will NOT necessarily show a performance improvement on an SMT machine.
You can have 2 or more threads that want a shared resource (like the cache) that thrash each other leading to a degradation in performance vs. a single threaded processor.
Other resources are typically shared like front-end bandwidth, execution bandwidth etc...
The threads can be NON-symmetric. It's SMT (Simultaneous multithreading).
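To make the cache-thrashing point concrete, here is a minimal sketch. It assumes a toy fully associative LRU cache and two made-up threads; the sizes (`CAPACITY`, the 6-line working sets) are illustrative and not taken from any real CPU. Each thread fits in the cache alone, but the two together do not.

```python
from collections import OrderedDict

def miss_rate(accesses, capacity):
    """Fraction of accesses that miss in a fully associative LRU cache."""
    cache = OrderedDict()
    misses = 0
    for line in accesses:
        if line in cache:
            cache.move_to_end(line)        # mark as most recently used
        else:
            misses += 1
            cache[line] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict least recently used line
    return misses / len(accesses)

CAPACITY = 8    # cache lines (hypothetical)
ROUNDS = 100    # how many times each thread loops over its working set

# Each thread loops over 6 private lines, which fits in 8 lines on its own.
thread_a = [("A", i) for i in range(6)] * ROUNDS
thread_b = [("B", i) for i in range(6)] * ROUNDS

alone = miss_rate(thread_a, CAPACITY)

# SMT-style sharing: accesses from both threads interleave in one cache.
interleaved = [x for pair in zip(thread_a, thread_b) for x in pair]
shared = miss_rate(interleaved, CAPACITY)

print(f"thread A alone:      {alone:.2%} misses")
print(f"A and B interleaved: {shared:.2%} misses")
```

Alone, the thread only takes its cold misses. Interleaved, the combined working set of 12 lines cycles through an 8-line LRU cache, so in this toy model every access misses.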
I work for Intel and get paid to design these x86 CPUs. I also know the Alpha designers because they too now work for Intel. In the past, I have worked on SPARC CPUs and MIPS CPUs. Therefore I know more than realworldtech.com
I can assure you that x86 is complicated and needs a lot more design/validation effort, but this does not translate into more than 5-10% area and power compared to a RISC. It DOES translate into far greater design effort.
Life would be easier if we had to design an Alpha. But remember that most prior Alphas were also full custom designs. AMD and Intel can't afford to be totally full custom anymore - the CPUs are just too complex now.
A) Pain in the ass legacy doesn't cost much power/area - it's just mostly microcode and lots of time in validation which impacts time-to-market which I guess impacts speed since the longer you wait, the more mature the process becomes which translates into lost opportunity.
B) The strong ordering model is used in most software - even the RISCs. For example, the most widely used memory model in SPARC is TSO (total-store-order) as opposed to RMO (relaxed memory order). Nothing is stopping Intel/AMD from defining an RMO-like memory consistency model. Good luck convincing software people to program it!
C) Lack of registers is partially addressed by 2 things. X86-64 makes the jump from 8->16 which is quite good. Spill/Fill loads and stores can be removed from the critical path with memory renaming and fast-store forwarding. You still need to decode and rename the loads and stores which is overhead. The lack of triadic operations is also troublesome, but extending the ISA is not impossible.
D1) The trace cache holds fewer micro-ops compared to RISC -- BUT! You get to go past taken-branches which increases your front-end bandwidth as well as reduces your latency. High-bandwidth and low latency are the key benefits.
D2) Yes the 3 decoders burn power. This is a cost of x86. You can remove the high power by caching decoded ops through a trace cache or a less heavy-weight method like a small loop cache. There are various degrees of caching that can reduce decode power significantly. By your admission, caches are a low power item -- they are used extensively.
E) Parallelism through the ISA is difficult for a compiler to expose. It's the dynamic hardware scheduler that gets you most of the performance. Things like load-latency are not predictable at compile-time so the compiler has no idea how to schedule instructions for that irrespective of the ISA. That's why everybody builds out-of-order machines for high-performance. That's why Itanium sucks for non-predictable load latency code (ie - just about all of SPECint).
F) Trace-cache is low power - it's a cache. The decode only happens infrequently on trace-cache-miss so its power impact is low (the bandwidth is also lower since it does not need to be fast -- only trace cache hit needs to be fast).
Exception models are a pain in the ass but do not cost power or area in any significant way. Just more validation and design time.
Memory renaming isn't even that big of a win when compared to competent store forwarding.
All RISCs except for Alpha have flags. Big deal, we just rename them like everybody else. It's just another really small source operand (it's not like 64 bits wide or anything).
G) Clock speed is hurt? Excuse me, but x86 processors have the highest clock speeds around. You can't use economy of scale arguments because AMD has 2.4+ GHz CPUs out on the market in 130 nm tech. and they are hardly a giant.
Leakage power is a problem for RISC and CISC processors. This is a problem for which solutions exist at the circuit and process level.
You CAN'T compare a G4 to a VIA. The G4 is an out-of-order machine designed by a team of hundreds. The VIA was designed by Centaur in Austin by some 30+ engineers and is a simple IN-ORDER core. If you want to normalize on design effort (full custom vs. standard-cell vs. ASIC), compare and normalize engineering man-years.
In conclusion - x86 is a PAIN IN THE ASS to design and validate, but the end product overhead in terms of die area and power is not that high. Maybe 5-10% on both counts.
Having billions in the bank does not guarantee that your investment in the future is secure. Take a look at what happened to Sun. Investing billions into a low-volume server market would have made DEC look like Sun, which is hurting from competition from Xeons and Opterons. Sorry, but even in the enterprise, hardware components are becoming commodities. It's all about the services and software.
Paul Demone (from realworldtech) does not get paid to design CPUs. He gets paid to speculate about "paper" machines.
By the way, I have been paid to design CPUs for the last 8 years and I continue to do so. Taking a design from concept to reality involves MUCH more than a paragraph of speculation by armchair-website architects like Paul.
To counter your assertions...
2.5 GHz is about what AMD is running today. It's not some magically high frequency. It's what traditional process scaling would get you.
Multiple cores are something everybody is doing.
Intel brought SMT to the world before EV8 even would have taped out.
Memory bandwidth is NOT a CPU design problem, it's a system design problem which is just solved with money - More pins and wires...duh! There is no innovation there. Hell NVIDIA sells a mass market 40GB/sec memory bandwidth solution which is almost 2x the bandwidth of this EV8 system.
So what you're saying is that EV8 would be a wider out-of-order core with SMT. Hmm...sounds like every other commodity x86 CPU out there (Opteron/Xeon). Tell me again how DEC would have made billions?
Memory isn't a problem; it's cheap and you have a lot of it. Alpha processors were designed to be fast, the fastest in the world at computation. So the size of the binary isn't important on a compute server like Alpha.
Memory size (as in DRAM) isn't as important since DRAM is cheap, but the working set that fits in the caches is extremely important because cache real-estate on processor silicon is very expensive. Go price out the Intel extreme edition P4 w/ a 2M cache and the regular Prescott P4 with a 1M cache. Cost is exponential with die size.
Bandwidth is also very expensive - You can see this with the premium being charged for 800MHz bus systems vs 533MHz bus systems.
If your code does not fit in your caches, you're going to take more cache misses which will degrade performance. With a more efficient encoding of the program, this is less likely.
Make no mistake about it, if DEC management had believed in Alpha technology as much as the rest of the people in the company, and DEC had kept the fab plants and invested in them as they had originally planned to do, and there had been no Compaq buyout, you would today be looking at SMT Alpha EV8 chips running somewhere around the speeds of today's Pentium chips... and NOTHING Intel, IBM or anyone else could produce would have even come close to touching it. It wasn't any technology shortcoming that killed Alpha, just bad management heaped on bad management heaped on even more bad management.
Do you hear what you are saying? If all of these miraculous and EXPENSIVE events occurred, you'd have your SMT EV8 running "at the speed of today's Pentiums" So you'd have a super expensive chip running at the same speed as a cheap commodity chip?
Oh right -- this is DEC with their brilliant innovation.
They created a high-frequency, low-IPC design which is quite inefficient - why, AMD can do better by doing more per clock! And this "virtual processor" SMT thing is only a hack that they were forced to include because their pipeline was so inefficient. Why would they have such a high frequency? It must be marketing...
I wish the EV8 did come out so that it would silence all of the clueless Pentium 4 critics:)
I would consider AMD slightly above a niche because they force Intel to react to their moves, even though they hover around break-even when it comes to making money - You can see that their stock price for the last couple of decades has literally gone nowhere while Intel's has increased exponentially.
ARM-based processors number about 400 million per year, last estimate I saw. The general number for microprocessors is around 4.7 *billion*. Most of which aren't x86.
Granted.
I should have said something about dollar volume instead of unit volume. Intel makes more money from CPUs than anyone else.
IBM is a niche. Sun is a niche. Alpha, even in its glory days, was a niche. AMD has 15-20% of the x86 market and is just slightly larger than a niche.
Intel ships 1 million Prescotts a week (http://www.xbitlabs.com/news/cpu/display/20040512151634.html). This is not even full production capacity. This is all done in 90nm technology -- a full 6 months ahead of anyone else. There were on the order of hundreds of millions of Northwoods sold and they are still selling.
That's probably more volume in a single week than the entire IBM + Sun + Alpha volume for an entire year.
Why is this the case? It is RIDICULOUSLY expensive to manufacture CPUs in this day and age. If you DON'T ship on the order of 1 million a week, you will never recover the costs necessary to build all of the fabs.
This is why Sun will eventually abandon SPARC. This is why IBM loses money in their microelectronics division, but will probably maintain POWER and eat the costs for strategic reasons. This is why HP/SGI and others have gone with Itanium.
This is not to discount the technical achievements of these CPUs. I design processors for a living and have great respect for the Alpha design team. But at the end of the day, the only reason someone is going to fund the design of a computer is to make money. Only the profitable survive.
They are there only to address imperfections in the storage media, not as a part of the fundamental design of the computer. I'd bet CPU designers would love to be able to throw out the caches entirely and address main memory with no middle-men. Cache is to CPUs as connection pools are to databases or as JavaScript is to interactive websites (a means to an end, but not the ideal path).
I understand what you're saying, and I can tell that you probably have a CS and/or Math background. Your comment makes sense in a highly idealized abstract sense, but you can't idealize away things like the laws of physics.
You see, light can only travel about 30 centimeters in a nanosecond. Latency is fundamentally limited by the speed of light. You can't get better than that unless all of the laws of physics we have discovered are wrong.
There are fundamental physical limits to computer design. I suggest reading "The Physics of Information Technology". Look it up on google and/or amazon etc...
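For a feel of the numbers, here is a back-of-the-envelope sketch of how far light gets in one clock cycle. It uses the vacuum speed of light as a hard upper bound; real signals in wires and silicon propagate well below that, so treat these as best-case figures. The function name is just for illustration.

```python
C_M_PER_S = 299_792_458  # speed of light in vacuum, meters per second

def light_distance_cm(freq_ghz):
    """Upper bound on distance (cm) any signal can cover in one clock cycle."""
    period_s = 1.0 / (freq_ghz * 1e9)
    return C_M_PER_S * period_s * 100  # meters -> centimeters

for ghz in (1.0, 2.4, 3.6):
    print(f"{ghz} GHz: at most {light_distance_cm(ghz):.1f} cm per cycle")
```

At 1 GHz that is about 30 cm per cycle, and it shrinks to under 10 cm at 3.6 GHz. A round trip to DRAM sitting several centimeters away eats cycles before any logic or wire delay is counted.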
This is the real comparison. Overclock the AMD to 3.6GHz and see who wins. As soon as AMD gets the 90nm process perfected I think we will see a huge boost in AMD's clockspeed.
This always annoys me...
You see, you can't buy an AMD at 3.6 GHz because it wasn't designed to run that fast. The AMD does more work per clock so it CAN'T run at 3.6GHz in 90nm. It is simply not designed to do so. The laws of physics prevent this.
The Intel CPU CAN run at 3.6GHz because it was DESIGNED to run at 3.6GHz AT THE COST of doing LESS work per clock.
If I had a CPU that could execute 2 instructions per clock at 1 GHz and another CPU that could execute 1 instruction per clock at 2 GHz, they would have the exact same performance.
They are different design styles. Sometimes the high-frequency, lower-IPC approach is better, sometimes the lower-frequency, higher-IPC approach is better. You can see this in the discrepancy in performance of 2 CPUs with vastly different design tradeoffs.
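The arithmetic behind the 2-IPC-at-1-GHz versus 1-IPC-at-2-GHz example is just throughput = IPC x frequency. A minimal sketch, treating IPC as a fixed average (in reality it varies per workload, which is exactly why the two styles win on different code):

```python
def instructions_per_second(ipc, freq_ghz):
    """First-order throughput: average instructions per clock times clock rate."""
    return ipc * freq_ghz * 1e9

wide_slow = instructions_per_second(ipc=2, freq_ghz=1.0)    # higher IPC, lower clock
narrow_fast = instructions_per_second(ipc=1, freq_ghz=2.0)  # lower IPC, higher clock

print(wide_slow, narrow_fast)
```

Both come out to 2 billion instructions per second, so neither clock speed nor IPC alone tells you which chip is faster.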
I'd bet they put it nowhere. L2 and L3 caches are a kludge, and, if they really achieve huge chip-to-chip bandwidth, they just might not need the cache hierarchy. This is reminiscent of old CPUs, where the system RAM ran at an acceptably large fraction of the speed of the CPU, so there was no L2 cache at all.
L2 and L3 caches are not a kludge. They work to reduce average latency of loads and stores. You can solve the bandwidth problem with better interconnects, but on-die caches will still be there for latency reduction.
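The latency argument can be sketched with the usual average memory access time (AMAT) formula. The hit latencies and miss rates below are made-up round numbers for illustration, not measurements of any real chip.

```python
def amat(levels, memory_latency):
    """Average memory access time in cycles.

    levels: list of (hit_latency_cycles, miss_rate), ordered from L1 outward.
    memory_latency: cycles to reach main memory after the last cache misses.
    """
    total = memory_latency
    # Work from the outermost cache back toward L1.
    for hit_latency, miss_rate in reversed(levels):
        total = hit_latency + miss_rate * total
    return total

MEMORY = 200       # cycles to DRAM (hypothetical)
L1 = (3, 0.05)     # 3-cycle hit, 5% miss rate (hypothetical)
L2 = (15, 0.20)    # 15-cycle hit, 20% of L1 misses also miss L2 (hypothetical)

print("L1 only:", amat([L1], MEMORY))
print("L1 + L2:", amat([L1, L2], MEMORY))
```

With these numbers the L2 cuts the average from about 13 cycles to under 6. More bandwidth to memory does not change the 200-cycle term, which is why the on-die caches stay.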
Speed IS scaling as expected. It's just that we are now living in a power-limited world.
There are so many damn transistors now on the chip that switching them all on and off at the same time draws a tremendous amount of power.
The ability of a system to remove heat from a chip is limited. The cost involved in cooling anything over 150 watts is prohibitive for the volume market.
So, yes if you had infinite money for cooling solutions then you would see speed scale with process.
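A rough sketch of why switching lots of transistors at high frequency hits a power wall, using the standard first-order CMOS dynamic power relation P = a * C * V^2 * f. All the input values here are hypothetical round numbers, and leakage is ignored entirely.

```python
def dynamic_power_watts(activity, capacitance_farads, voltage, freq_hz):
    """First-order CMOS switching power: activity * C * V^2 * f."""
    return activity * capacitance_farads * voltage ** 2 * freq_hz

# Hypothetical chip: 200 nF total switchable capacitance, 10% of it
# toggling each cycle, 1.4 V supply.
base = dynamic_power_watts(0.1, 200e-9, 1.4, 3.0e9)
faster = dynamic_power_watts(0.1, 200e-9, 1.4, 6.0e9)

print(f"3 GHz: {base:.1f} W, 6 GHz: {faster:.1f} W")
```

Doubling the clock at the same voltage doubles the switching power in this model. In practice a higher clock usually also needs a higher supply voltage, and the V^2 term then makes it grow even faster.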
Re:Changed the view of the US? on Bobby Fischer Found · Score: 2, Insightful
You know, if we had recruiters for Pharmaceuticals standing outside of colleges offering new graduates 10.2 million over 3 years, then cancer would have been cured 10 years ago. Why do athletes, that contribute NOTHING to society, get paid the most?
Maybe because curing cancer is several orders of magnitude more difficult than hitting 40 home runs in a season.
The financial rewards are there -- multi-billion dollar rewards await the people that cure cancer. These rewards far exceed what any athlete could ever make.
Putting up million dollar rewards to solve problems like the Hilbert mathematics problems hasn't yet yielded any solutions.
Athletes contribute entertainment value to society and are compensated at the rate the market will bear.
Why not type it in non-caps and then highlight the region of text and make it caps later? Much like 'bold' or 'italics' in word processors. Bold/italic/underline are useful too but you don't see people screaming for bold/italic/underline keys.
One video card company cannot gain more than 50% of today's market. It's just not possible.
Why not? Technology companies routinely evolve into dominant market share players. Intel has an 83% share of all CPUs. Microsoft has a 99% share of operating systems.
There are only 2 players in the video card market now - others have died trying to fight the big 2. If one can sustain a competitive advantage over the other for a few years, one will end up being dominant (ie - have more money than the other guy to invest in newer products to remain competitive).
The rest of the datacenter might be things like Sun 6500's, 10Ks, or holy shit, a 15K or two. What fills the gap here? I'm starting to see more and more large IBM servers moving in. I guess IBM is really going to capitalize?
It's true that Sun is getting killed by IBM at the high end and Linux/Windows/x86 on the low end. Their cost structure is way too expensive. Their business model is broken. They're getting squeezed into irrelevance.
If Sun wants to survive, they need to start building systems around x86. If I were in charge, I would immediately kill SPARC and start building scalable hardware with Opterons. Sun has some very specific systems and software engineering expertise that they should use to build these enterprise systems and differentiate themselves from the competition.
They should also build a SPARC->x86 dynamic binary translator to ease the migration path to the new ISA.
Another interesting point is that the SPARC64-V was made almost exclusively by native (Japanese) engineers. Fujitsu, as a matter of traditional Japanese corporate policy, does not hire H-1B workers.
I'm calling bullshit on this one. The SPARC64-V architecture was created at HAL corporation which was based in the Bay Area but owned by Fujitsu - Most of the engineers were in fact not Japanese. The chief architect was Mike Shebanow who went to AMD after HAL and then left AMD recently for NVIDIA I believe.
Sun's processor architecture team sucks because they are clueless when it comes to high performance processor design. The UltraSPARC IV is still an in-order machine! It's 2004 for Christ's sake and they can't figure out how to build a competitive out of order processor.
Your H1B comment is a red herring. Intel, AMD and IBM hire tons of H1B workers too, and their processors kick Sun's ass.
Because most websites are optimized/coded to work with IE, I like the fact that the IE renderer draws the pages as the website designer intended.
But I hate the stupid pop-ups and spyware. I also like tabbed browsing. My solution is to use MyIE2 which embeds the IE renderer into a tabbed-based, popup-free, google-toolbar supporting framework. It's totally free.
This way, I get the best of both worlds.
I guess we'll have to disagree :)
And most of the Alpha design team is largely intact and now working for Intel. All of the superstar processor designers now work for Intel or AMD.
They will pull away from IBM, Sun and the rest.
Like it or not, we're converging to a one ISA world -- x86 everywhere in 10 years.
1) The Alpha is also little endian.
2) Complicated instruction decode can be removed from the critical circuit paths with pre-decoded caches. On one extreme, AMD uses predecode bits to mark where instructions begin in the i-cache. On the other extreme, Intel caches the fully decoded micro-ops in their trace-cache. When the variable length decode is out of the critical path, it can be made slower and therefore smaller.
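As a rough illustration of the predecode-bit idea, here is a toy sketch. The instruction encoding is invented for the example (the top two bits of the first byte give the length) and has nothing to do with real x86 encoding or with how AMD actually lays out its predecode bits. The point is only that the slow sequential length walk happens once, at cache fill time, and later fetches just read the marks.

```python
def predecode(code, length_of):
    """Return one mark per byte: 1 where an instruction starts, else 0."""
    marks = [0] * len(code)
    pc = 0
    while pc < len(code):
        marks[pc] = 1
        pc += length_of(code[pc])  # sequential: next start depends on this length
    return marks

def toy_length(first_byte):
    # Hypothetical encoding: top two bits of the first byte hold (length - 1).
    return (first_byte >> 6) + 1

# Four instructions of lengths 1, 2, 3 and 1 bytes.
code = bytes([0x01, 0x40, 0xAA, 0x80, 0xBB, 0xCC, 0x02])
marks = predecode(code, toy_length)
print(marks)  # [1, 1, 0, 1, 0, 0, 1]
```

Once the marks are stored alongside the cache line, finding instruction boundaries on a hit is a parallel lookup instead of a serial walk.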
I don't know where you get your "half" numbers from, but I can assure you that the x86 overhead is nowhere close to "half". There is MAYBE 5-10% overhead in power/area. Most of the non-cache transistors in modern x86 CPUs go towards the out-of-order control logic (re-order buffers, schedulers, highly-ported register files, memory ordering buffers etc...) which attempt to extract instruction level parallelism from the program. High performance CPUs need this logic whether they are RISC or not.
Another note -- Variable length instructions more efficiently encode your program so you don't need as big of an i-cache or as much bandwidth to the i-cache as a RISC processor. It's not all bad. Compile something on x86 and then cross compile it to some RISC processor and tell me how much bigger your binary is...
Instruction sets are not where performance comes from. Circuit technology and underlying microarchitecture are FAR bigger components to performance and how much power your chip burns.
It's funny how Alpha gets such praise for taking a high-power high-heat high-frequency approach (low-IPC) to processor design. Compare the Alphas of their day to the HP PA-RISC offerings which were lower frequency but higher IPC. It reminds me of the Intel vs. AMD battles. There was a famous Microprocessor Report that pitted the speed-demons (Alphas) against the Brainiacs (PA-RISC).
When Intel tries the same Alpha design style, they get buried in the press. I guess it's natural to fight the leader and go for the underdogs - but Alpha was the top dog and never did receive the criticism that Intel does.
It's not like they're going to lose the entire 5 billion. They just haven't been making enough to recoup the costs of their fabs.
"In the first quarter of 2004, IBM Microelectronics lost about $150 million" -- Source http://www.infoworld.com/article/04/04/21/HNibm_1.html
IBM makes several billions in profit per year. A 150 million per quarter loss isn't going to bury them.
ARM-based processors number about 400 million per year, last estimate I saw. The general number for microprocessors is around 4.7 *billion*. Most of which aren't x86.
Granted.
I should have said something about dollar volume instead of unit volume. Intel makes more money from CPUs than anyone else.
IBM is a niche. Sun is a niche. Alpha, even in its glory days, was a niche. AMD has 15-20% of the x86 market and is just slightly larger than a niche.
Intel ships 1 million Prescotts a week (http://www.xbitlabs.com/news/cpu/display/20040512151634.html). This is not even full production capacity. This is all done in 90nm technology -- a full 6 months ahead of anyone else. There were on the order of hundreds of millions of Northwoods sold and they are still selling.
That's probably more volume in a single week than the entire IBM + Sun + Alpha volume for an entire year.
Why is this the case? It is RIDICULOUSLY expensive to manufacture CPUs in this day and age. If you DON'T ship on the order of 1 million a week, you will never recover the costs necessary to build all of the fabs.
This is why Sun will eventually abandon SPARC. This is why IBM loses money in their microelectronics division, but will probably maintain POWER and eat the costs for strategic reasons. This is why HP/SGI and others have gone with Itanium.
This is not to discount the technical achievements of these CPUs. I design processors for a living and have great respect for the Alpha design team. But at the end of the day, the only reason someone is going to fund the design of a computer is to make money. Only the profitable survive.
They are there only to address imperfections in the storage media, not as a part of the fundamental design of the computer. I'd bet CPU designers would love to be able to throw out the caches entirely and address main memory with no middle-men. Cache is to CPUs as connection pools are to databases or as JavaScript is to interactive websites (a means to an end, but not the ideal path).
I understand what you're saying, and I can tell that you probably have a CS and/or Math background. Your comment makes sense in a highly idealized abstract sense, but you can't idealize away things like the laws of physics.
You see, light can only travel about 30 centimeters in a nanosecond, and real signals in wires are slower than that. Latency is fundamentally limited by the speed of light. You can't get better than that unless all of the laws of physics we have discovered are wrong.
There are fundamental physical limits to computer design. I suggest reading "The Physics of Information Technology". Look it up on google and/or amazon etc...
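To put a number on the latency argument, here is a small back-of-the-envelope sketch that works out how far light travels in one clock period. The function name and the clock frequencies are just illustrative choices. It uses the vacuum speed of light as an upper bound; electrical signals in real wires are slower, so the real budget is tighter.

```python
C_M_PER_S = 299_792_458  # speed of light in vacuum, metres per second


def light_cm_per_cycle(freq_ghz: float) -> float:
    """Upper bound on the distance a signal can cover in one clock cycle, in cm."""
    cycle_seconds = 1.0 / (freq_ghz * 1e9)
    return C_M_PER_S * cycle_seconds * 100.0  # metres -> centimetres


for freq in (1.0, 2.0, 3.6):
    print(f"{freq} GHz: at most {light_cm_per_cycle(freq):.1f} cm per cycle")
```

A load that has to go to a DIMM several centimeters away and come back spends multiple cycles on wire delay alone, before any lookup time in the memory itself, which is why a small store close to the core is not optional.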
This is the real comparison. Overclock the AMD to 3.6GHz and see who wins. As soon as AMD gets the 90nm process perfected I think we will see a huge boost in AMD's clockspeed.
This always annoys me...
You see, you can't buy an AMD at 3.6 GHz because it wasn't designed to run that fast. The AMD does more work per clock so it CAN'T run at 3.6GHz in 90nm. It is simply not designed to do so. The laws of physics prevent this.
The Intel CPU CAN run at 3.6GHz because it was DESIGNED to run at 3.6GHz AT THE COST of doing LESS work per clock.
If I had a CPU that could execute 2 instructions per clock at 1 GHz and another CPU that could execute 1 instruction per clock at 2 GHz, they would have the exact same performance.
They are different design styles. Sometimes the high frequency, lower-IPC approach is better, sometimes the lower frequency, higher-IPC approach is better. You can see this in the discrepancy in performance of 2 CPUs with vastly different design tradeoffs.
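The arithmetic behind the two-CPU example is just throughput = IPC x clock frequency. A minimal sketch, using the idealized numbers from the example (real IPC varies with the workload, so treat this as the first-order model only):

```python
def instructions_per_second(ipc: float, freq_ghz: float) -> float:
    """First-order throughput model: instructions per clock times clocks per second."""
    return ipc * freq_ghz * 1e9


brainiac = instructions_per_second(ipc=2.0, freq_ghz=1.0)     # more work per clock
speed_demon = instructions_per_second(ipc=1.0, freq_ghz=2.0)  # more clocks per second

print(brainiac, speed_demon, brainiac == speed_demon)
```

Comparing two chips on GHz alone ignores the first factor, and comparing them on IPC alone ignores the second.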
I'd bet they put it nowhere. L2 and L3 caches are a kludge, and, if they really achieve huge chip-to-chip bandwidth, they just might not need the cache hierarchy. This is reminiscent of old CPUs, where the system RAM ran at an acceptably large fraction of the speed of the CPU, so there was no L2 cache at all.
L2 and L3 caches are not a kludge. They work to reduce average latency of loads and stores. You can solve the bandwidth problem with better interconnects, but on-die caches will still be there for latency reduction.
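The latency claim can be made concrete with the usual average-access-time calculation. This is a sketch with made-up latencies and hit rates (none of these numbers come from a real part), and the function name is hypothetical. Each level is charged to the fraction of accesses that actually reach it.

```python
def average_load_latency(levels, memory_ns):
    """Average latency in ns.

    levels: list of (lookup_latency_ns, hit_rate), ordered from L1 outward.
    memory_ns: latency of main memory, paid by accesses that miss everywhere.
    """
    total = 0.0
    reach = 1.0  # fraction of accesses that get as far as this level
    for latency_ns, hit_rate in levels:
        total += reach * latency_ns   # everyone who reaches this level pays its lookup
        reach *= (1.0 - hit_rate)     # only the misses continue outward
    return total + reach * memory_ns


# Hypothetical numbers, for illustration only.
no_cache = average_load_latency([], memory_ns=100.0)
with_caches = average_load_latency([(1.0, 0.95), (10.0, 0.80)], memory_ns=100.0)
print(no_cache, with_caches)
```

Note that memory bandwidth does not appear anywhere in the formula. A faster interconnect lets you move more bytes per second, but it does not shrink `memory_ns`, so the caches still earn their keep.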
Speed IS scaling as expected. It's just that we are now living in a power-limited world.
There are so many damn transistors now on the chip that switching them all on and off at the same time draws a tremendous amount of power.
The ability of a system to remove heat from a chip is limited. The cost of cooling anything over 150 watts is prohibitive for the volume market.
So, yes if you had infinite money for cooling solutions then you would see speed scale with process.
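A rough way to see the power wall is the standard first-order model for CMOS switching power, P = activity x C x V^2 x f. The sketch below uses invented values for activity factor, total switched capacitance, and supply voltage (they are assumptions, not measurements of any real chip), and it ignores leakage entirely.

```python
def dynamic_power_watts(activity: float, capacitance_farads: float,
                        vdd_volts: float, freq_hz: float) -> float:
    """First-order CMOS switching power: activity * C * V^2 * f (leakage ignored)."""
    return activity * capacitance_farads * vdd_volts ** 2 * freq_hz


# Hypothetical chip: 10% of 200 nF of total capacitance switches each cycle at 1.3 V.
base = dynamic_power_watts(0.1, 200e-9, 1.3, 3.6e9)
faster = dynamic_power_watts(0.1, 200e-9, 1.3, 5.0e9)
print(f"{base:.1f} W at 3.6 GHz, {faster:.1f} W at 5.0 GHz")
```

More transistors means more capacitance to switch, and more frequency multiplies it again, so the transistors can get faster with each process while the chip as a whole stays pinned at whatever the cooling solution can remove.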
You know, if we had recruiters for Pharmaceuticals standing outside of colleges offering new graduates 10.2 million over 3 years, then cancer would have been cured 10 years ago. Why do athletes, that contribute NOTHING to society, get paid the most?
Maybe because curing cancer is several orders of magnitude more difficult than hitting 40 home runs in a season.
The financial rewards are there -- multi-billion dollar rewards await the people that cure cancer. These rewards far exceed what any athlete could ever make.
Putting up million dollar rewards to solve problems like the Hilbert mathematics problems hasn't yet yielded any solutions.
Athletes contribute entertainment value to society and are compensated at the rate the market will bear.
Why not type it in non-caps and then highlight the region of text and make it caps later? Much like 'bold' or 'italics' in word processors. Bold/italic/underline are useful too but you don't see people screaming for bold/italic/underline keys.
Caps-lock is just as useless.
One video card company cannot gain more than 50% of today's market. It's just not possible.
Why not? Technology companies routinely evolve into dominant market share players. Intel has an 83% share of all CPUs. Microsoft has a 99% share of operating systems.
There are only 2 players in the video card market now - others have died trying to fight the big 2. If one can sustain a competitive advantage over the other for a few years, one will end up being dominant (i.e., have more money than the other guy to invest in newer products to remain competitive).
If you're a programmer, why don't you just use EMACS calc? It's RPN and does all of the binary and hex math.
EMACS is still the ultimate IDE.
The rest of the datacenter might be things like Sun 6500's, 10Ks, or holy shit, a 15K or two. What fills the gap here? I'm starting to see more and more large IBM servers moving in. I guess IBM is really going to capitalize?
It's true that Sun is getting killed by IBM at the high end and Linux/Windows/x86 on the low end. Their cost structure is way too expensive. Their business model is broken. They're getting squeezed into irrelevance.
If Sun wants to survive, they need to start building systems around x86. If I were in charge, I would immediately kill SPARC and start building scalable hardware with Opterons. Sun has some very specific systems and software engineering expertise that they should use to build these enterprise systems and differentiate themselves from the competition.
They should also build a SPARC->x86 dynamic binary translator to ease the migration path to the new ISA.
Another interesting point is that the SPARC64-V was made almost exclusively by native (Japanese) engineers. Fujitsu, as a matter of traditional Japanese corporate policy, does not hire H-1B workers.
I'm calling bullshit on this one. The SPARC64-V architecture was created at HAL corporation which was based in the Bay Area but owned by Fujitsu - Most of the engineers were in fact not Japanese. The chief architect was Mike Shebanow who went to AMD after HAL and then left AMD recently for NVIDIA I believe.
Sun's processor architecture team sucks because they are clueless when it comes to high performance processor design. The UltraSPARC IV is still an in-order machine! It's 2004 for Christ's sake and they can't figure out how to build a competitive out-of-order processor.
Your H1B comment is a red herring. Intel, AMD and IBM hire tons of H1B workers too, and their processors kick Sun's ass.