AMD Showcases Quad-Core Barcelona CPU
Gr8Apes writes "AMD has showcased its new 65nm Barcelona quad-core CPU. It is labeled a quad-core Opteron but, according to InfoWorld's Tom Yager, it is really a redefinition of x86. Each core has a new vector math processing unit (SSE128), separate integer and floating point schedulers, and new nested paging tables (to vastly improve hardware virtualization). According to AMD, the new vector math units alone should improve floating point performance by 80%. Some analysts are skeptical and are waiting for benchmarks. Will AMD dethrone Intel again? Only time will tell."
I don't care if it's 65nm, 45nm or 10mm - that's a completely irrelevant (to me as a user and purchaser) implementation detail. I care about the results - how fast is it for my workloads? How much is it? How much power does it use?
Obsession about process size is sillier than obsession over clock speeds.
If AMD can produce a better performing chip at 65nm, then who the hell cares if Intel - or anyone else - moves to a 45nm process?
Advanced users are users too!
Up until now, SSE (and later) operations were processed 64 bits at a time within the processor. SSE128 just means the new AMD chip will complete a full 128-bit SSE instruction in one pass.
This was pretty much the reason why most people only bothered with MMX optimizations in their applications.
When Intel first added SSE to the Pentium 3 chips, they did it with a 64-bit setup to save die size on the then-250nm parts. Even when they moved to the newer, smaller designs, they left it that way. The Core2 was the first chip to incorporate a single-issue SSE engine. Therefore, the Core2 loads the instruction, then executes it. With the other chips, you have to load the first part (if it's a full 128-bit instruction, or if it's multiple instructions added together), save, load, save, add, execute. This is where the Core2 kicks butt. I've been saying that Barcelona would move to that design, since it's the biggest reason Intel has been beating AMD in the benchmarks. This will re-level the playing field. There have been lots of articles about this. Google it.
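To make the half-width versus full-width point concrete, here is a toy Python model. It is not real SSE code and not how either vendor's hardware is actually built; the function name `packed_add` and its parameters are made up for illustration. It just counts how many passes a 4 x 32-bit packed add needs on a 64-bit versus a 128-bit datapath.

```python
def packed_add(a, b, datapath_bits=128, lane_bits=32):
    """Add two vectors lane by lane and count execution passes.

    Hypothetical model: `a` and `b` stand in for 128-bit SSE registers
    holding four 32-bit lanes. A 64-bit datapath can only process two
    lanes per pass, so the operation is split in half.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of lanes")
    lanes_per_pass = datapath_bits // lane_bits
    result = []
    passes = 0
    for start in range(0, len(a), lanes_per_pass):
        chunk_a = a[start:start + lanes_per_pass]
        chunk_b = b[start:start + lanes_per_pass]
        result.extend(x + y for x, y in zip(chunk_a, chunk_b))
        passes += 1
    return result, passes


x = [1.0, 2.0, 3.0, 4.0]
y = [10.0, 20.0, 30.0, 40.0]
print(packed_add(x, y, datapath_bits=64))   # same sums, two passes
print(packed_add(x, y, datapath_bits=128))  # same sums, one pass
```

The answer is identical either way; only the number of passes changes, which is the whole argument for the wider unit.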
As long as AMD and Intel continue to chase each other in the x86 market, high end chips become low end in the span of six months. Just keep buying 6 months behind the press releases and you get great processors for next to nothing.
Weaselmancer
Ridiculous.
In my own benchmarks (generic C integer and floating point scientific code) I have found that the Core Duo and Core 2 Duo aren't all that quick compared with an AMD64. Clock for clock, the AMD64 Opterons we have are about 50% quicker than an equivalent Core 2 Duo for integer work. I know this doesn't agree with all the usual magazine benchmarks, but they are heavily biased towards using SSE instructions where possible, and SSE is where the Core 2 Duo has been a real improvement over previous Intel designs and also bests the AMD chips. Hopefully AMD has recognised this and the new SSE implementation will bring them back on par with Intel for these benchmarks, but even today an AMD64 processor is a beast and more than a match for anything Intel produces.
"I have the attention span of a strobe lit goldfish, please get to the point quickly!"
Three quad cores for the pasty-nerds under the sky,
Seven for the WoW-nerds in their halls of stone,
Nine for Diablo Men doomed to die,
One for the Dark Nerd on his dark throne
In the Land of Silicon where the corporations lie.
One quad core to rule them all, One quad core to find them,
One quad core to bring them all and in the darkness bind them
In the Land of Silicon where the corporations lie.
He paused, and then said in a deep voice,
This is the Master-quad core, the One quad core to rule them all.
Core2 has single-cycle throughput on most SSE instructions, not single-cycle latency. Most of these instructions still take 3-5 cycles to generate results, which is similar to the Pentium M, but now a vector of results finishes every cycle, instead of every two or four cycles.
An important consequence of this is that if your instructions are poorly scheduled by the compiler (or assembly programmer) and the processor spends too much time waiting for results of previous operations, the advantages of single-cycle throughput mostly disappear.
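A back-of-the-envelope sketch of the throughput versus latency point, in Python. This is an idealized pipeline model, not a measurement of any real chip: the function names are hypothetical, and the 4-cycle latency is just an assumed value picked from the 3-5 cycle range mentioned above.

```python
def cycles_independent(n_ops, latency, issue_interval):
    """Cycles to finish n_ops operations that do not depend on each other.

    Idealized pipeline: the first result appears after `latency` cycles,
    then one more result completes every `issue_interval` cycles.
    """
    if n_ops == 0:
        return 0
    return latency + (n_ops - 1) * issue_interval


def cycles_dependent(n_ops, latency, issue_interval=1):
    """Cycles when every operation needs the previous operation's result.

    Each op must wait out the full latency of the one before it, so the
    issue rate stops mattering (as long as latency >= issue_interval).
    """
    return n_ops * max(latency, issue_interval)


LATENCY = 4  # assumed value, within the 3-5 cycle range quoted above
N = 100

print(cycles_independent(N, LATENCY, issue_interval=1))  # full-width unit
print(cycles_independent(N, LATENCY, issue_interval=2))  # half-width unit
print(cycles_dependent(N, LATENCY, issue_interval=1))    # badly scheduled
print(cycles_dependent(N, LATENCY, issue_interval=2))    # badly scheduled
```

Under these assumptions, well-scheduled code runs in roughly half the cycles on the single-cycle-throughput unit, while a chain of dependent operations takes the same time on both, which is the scheduling caveat described above.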