AMD Unveils Barcelona Quad-Core Details
mikemuch writes, "At today's Microprocessor Forum, AMD's Ben Sander laid out architecture details of the number-two CPU maker's upcoming quad-core Opterons. The processors will feature sped-up floating-point operations, improvements to IPC, more memory bandwidth, and improved power management. In his analysis on ExtremeTech, Loyd Case considers that the shift isn't as major as Intel's move from NetBurst to Core 2, but AMD claims that its quad core is true quad core, while Intel's is two dual-cores grafted together."
The memory controllers now support full 48-bit hardware addressing, which theoretically allows for 256 terabytes of physical memory.
256 terabytes should be enough for anybody.
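The 256 TB figure follows directly from the address width: 2^48 distinct byte addresses, counting a terabyte as 2^40 bytes (a binary TB, or TiB). A quick Python check, assuming byte-addressable memory; the helper names are hypothetical:

```python
ADDRESS_BITS = 48  # physical address width quoted above


def addressable_bytes(bits: int) -> int:
    """Number of distinct byte addresses reachable with `bits` address bits."""
    return 2 ** bits


def to_tebibytes(n_bytes: int) -> float:
    """Convert a byte count to binary terabytes (TiB, 2**40 bytes)."""
    return n_bytes / 2 ** 40


total = addressable_bytes(ADDRESS_BITS)
print(total)                # 281474976710656
print(to_tebibytes(total))  # 256.0
```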
"In his analysis on ExtremeTech, Loyd Case considers that the shift isn't as major as Intel's move from NetBurst to Core 2, but AMD claims that its quad core is true quad core, while Intel's is two dual-cores grafted together."
BUUUUUUUUUURNED
Next week: Intel responds by telling us how fat AMD's mother is.
As for the quad-core thing, it's the same story all over again. Intel rushes out a solder-two-chips-together job to beat the competition to market, and then the actual innovators come out with something coherent that works more efficiently, etc.
I'm not saying AMD's chip will necessarily be better. What I'm saying is that I don't care who gets to market two months earlier. I want the better chip, and I can live with the mystery for a few weeks.
Although, frankly, I can barely afford to eat, having just built a decent Core 2 Duo rig, so I won't be investing either way just yet...
Some of us do care. Some for work, some for fun. AMD's "designed as quad-core" approach has some notable consequences, especially in the cache layout that (on paper, of course) seems very well suited to virtualization -- much more so than the Intel solution in TFA.
:)
AMD: a shared L3 feeding core-specific L2 caches. Intel: each core pair sharing an L2 cache. AMD's approach better avoids threads competing for the same data (thanks to copying it from L3 to every L2 that needs it), while keeping access latencies more uniform and predictable (and thus easier to optimize for).
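A toy Python model of the two layouts as described in this comment. The layout tables are simplified assumptions (L1 caches and cache sizes left out), and `LAYOUTS` and `nearest_shared_cache` are hypothetical names for illustration. The Intel entry models the two-die quad discussed later in the thread, where no on-chip cache spans both dies.

```python
from typing import Optional

# Simplified cache layouts: each entry is (cache level, set of core IDs sharing it).
# L1 caches and cache sizes are deliberately left out of this toy model.
LAYOUTS = {
    # AMD as described: a private L2 per core plus one L3 shared by all four cores.
    "amd_barcelona": [
        (2, frozenset({0})), (2, frozenset({1})),
        (2, frozenset({2})), (2, frozenset({3})),
        (3, frozenset({0, 1, 2, 3})),
    ],
    # Intel two-die quad: each core pair shares an L2; no cache spans both dies.
    "intel_mcm_quad": [
        (2, frozenset({0, 1})),
        (2, frozenset({2, 3})),
    ],
}


def nearest_shared_cache(layout: str, a: int, b: int) -> Optional[int]:
    """Return the lowest cache level shared by cores a and b, or None when
    the cores have no common on-chip cache and must go off-chip (the FSB)."""
    levels = [level for level, cores in LAYOUTS[layout] if a in cores and b in cores]
    return min(levels) if levels else None


for name in LAYOUTS:
    print(name, nearest_shared_cache(name, 0, 1), nearest_shared_cache(name, 0, 2))
# amd_barcelona 3 3
# intel_mcm_quad 2 None
```

On this model, any two AMD cores meet in L3, while Intel cores on different dies have no on-chip meeting point at all.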
Other AMD enhancements look more like catch-up to Core 2: SSE [and it's "Extensions", dammit, not "Enhancements"] paths widened from 64-bit to 128-bit, more advanced memory handling (out-of-order loads versus Intel's memory disambiguation et al.), more instructions per clock through beefier decoding (more x86 ops going through the fast path instead of microcode), and more "free" ops (where Intel added way more discrete execution units going from Core to Core 2).
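To put a rough number on the SSE widening: K8 handled 128-bit SSE operations as two 64-bit halves, so a 128-bit path moves twice as many values per pass. The Python arithmetic below is idealized; it ignores latency, pipelining, and how many SSE units each core has, `datapath_passes` is a hypothetical helper, and 1024 floats is an arbitrary example size.

```python
import math


def datapath_passes(n_elements: int, element_bits: int, path_bits: int) -> int:
    """Passes through an SSE datapath `path_bits` wide needed to process
    n_elements values of element_bits each (idealized: no latency or pipelining)."""
    elements_per_pass = path_bits // element_bits
    return math.ceil(n_elements / elements_per_pass)


# 1024 single-precision (32-bit) floats:
print(datapath_passes(1024, 32, 64))   # 512: a 64-bit path handles 2 floats per pass
print(datapath_passes(1024, 32, 128))  # 256: a 128-bit path handles 4 floats per pass
```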
If AMD's quad manages to be better due to better memory bandwidth and latency (in practice), then they were quite right about "true quad-core".
As the person who responded to your last post explained, that's just not possible with the K8 architecture as it is. The memory controller is on-die and memory technology is evolving, therefore the interface between the processor (where the controller is) and motherboard (where the DIMMs are) must also change.
The closest to a solution we have would be going back to Pentium II/III-style processor-on-a-card designs: the memory slots would move to an expansion card shared with the processor, and that card would connect to the motherboard over HyperTransport.
This is workable: some motherboard manufacturers (ASRock on the 939DUAL, for one) have implemented something along these lines for AM2 upgradeability. The problem lies in laying out the circuitry for this new slot, not to mention the incompatibility with many of the large coolers we often use today. It would also become even more complex with the one or two extra HyperTransport links found on Opteron 2xx and 8xx chips, respectively.
AMD made a compromise when they designed K8. On the one hand, the on-die memory controller improves latency by a huge amount and scales much better by completely eliminating the memory and FSB bottlenecks that Intel chips get in a multiprocessor environment. On the other hand, new memory interface = new socket, no way around it.
From what I understand, the upcoming Socket F Opterons will have over 1200 pins in their socket so as to allow both a direct DDR2 interface and FB-DIMM. If I understand FB-DIMM technology correctly, it should end this issue by providing a standard interface to the DIMM, which is then translated for whatever type of memory is in use. Logically, this will trickle down to consumers in another generation. For the time being, however, AMD has stated that the upcoming "AM3" processors will still work in AM2 motherboards, as they will have both DDR2 and DDR3 controllers.
Intel's QC is really an MCM, or multi-chip module. That means they have literally grabbed two Conroe (Core 2 Duo) chips off the assembly line and mounted them in a single package. From the outside it looks like a single chip, but inside it has two separate pieces of Si, connected over the FSB. That is the problem: the two chips are connected to the same bus. A single chip presents one electrical load on the bus, and two chips present two loads, which means the bus speed needs to be dropped. That is why Kentsfield will have a slower bus speed than normal chips. If you think about it, this is the exact opposite of what you want. You have just added cores, so it would be nice to add more bus bandwidth. Instead, the Intel solution lowers the overall bus bandwidth, not to mention that it is a shared bus. The two chips fight each other over a very slow external bus, and this creates a performance bottleneck.
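Back-of-envelope arithmetic for the shared-bus point, as a Python sketch. The FSB data path is 64 bits (8 bytes) wide; the 1066 and 800 MT/s ratings below are hypothetical example values, not quoted specs for any particular chip, and the per-core figure assumes every core saturates the bus at once, which ignores caching entirely.

```python
BUS_WIDTH_BYTES = 8  # front-side bus data path is 64 bits wide


def fsb_bandwidth_gbs(mega_transfers_per_sec: float) -> float:
    """Peak bus bandwidth in GB/s (10**9 bytes/s) for a rating in MT/s."""
    return mega_transfers_per_sec * 1e6 * BUS_WIDTH_BYTES / 1e9


def per_core_share(mega_transfers_per_sec: float, cores: int) -> float:
    """Peak bandwidth per core if all cores contend for one shared bus."""
    return fsb_bandwidth_gbs(mega_transfers_per_sec) / cores


# Hypothetical scenarios: same bus with more cores, then a slower bus.
for mts, cores in [(1066, 2), (1066, 4), (800, 4)]:
    print(f"{mts} MT/s, {cores} cores: {per_core_share(mts, cores):.3f} GB/s per core")
```

Doubling the cores on one bus halves each core's peak share, and any drop in bus speed cuts it further.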
When all four cores are on a single piece of Si, all sharing an L3 cache, the cores don't need to fight over the external bus as much. They can share information internally and do not need to touch the slow external bus to perform cache coherency and other synchronization. Also, a true QC chip presents one load to the outside bus, so the bus speed does not need to drop because of electrical load.
There are many people who don't care how the cores are connected as long as the package works. The point is that the way the cores are connected has a direct impact on performance. We'll be talking about Intel vs. AMD cache hierarchy in 2007, when AMD uses dedicated L2 and shared L3 while Intel uses only shared L2. Expect cache thrashing on Intel's true QC chips with heavily threaded loads when they come out. Next I'll hear people say that the cache doesn't matter as long as it works. As long as it works for what? Single-threaded, tiny-footprint benchmarks like SuperPi or Prime95? How about a fully threaded and loaded database, or any other app that will actually stress more than the execution units?
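A toy illustration of the thrashing scenario, assuming a fully associative LRU cache and two made-up access patterns: thread A reuses a small 4-line working set while thread B streams through 8 other lines. Both setups get 8 lines of total capacity, split 4+4 in the private case. The class and function names are hypothetical, and the model is deliberately one-sided: a shared cache can just as easily win when threads share data or when one thread needs more than its private slice.

```python
from collections import OrderedDict


class LRUCache:
    """Fully associative LRU cache that tallies hits and misses per thread."""

    def __init__(self, capacity_lines: int):
        self.capacity = capacity_lines
        self.lines = OrderedDict()  # cached line addresses, least recently used first
        self.hits = {}
        self.misses = {}

    def access(self, thread: str, line: int) -> None:
        if line in self.lines:
            self.lines.move_to_end(line)  # mark as most recently used
            self.hits[thread] = self.hits.get(thread, 0) + 1
        else:
            self.misses[thread] = self.misses.get(thread, 0) + 1
            self.lines[line] = None
            if len(self.lines) > self.capacity:
                self.lines.popitem(last=False)  # evict the least recently used line


def run(caches: dict, rounds: int = 10) -> None:
    """Interleave two threads: A loops over lines 0-3, B streams over 8 other lines.
    `caches` maps each thread name to the cache that thread uses."""
    for _ in range(rounds):
        for i in range(8):
            caches["B"].access("B", 1000 + i)
            if i < 4:
                caches["A"].access("A", i)


# Shared: both threads use one 8-line cache.
shared = LRUCache(8)
run({"A": shared, "B": shared})

# Private: same total capacity, split into two 4-line caches.
private_a, private_b = LRUCache(4), LRUCache(4)
run({"A": private_a, "B": private_b})

print("thread A hits, shared cache: ", shared.hits.get("A", 0))    # 0
print("thread A hits, private cache:", private_a.hits.get("A", 0))  # 36
```

Thread A's working set survives in its private slice but keeps getting evicted by B's stream in the shared cache. Real L2/L3 behavior also depends on associativity, replacement policy, and inclusion rules that this sketch ignores.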