AMD Announces Triple-Core Phenom Processors
MojoKid writes "AMD has officially announced their triple-core Phenom multi-core processor offering, suggesting a triple-threat of processors, from dual-cores to triple-cores and native quad-cores coming to market this year. While the term symmetric multi-processing (or SMP) suggests a balanced approach of multiple cores in an even number of engines working together on a single workload, AMD offers that an odd number of processors can slice at that workload just as efficiently. Time will tell how this architecture will scale amongst various multi-threaded applications and real-world usage models. AMD is definitely moving to make use of these quad-cores that don't quite make the cut by testing them fully as triple-cores and realizing some revenue, rather than throwing them away."
Why Yes. Yes it does. From HERE: Inside, the Xbox 360 uses the triple-core IBM designed Xenon as its CPU. While graphics processing is handled by the ATI Xenos which has 10 MB of embedded eDRAM, its main memory pool is 512 MB in size.
There is no "I disagree" mod for a reason. Flamebait, Troll, and Overrated are not substitutes.
SMP doesn't suggest the number of cores should be a power of two, it doesn't even suggest "even number of cores".
It's about multiple cores processing simultaneously. Check the article I link to, even the damn example diagram has 3 cpu-s.
And why should ``symmetric'' imply even? It merely implies that all cores see memory with the same class of service. And, in reality, aren't most AMD multiprocessors cc-NUMA machines, not SMP?
For most workloads, if they are fairly multithreadable, 3 processors available will be just fine. I know of very few workloads that require an even number of processors, and even if it were the case that the task were split into an even number of threads, the OS should have no problem scheduling on a reduced number of processors.
Hey, doesn't the XBox 360 have a 3-core PPC in it?
-- Erich
Slashdot reader since 1997
Symmetric just means the processors are equivalent (they all do the same generic tasks)... As opposed to an asymmetric system where different processors are assigned different roles (one does interrupts, one does graphics, one does IO, etc)...
SMP refers to the fact that all the processors are identical and share the same memory (in contrast to NUMA designs like multi-chip Opteron systems). However, I've seen more and more people refering to cache coherent NUMA designs like multi-core opteron and the upcoming CSI based intel systems as SMP systems which, while a stretch of the definition, is at least reasonable.
Suggesting that SMP has anything to do with having an even number of processors is just DUMB. It may be the case that SMP systems usually have an even number of cores (I don't know) but that's not what the writeup or article seem to be saying.
If you liked this thought maybe you would find my blog nice too:
I always thought that too, but the Xbox 360 has a 3 core CPU as well.
Supposedly 3 core is actually pretty nice in some ways, as each core has a direct link to the other two. On a quad core system, each core is linked to two others, so sometimes it takes two hops to get messages from one core to other, slowing things down.
Despite both the summary and the article, it's a real 3-core chip, designed that way from the ground up, so I presume that the data paths are the same length. IIRC, somebody designed and sells a three socket mobo where all the data paths are also equal. (Ah, here it is: http://hardware.slashdot.org/hardware/07/08/13/1749213.shtml, a three socket Opteron machine with two PCIe slots and two Infiniband 4x ports.) I'd like to see a version for the Phenom 3-core CPUs; even better would be building some sort of Beowulf cluster using three of them, each using a pair of cross-over cables for the interconnects. That would give you one sweet 27-way cluster.
Nothing for 6-digit uids?
Latency probably is the issue. Remember the Pentium 4 - it had a pipeline of over 20 stages, with a some of those stages being there simply to allow time for the signals to make it from one side of the chip to the other.
It was first used for the early 90's Acura Vigor/Honda Accord. I wanna say 93 but probably 92 or 91 knowing my awesome memory. Beyond that, they also used it for a couple years in the Acura TL in the late 90's.
;-)
Next question please...
Perl - $Just @when->$you ${thought} s/yn/tax/ &couldn\'t %get $worse;
Visual IRC: Fast. Powerful. Free.
I know of at least one 16-core commercial processor. Oh, it runs Linux too.
This post is encrypted twice with ROT-13. Documenting or attempting to crack this encryption is illegal.
AMD's multi-core processors use a fully-connected crossbar switch in the on-die northbridge to communicate. There is only one "hop" between each core.
What you're thinking of is a four-socket system whose interconnect network is not fully-connected - it's only the edges of a square, and there are two missing links between the "corners" of the square. That is certainly a legitimate topology for a four-socket system, with the limitation you pointed out (two hops to get to the opposite node), but it doesn't apply to AMD's quad-core die.
The Barcelona/Phenom architecture allows each core (plus the northbridge) to run on its own power plane, and for cores to be turned off completely. Of course, core 0 is the bootstrap processor, so that core has to always be enabled, or they have to have a way to change which one is core 0 before it leaves the factory. Otherwise the BIOS won't be able to bring the other cores online.
The idea of post-factory error detection isn't so far-fetched. If a chip passes QA, the sorts of defects you'll see later in its life are likely to be thermally induced, and the likelihood that the defect will manifest prior to loading of the BIOS is very low. You're not using the MMU or the FPU at all, you're not using much of the cache, you can be running at your minimum power setting, and you're not doing it long enough to heat up much. If a core gets marked bad due to an excess of MCEs, similar to how many systems can mark DIMMs bad on excessive multi-bit ECC errors, the BIOS simply doesn't need to bring it online at boot time. Even if core 0 is the faulty one, you can probably load just enough of the BIOS to bring a good core online and finish booting, since you're not straining it enough to cause thermal problems, and you're only using a tiny fraction of the instruction set and die transistors. This sort of High Availability feature probably won't make it to the desktop right away, but as core counts keep increasing, it's inevitable.
There's no failure quite as dissatisfying as a complete and total solution to the wrong problem.
Firstly, for any general multi-node graph, it's entirely possible for three, four, eight, or any number of nodes to be only one hop away from each other. See fully-connected mesh. For the four-node case, imagine a 2D square, connected on the four sides, plus two links connecting the "diagonals" of the square. In that topology, each of the four nodes are only one hop away from each other. Of course, as the number of nodes increases, the cost of fully connecting them increases, as does the processing cost to multiplex and process transactions into the node from the (n-1) incoming links, but with only four nodes it's entirely possible to create a fully-connected network.
Wiith AMD multi-core processors, all of the cores communicate using a fully-connected crossbar switch in the on-die northbridge - meaning all cores on the die are one "hop" away from each other, including the four-core case. What you're probably thinking of is a multi-socket system that only has two coherent links per socket - that would prevent you from making a fully-connected coherent interconnect for a 4-socket system.
YES!!!!!!
Mod parent up, please, and while you're doing that, read this:
http://www.theonion.com/content/node/33930
It was an old SNL skit.
We're talking about CPUs & AFAIK, the the traces can't cross one another..
Unless they commercialized some 3D process @ 65nm that I didn't read about. Wiith AMD multi-core processors, all of the cores communicate using a fully-connected crossbar switch in the on-die northbridge - meaning all cores on the die are one "hop" away from each other, including the four-core case. Sooo...
Cpu 1 --> hop --> northbridge --> hop --> CPU 4
Or am I misunderstanding the definition of a "hop"?
[Fuck Beta]
o0t!
You can do 4 objects and connect them all without oven using another layer. Picture a triangle with the other component in the middle. Connect every vertex to the middle. Make the traces to the middle zigzag a bit to even out the trace lengths, and boom, fully connected without any intersections. Not saying this is how things are done, mind you, but it is a silly argument to say three cores are good because they can be connected trivially. 3-core cpus are all about yield. Being able to sell components that had a flaw in a core, without reverting all the way down to a two core part (and by extension the two core price point), is important.
All that said, SMP has nothing to do with an even number of processors/cores. It just means each processing element of a system is roughly equivalent. So you have a choice of three parts to schedule something on, the scheduler can know all three are equally capable and the heuristics for processor selection are straightforward. ASMP typically has specific roles for each part (i.e. a dedicated processor for interrupts, etc etc)
XML is like violence. If it doesn't solve the problem, use more.
No, the thing that's awesome is that eighteen months after the Onion article was published, Gillette actually did it.
Another reason why powers of two are popular with multicore chips is that powers of two can be laid out into rectangles. If your multicore design is basically a copy-and-paste job with a little glue logic, it's a lot easier to lay out the cores. With something like the Cell, 8 is a nice number of cores since it allows you to have two rows of four. Three is just awkward.
The Cells found in the Playstation 3, however, did not have 8 SPU cores, they had 7. This is because most of the die space is the SPUs and you can dramatically increase yields if you only expect 7 of the 8 to work. If a single SPU has a manufacturing flaw, you just disable that one and sell pop the chip in a PS3. If none of them do, you sell it for more expensive blades.
AMD and Intel have been doing this for a while. Chips with flaws in the cache have some of the cache disabled and are sold more cheaply. In addition AMD chips are designed with three hypertransport controllers. If only one works, the chips are sold as Athlon 64s. If two work, they are cheap Opterons, if all three work, they are expensive Opterons (exactly how expensive depends on how many flaws there are in the cache area). Similarly, with the dual core lines flaws in one core result in them being marked down as single-core chips.
Intel, currently, sell quad core chips containing two separate dies. If either die has a flaw, it is sold as a Core Solo and not put in a dual-die package. AMD, however, are going to be making single-die quad-core chips. Selling three-core versions allows them to make use of the ones with a flaw in one core. This should help keep their yields high (and thus their costs relatively low), since it means that they can sell flawed chips almost irrespective of where the flaw is, just marking it down as a cheaper part.
I am TheRaven on Soylent News
There are around 10 metal layers in a modern IC. Traces can certainly cross one another on different layers.
Cpu 1 --> hop --> northbridge --> hop --> CPU 4 Or am I misunderstanding the definition of a "hop"?I see what you're saying...but you're counting a processor interface as a hop. How many hops are there between cores in an Intel system then? CPU 1 --> hop --> front side bus --> hop --> CPU 4? No one counts hops that way. Your definition of hop would be like me saying there are two hops from your computer to your router, one between your processor and your network card over the PCI bus, and one between the network card and the router over ethernet.
Before you call me incorrect, please take 2 minutes to look at some lecture notes from an intro VLSI course:
http://www.cse.sc.edu/~jimdavis/Courses/2005-Fall%20CSCE%20613/CSCE613-Week10-Chapter-04-05.pdf
You can clearly see on page 3 (slide 6) that metal1 and metal3 are directly on top of each other. As I stated in a different post, you're confusing metal layer/wire routing in an IC with entire logic devices (transistors/gates/flops). Let me repeat it again for you: metal layers in an IC can cross.