AMD Going Dual-Core In 2005

A lot more info over at anandtech... by MarkWPiper · 2004-06-14 11:21 · Score: 4, Informative

for additional AMD dual core story links by ruiner5000 · 2004-06-14 11:23 · Score: 5, Informative

you can find them all here. It seems news has gotten around, and that AMD's dual core will consume just about as much power as a single core CPU at 90nm.

--
ignorance is bliss. googlefiberatx.com

Re:Why not quad core? by mp3LM · 2004-06-14 11:25 · Score: 5, Informative

heat

Yes..the evil of all machines
the reason why when the AC is not on in my house, and it is 90 degrees outside, my computer resets
and of course..the reason why we're not going quad core

well..at least that's my personal opinion...as for the real reason...probably for profit...

Re:End of moores law? by DAldredge · 2004-06-14 11:26 · Score: 5, Informative

Moore's Law has NOTHING to do with CPU speed.

from a google search.

Moore's Law /morz law/ prov. The observation that the logic density of silicon integrated circuits has closely followed the curve (bits per square inch) = 2^(t - 1962) where t is time in years; that is, the amount of information storable on a given amount of silicon has roughly doubled every year since the technology was invented. This relation, first uttered in 1964 by semiconductor engineer Gordon Moore (who co-founded Intel four years later) held until the late 1970s, at which point the doubling period slowed to 18 months. The doubling period remained at that value through time of writing (late 1999). Moore's Law is apparently self-fulfilling. The implication is that somebody, somewhere is going to be able to build a better chip than you if you rest on your laurels, so you'd better start pushing hard on the problem. See also
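To make the quoted formula concrete, here is a minimal Python sketch. The function names are made up for illustration, and the 2^(t - 1962) curve only describes the early yearly-doubling era in the definition, not the later 18-month pace.

```python
def jargon_density(year: int) -> int:
    """Relative logic density per the quoted curve: 2^(t - 1962)."""
    return 2 ** (year - 1962)


def doublings(start_year: float, end_year: float, period_years: float) -> float:
    """How many doublings fit between two years at a given doubling period."""
    return (end_year - start_year) / period_years


# Ten years of yearly doubling is a factor of 1024.
print(jargon_density(1972) // jargon_density(1962))

# At an 18-month period, 18 years holds 12 doublings (a factor of 4096).
print(2 ** doublings(1980, 1998, 1.5))
```

Either way the quantity being doubled is density, not clock speed, which is the point of the post.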

You'll need a new motherboard. by filledwithloathing · 2004-06-14 11:30 · Score: 4, Informative

"Will this be socket 939 or should I try to hold out another year to buy?"

You'll need a new motherboard.

The DDR memory interface appears to wrap around both L2 caches, meaning that it looks like both cores have their own 128-bit memory interface; whether or not both memory controllers will be enabled is another thing, but if this is true we have a number of implications to talk about. If dual core Opterons do indeed have two memory controllers, the pincount of dual core Opterons will go up significantly - it will also make them incompatible with current sockets. AMD is all about maintaining socket compatibility so it is quite possible that they could only leave half of the memory controllers enabled, in order to offer Socket-940 dual core Opterons. AMD isn't being very specific in terms of implementation details, but these are just some of the options.

--
Are you a VF grad? Check out the VFMA Alumni Forums VFMA Alumni Forum

Re:You'll need a new motherboard. by ruiner5000 · 2004-06-14 12:13 · Score: 2, Informative

No you won't. Infoworld got it right. Anand should have researched before he put up his story.

AMD's dual-core server processors will share a single memory controller, Weber said. This won't create a bottleneck because a server with two Opteron chips, and therefore two memory controllers, already has more than enough memory bandwidth required to run that system, he said.

"It's always a juggling act to add a little more processing and a little more memory. Right now, we have plenty of memory and I/O bandwidth, so we're adding processing," Weber said.

The dual-core chips will work with current socket technology in motherboards that are rated for the specifications of the dual-core chips, Weber said. A BIOS change will be required, but otherwise the chips will work in the same sockets as single-core Opterons, he said.

--
ignorance is bliss. googlefiberatx.com

In answer to poster's socket question: by MarkWPiper · 2004-06-14 11:31 · Score: 4, Informative

From the article. "If dual core Opterons do indeed have two memory controllers, the pincount of dual core Opterons will go up significantly - it will also make them incompatible with current sockets. AMD is all about maintaining socket compatibility so it is quite possible that they could only leave half of the memory controllers enabled, in order to offer Socket-940 dual core Opterons. AMD isn't being very specific in terms of implementation details, but these are just some of the options."

Re:In answer to poster's socket question: by MBCook · 2004-06-14 12:49 · Score: 3, Informative

As I posted elsewhere in this thread (link), that speculation seems to be wrong, which is good. Source for that info is here.

--
Comment forecast: Bits of genius surrounded by a sea of mediocrity.

Re:Just get... by cyfer2000 · 2004-06-14 11:35 · Score: 3, Informative

garage band of iLife

--
There is a spark in every single flame bait point.

Re:Just get... by shundi · 2004-06-14 11:36 · Score: 2, Informative

So is this a figment of my imagination?

Re:Really nice alternative to dual processor syste by rebelcool · 2004-06-14 11:41 · Score: 4, Informative

Anything multithreaded. Which is just about any modern GUI app.

--

-

Re:Just get... by ps_inkling · 2004-06-14 11:49 · Score: 4, Informative

Must... not... feed... trolls....

Diablo II, Starcraft, Warcraft
Unreal Tournament 2004, Neverwinter Nights, Dungeon Siege, Civ III
Myst, Riven, Exile
Medal of Honor and expansions, Battlefield 1942, Ghost Recon
Ghost Master
Quake III, Beyond Castle Wolfenstein
Escape Velocity Series, among others

There are plenty of other games for the Mac platform as well; check the Apple website for a larger list.

AMD was first, nitwit. by Anonymous Coward · 2004-06-14 11:50 · Score: 1, Informative

AMD was the first to announce dual core. Intel had to re-adjust their roadmap to pull dual core in from 2006 to 2005.

Re:Why not quad core? by ruiner5000 · 2004-06-14 11:57 · Score: 5, Informative

Actually, there is plenty of bandwidth left in HyperTransport to pull it off. Also, each CPU gets its own bank of memory. The design is superior to all others for SMP. Even AMD's main CPU man says so at InfoWorld:

AMD's dual-core server processors will share a single memory controller, Weber said. This won't create a bottleneck because a server with two Opteron chips, and therefore two memory controllers, already has more than enough memory bandwidth required to run that system, he said.

"It's always a juggling act to add a little more processing and a little more memory. Right now, we have plenty of memory and I/O bandwidth, so we're adding processing," Weber said.

The dual-core chips will work with current socket technology in motherboards that are rated for the specifications of the dual-core chips, Weber said. A BIOS change will be required, but otherwise the chips will work in the same sockets as single-core Opterons, he said.

--
ignorance is bliss. googlefiberatx.com

Re:Demise of processors predicted! by ruiner5000 · 2004-06-14 11:59 · Score: 2, Informative

Funny, except that AMD's dual core is pin compatible with current motherboards. When you research your joke's punchline it becomes funnier.

--
ignorance is bliss. googlefiberatx.com

Your response was half right and half assed by AHumbleOpinion · 2004-06-14 12:00 · Score: 4, Informative

doesn't require 5 loud fans in the case to keep it cool enough

While I understand the desire to build your own and preferring not to be vendor locked, your G5 fan comments are quite ignorant. The Apple G5s are well designed and exceptionally well laid out to create thermal zones serviced by different variable speed fans. It is a very quiet solution. Do not confuse the G5 with some of the homebuilt Athlon abominations that have poor layout, poor airflow, and require multiple screaming fans. YMMV.

Re:Your response was half right and half assed by ameoba · 2004-06-14 13:05 · Score: 2, Informative

If you've been following things lately, Intel's P4 offerings have been cranking out far more heat than anything AMD's got coming off the line.

--
my sig's at the bottom of the page.

Re:Just get... by dennism · 2004-06-14 12:10 · Score: 3, Informative

Well, first off, I'm pretty sure that the G5 could be cooled via only conventional fans similar to the P4 and Athlons. But Apple has pretty much made it their mission to reduce fan noise on their machines.

Second -- actually, we don't know that we'll be able to swap out single core Opterons with dual core Opterons. They're not out yet. The G5 is. If later on it proves to be true, then you can say that you can swap them out.

Third -- the G5 gives you access to one of the better Operating Systems around, MacOS X. That has to give it a few advantage points.

BTW -- I happen to have both a Dell Dimension 8600 and a dual 1.8ghz G5 in my office at work. When the Dell is running, you notice it. It's quieter than the thrown together PC that's also in the office, but still loud enough to notice. On the other hand, the G5 is completely quiet. I never hear the fans in there at all. I can actually see one of the fans moving from the front, but it's moving at such a slow speed that you can't hear it at all. For some of us, that is a feature.

--
dennis

Re:Just get... by Exitthree · 2004-06-14 12:11 · Score: 2, Informative

Say what you want about the merits of building your own box, but don't call the G5 noisy. It has multiple low-speed fans to keep it quiet. It has separate thermal zones with independent cooling systems to minimize noise. I have heard, or rather been near enough to, a G5 to know it is not a loud computer.

Re:End of moores law? by soundsop · 2004-06-14 12:31 · Score: 2, Informative

Is this the end of moores law, at least in the form of CPU speeds doubling every 18 months? There are essentially two CPUs, I doubt each of them will get 2x faster the next 1.5 years :)

There have been quite a few posts pointing out that Moore's law actually refers to exponential growth in transistor density rather than speed.

The posters are technically correct, but the term Moore's law has come to encompass any processor-related metric that changes at an exponential pace, including processor performance, clock rate, and power consumption. Of course, these metrics are directly related to transistor size and density, so it makes sense that they have changed exponentially.

For those with access to IEEE articles, Gordon Moore (Intel co-founder, after whom Moore's law is named) wrote an interesting paper called No exponential is forever: but "Forever" can be delayed!

Re:Socket 939!? by CaptainPinko · 2004-06-14 12:35 · Score: 2, Informative

Nope. Regular Opterons use 940 pins. They took one pin off going from registered to unregistered RAM. No joke.

--
Your CPU is not doing anything else, at least do something.

Re:why go for CMP and skip SMT by WeekendKruzr · 2004-06-14 12:35 · Score: 5, Informative

SMT is only needed if your execution units are having trouble remaining filled up, which was the problem with the NetBurst architecture due to the huge hit that it takes from the branch mis-prediction penalty. When a mis-predict happens, the execution unit has to sit idling away and wait for the proper info to be re-fetched. With SMT, the unit simply switches over to one of the other threads waiting in the wings, which keeps the processor doing useful work instead of wasting cycles. This is why the software has to be re-written to take advantage of it so that the processor knows which threads to give priority to.

Intel stuck SMT into the Pentium 4 in order to balance out some of the negative effects that go hand-in-hand with a processor that has a LONG pipeline. AMD has a much shorter pipeline (especially when compared to the new Prescott) and therefore they don't suffer much of a penalty when a mis-predict happens. Also, if I remember correctly, the Athlon was already known for being extremely efficient in terms of resource allocation within the processor, since AMD can't afford to just dump tons of extra cache onto the chip.

Both of these things taken together mean that using up extra real estate on the die of the Athlon in order to get SMT isn't really worth it in terms of the performance it would bring. Even on the Pentium 4 the benefits aren't all that hot, and it's only in specific types of code that you see any impressive speed gains.

Re:Why not quad core? by hxnwix · 2004-06-14 12:35 · Score: 5, Informative

The opteron (k8) has an integrated memory controller and up to three hypertransport links. In a dual k8 system, the cpus communicate over a single hypertransport link and are usually paired with their own memory bank. If one cpu needs data from the other's bank, it comes over the hypertransport link. Some cheap dual opteron boards save traces by pairing one cpu with all the memory banks - so every memory operation on the non directly linked cpu passes over the h-link.

The dual core cpu might have the pins for two separate memory bank arrays or just the pins for one. Either way, the situation as far as dual k8s go is not really different from what we have already. Either way, it's a few steps above the p4 design: shared cpu bus to northbridge to memory. (yech! with a single proc, this introduces latency, with multiproc, you get contention and latency at every level)

AMD's cpu interconnect is so well thought out... it gives me the warm fuzzies pondering it:

A uniproc hammer needs one h-link for io.
A dually needs two per core: 1 for core to core, 1 for io (though all the io on all the boards I have seen feeds to only one proc's h-link... so that you don't lose PCI busses and such if you have only one proc installed, I suppose).
Quad and above requires three: each core links to two other cores, leaving one h-link per core for io. One could have a pci-e bus per proc, if one desired. But again, I haven't seen a design that doesn't feed all io into a single h-link.

Since no one uses the extra h-link anyway, a dual core package for a dual core system would need only one external h-link (saving some cash).

A quad core, dual package system would require three h-links feeding out of each package, though. But even then, the number of h-links laid out on the mobo is reduced and the whole shebang should be cheaper.

Intel's "one huge shared bus" + northbridge design is definitely being trampled...

Re:Just get... by beakburke · 2004-06-14 12:47 · Score: 2, Informative

Umm let's see, I'm sure that the Unreal games are. Actually the game situation on the Mac is much better now than in years past. Most of the more popular games do work on the Mac. Not nearly as many as on Windows, but that's to be expected I guess. It's just fine if you are only an occasional gamer and aren't super picky about your games.

--
----- Question authority, but not ours. Hate the man, but we're not him.

Re:Why not quad core? by NerveGas · 2004-06-14 12:50 · Score: 2, Informative

Because the overall size of the die is a tremendous factor in the cost of a processor. Because of that, die sizes tend to stay relatively constant over the years.

As manufacturers are able to squeeze the transistors in more tightly, then you see more circuitry appearing. As they move to 90-nanometer production, they're going to be able to pack on more transistors, and using dual cores becomes an economic possibility. However, throwing FOUR cores on would make the die large enough to be an economic disaster. (Die size was one of the largest problems with the Pentium Pro.)

steve

--
Oh, you're not stuck, you're just unable to let go of the onion rings.

Re:Why not quad core? by Paul+Jakma · 2004-06-14 13:37 · Score: 4, Informative

apparently because of reduced bus conflicts with their individual memory spaces.

Ah but with multi-core chips they can transduce their flux capacitors with the onboard trans-mogrification controllers. Seriously "reduced bus conflicts with their memory space", what does that mean?? That's gibberish.

The P4's host bus is, presumably, like the P6's GTL+ host bus: a shared bus (like most buses are). Only one CPU can use the bus at any one time. If the bus does x GB/s, that's only to one CPU at any given time - effectively it is shared. Further, P6 and P4 do not have integrated memory controllers, and must access RAM via the (shared) GTL+ bus if the data is not in cache. E.g., a 4 CPU machine looks like:

P = CPU
MC = Memory Controller (part of the "northbridge" chip, also provides PCI host bus controller, etc.)

P P P P
| | | |
--------- GTL+ bus
    |
    MC--RAM

Also, GTL+ is limited to 4 CPUs and one controller. To get 8 CPUs some controller vendors have invented a GTL+ 'bridge' to stitch 2 GTL+ buses together, but that just makes things worse from a scalability POV, I'd imagine.

The K8, on the other hand, uses a point-to-point (PtP), serial-ish, packet-based transport, HyperTransport, to interconnect CPUs and has onboard memory controller(s) (connected internally via HyperTransport links). A 4 CPU K8 machine looks like:

K = K8 CPU
HT = HyperTransport link

RAM--MC-\    /-MC--RAM
RAM--MC--K--K--MC--RAM
         |  |
         |  |
RAM--MC--K--K--MC--RAM
RAM--MC-/    \--MC--RAM

Each of the lines out of a K is a HyperTransport link. Each MC is integrated into the die itself. (you'll have to imagine interconnects and right-hand top/bottom MC's lining up with the K symbols, cause /.'s filter is chomping whitespace in some strange way on me).

Each CPU has 4 HT links, two to other CPUs, two to its (integrated on die) memory controller. For dual CPU setups, each CPU needs only one link to another CPU, obviously. Indeed the difference between 2xx, 4xx and 8xx AMD Opteron CPUs is the number of HyperTransport links. Indeed in large multi-CPU (ie 8+) SMP setups one need not attach a memory controller to each CPU; one might choose to have a central "cross-bar" of fully-meshed K8s which then connect to peripheral K8s which have memory controllers and hence RAM. Tis all down to the board designers I guess. And a bit of a fun computer science problem too in terms of designing optimal 'networks' of interconnected nodes with the best compromise of maximum node to node distance for lowest number of required interconnects.

The K8 is actually a ccNUMA (cache coherent, Non-Uniform Memory Architecture) machine, in SMP configurations. Ie, different memory is at different distances to different CPUs, or to put it another way, some memory is local, other memory is distant, some memory may be more distant than other memory. Eg, for the top-left CPU to access RAM on its "local" MC is obviously potentially far quicker, in terms of latency, than to access "distant" RAM on another node, and to access memory on an adjacent K8's memory controller will have lower latency than to access memory allocated in the bottom-right CPU's RAM. A good OS aware of the issues can try to keep processes on the CPUs to which that process's memory is "local" and hence maximise performance, but it's quite a juggling act (Linux has some NUMA support).
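The "distance" idea can be sketched in a few lines of Python. This is only an illustration of hop counting on the 4-way layout drawn above; the node labels are made up, and real latency depends on far more than the number of HyperTransport hops.

```python
from collections import deque

# Hypothetical labels for the four K8s in the square layout: each CPU links
# to two neighbours, and the diagonal pairs have no direct link.
LINKS = {
    "top_left": ["top_right", "bottom_left"],
    "top_right": ["top_left", "bottom_right"],
    "bottom_left": ["top_left", "bottom_right"],
    "bottom_right": ["top_right", "bottom_left"],
}


def hops(src: str, dst: str) -> int:
    """Breadth-first search for the fewest links between two CPUs."""
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == dst:
            return dist
        for nxt in LINKS[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    raise ValueError(f"no path from {src} to {dst}")


for target in LINKS:
    print("top_left ->", target, hops("top_left", target), "hop(s)")
```

Local RAM is zero hops, an adjacent K8's RAM is one hop, and the diagonal corner is two, which is the ordering a NUMA-aware scheduler tries to exploit.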

What AMD will do for multi-core we don't know. For certain the individual cores will be connected by HyperTransport. Most likely AMD will give each core its own dedicated memory controller, which would simply make a multi-core SMP exactly the same in terms of architecture as the current dual K8 architecture (ie 2xx Opteron), and hence no different in terms of bandwidth contention than for existing SMP Opterons.

It will make large SMP machines a lot easier to build though. Eg

--
I use Friend/Foe + mod-point modifiers as a karma/reputation system.

Re:Why not quad core? by Paul+Jakma · 2004-06-14 13:55 · Score: 2, Informative

Self-correction: Apparently it might be just _one_ memory controller per die, which may or may not itself be dual channel (I gather from other posts). Also, obviously each CPU potentially has additional HT links to connect to things like PCI bus controllers, AGP controllers, etc. (the basic block diagramme for the Tyan S2885 dual K8 board shows the AMD-8151 AGP controller and the AMD-8131 PCI-X controller wired to CPU0).

--
I use Friend/Foe + mod-point modifiers as a karma/reputation system.

Re:Exactly my first reaction! by PaulBu · 2004-06-14 14:50 · Score: 4, Informative

Yeah, also called PIM, for "processing in memory". Peter Kogge (of Kogge-Stone fast adder fame) is doing research into those in Notre Dame (disclaimer: I used to be affiliated with the same HTMT crowd designing ultra-fast superconductor processing elements for the petaflop computer, thus I know first hand how hard it is to match memory speed to pipelined processing speed... ;-) ).

Bringing processors and (large) memories closer to each other does not help much, as, as you mentioned, there is an order of magnitude difference between processor clock speed and memory access speed. The physical reason for this is that to do a certain operation on one pipeline stage in a processor you need to charge a clock line passing through a couple dozen to a couple hundred gates; in the memory case you have to charge the word line passing through sqrt(1G)=30,000 gates. It takes time (RC, unless one uses superconductors and forgets about R ;-) ) and power (CV^2/2).
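A rough Python sketch of that arithmetic, for anyone who wants to plug in numbers. The assumption that line capacitance grows linearly with the number of gates on it is mine, as are the function names; the 1G-cell square array and the "couple hundred gates" figure come from the post.

```python
import math


def word_line_cells(total_cells: int) -> int:
    """Cells along one word line if the array is laid out as a square."""
    return math.isqrt(total_cells)


def switching_energy_joules(capacitance_f: float, voltage_v: float) -> float:
    """Energy to charge a line once: C * V^2 / 2."""
    return 0.5 * capacitance_f * voltage_v ** 2


cells = word_line_cells(10 ** 9)   # decimal 1G, roughly the 30,000 quoted
clock_line_gates = 200             # "couple hundred gates" per pipeline stage

# If capacitance scales with the gates on the line (assumed), so does the
# energy per charge at a fixed voltage.
print(cells, cells / clock_line_gates)
```

With those inputs the word line loads on the order of a hundred times more gates than the clock line of a single pipeline stage.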

The only reasonable solution is, indeed, to make memory blocks smaller and closer to processor elements, making them essentially registers/caches, not RAM.

Oh, and, BTW, in the rather naive picture on the link you sent, the solution will not work that well if you have multiple processors -- you have to make sure that each can talk to other's memories (in SMP case) AND to each other.

Paul B.

Re:Why not 8 x i486 cores? by barawn · 2004-06-14 16:20 · Score: 4, Informative

Did you miss the part about shrinking it down to modern geometry, meaning it would run faster on less power (read less heat) than the original? Sure, a 90nm i486 isn't going to run at 3.6GHz like a P4, however I expect it would run a good amount faster than a 486DX2-66 once did.

Unfortunately, nothing will beat the architectural gains which have advanced since the 486 era, and the "worst case" pipeline waits will keep your clockspeed at an insanely low level.

Let me try to explain. The 486 had a 5 stage pipeline - fetch, decode, dispatch, execute, and writeback. Now, each of those pipeline stages isn't going to take the same "minimum" amount of time - some of them are fixed by things other than switching latencies. So, say your execute stage is fine taking only 1 clock cycle up to, say, 2 GHz (a minimum latency of 500 ps), but your decode stage, simply from physical concerns, is going to take at least 5 ns to complete. This means that the maximum you can ramp the clock speed up to is 200 MHz, because each stage in the pipeline has to take 1 clock cycle, so if 5 ns is your minimum, you'll have a max clock speed of 200 MHz.

The solution, though, is obvious - break that "5 ns" decode step into multiple pipeline steps - say, 5 of them, each taking 1 ns. Now your maximum clock frequency is 1 GHz. The problem is that your pipeline is now 9 stages long, and you have a new architecture - which is precisely what Intel did several times over to allow the clock speed to ramp.
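The same numbers as a small Python sketch. Only the 5 ns decode and 500 ps execute latencies come from the example above; the other three stage latencies are assumed to be 500 ps purely for illustration, and the helper names are made up.

```python
# Stage latencies in picoseconds. Decode and execute are from the example;
# fetch, dispatch and writeback are assumed values.
STAGES_PS = {
    "fetch": 500,
    "decode": 5000,
    "dispatch": 500,
    "execute": 500,
    "writeback": 500,
}


def max_clock_mhz(latencies_ps):
    """The clock period must cover the slowest stage (1e6 / ps gives MHz)."""
    return 1_000_000 / max(latencies_ps)


def split_stage(stages, name, parts):
    """Return the latency list with one stage cut into equal sub-stages."""
    out = []
    for stage, latency in stages.items():
        if stage == name:
            out.extend([latency // parts] * parts)
        else:
            out.append(latency)
    return out


original = list(STAGES_PS.values())
deeper = split_stage(STAGES_PS, "decode", 5)

print(len(original), "stages,", max_clock_mhz(original), "MHz")
print(len(deeper), "stages,", max_clock_mhz(deeper), "MHz")
```

The 5-stage pipeline tops out at 200 MHz and the 9-stage one at 1000 MHz; the other stages could clock higher but are held back by the slowest one.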

And that's just the pipelining limitation. There are other architectural problems with "ancient cores" as well. One basic problem is that the x87 floating-point architecture is crap. It's stack-based, which means you can only do math with the "stack head". So in order to store things in the registers, you need to use the FXCH instruction to switch the stack head and one of the registers. Well, modern CPUs (the P3 and the Athlon) got around this by saying "we'll make FXCH be a zero-cycle execute when paired with an arithmetic instruction (and after the Pentium, screw it, they're free totally)". Since the modern CPUs can decode more than one instruction per cycle (3 for an Athlon), and the FXCH instruction only lives up to the decode stage, you're really not hurt, as the FXCH fills a pipeline stage that probably would've been left empty anyway. Now consider the P4, which was designed to try to encourage people to move away from x87: it does not have a zero-cycle FXCH, and its x87 performance is abysmal. (The 486 does not have a pipelined FPU, nor a free FXCH instruction. It would be even worse.)

And I haven't even mentioned register renaming yet, which works around the register limitations of the x86 ISA by creating registers that the software doesn't know about, which the hardware can use to "cheat": it recognizes certain compiler patterns that work around the register limitation.

In short - many core 486 CPUs would suck. Even many core Pentiums would suck. Architecturally, they're old, dead ends. The best designs for multicore processors would be the P6 design (PPro/PII/PIII/PM) and the Athlon design (K7/K8 - while the K8 is "new", it's about as new as the PM is to the P6 design). Curiously enough, Intel is likely to go with a multicore PM, and AMD is likely to go with a multicore K8.

It should also be noted that a 486DX had a transistor count of 1.2M transistors. A P3 had a transistor count of 9.5M transistors. That's an increase of about 8X - however, the P3 also has twice the data width (64-bit rather than 32-bit), 4X the L1 cache (32KB rather than 8KB), and had two instruction set enhancements tacked onto it, as well as massive architectural improvements, including, essentially, multiple versions of the 486 execute engines inside it. An 8X increase in size for those enhancements is not crazy at all.
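A quick Python check of those ratios, using only the figures quoted above (the variable names are just for illustration):

```python
# Figures as quoted in the post.
I486DX_TRANSISTORS = 1.2e6
P3_TRANSISTORS = 9.5e6
I486DX_L1_KB = 8
P3_L1_KB = 32

transistor_growth = P3_TRANSISTORS / I486DX_TRANSISTORS
l1_growth = P3_L1_KB / I486DX_L1_KB

print(f"transistors: {transistor_growth:.1f}x, L1 cache: {l1_growth:.0f}x")
```

So "about 8X" is right, and half of that factor is already matched by the cache growth alone.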

Re:Why not quad core? by hxnwix · 2004-06-14 19:34 · Score: 2, Informative

here's one with all the banks on one cpu
here's a nice little overview
