AMD to debut multi-core CPUs in 2005

Posted by CmdrTaco on Thursday October 16, 2003 @07:58AM from the coming-to-a-pc-near-you dept.

Scrooge919 writes "An article on ZDNet discusses AMD's plan for the successor to Opteron -- the K9. The biggest feature will be that it contains multiple cores. The K9 is currently slated for the second half of 2005, which would be less than 3 years after the Opteron shipped."

This makes a lot of sense. by NerveGas · 2003-10-16 08:09 · Score: 2, Interesting

As the manufacturing process shrinks, and companies are able to put more transistors on a chip, the question arises: What should we use those extra transistors for?

Now, there are several options. They could come up with a new processor design, but that takes a tremendous amount of R&D. They could just put tons of cache on the chip, but that gives diminishing returns.

Or... the Opterons already have a very simple I/O mechanism, namely HyperTransport. Literally all they have to do is plop down two Opteron cores, connect the HyperTransport lines, and bam: dual-core processor. I'm honestly surprised they're not doing it SOONER.

Of course, the lines for memory controllers and the like have to be drawn out to the pins on the packaging, but that's a piece of cake.

steve

--
Oh, you're not stuck, you're just unable to let go of the onion rings.

When shall we be free of the X86? by LWATCDR · 2003-10-16 08:15 · Score: 4, Interesting

Folks, we really do not need to run DOS applications any more. If we do, couldn't we emulate them? I just do not believe that the x86 is the best architecture for the future. The idea that in 30 years we will be running some mutant 128-bit X86 chip makes my skin crawl. I guess I miss the days when new ideas were the norm for microcomputers. Remember when there were the 32032, 68020, TM990, Zilog Z8000, the 6502 family, and the 88000? How about it, Transmeta? Let's see a version of Linux that does not run on top of the translation layer. Let's get some new ideas out there; I am getting bored.
Now that I said that, GO AMD. While it is still X86, this is one of the more interesting ideas I have seen for a while.

--
See my blog http://ilovecookes.blogspot.com/ for light hearted technical information.

And a serious comment... by jd · 2003-10-16 08:47 · Score: 3, Interesting

"Multiple cores" is meaningless with today's microprocessors. Typically, a single core already has multiple execution units for common instructions. Pipelining, pre-fetch and branch prediction all increase performance by more than can be obtained by using antiquated SMP-style approaches. It's far more important to distribute the bus load over time, as that is the larger bottleneck.

By having multiple register sets within a single core, and tagging requests/results, you can avoid the complexity of SMP entirely, while producing the effect of having multiple processors.

If you want to go further, improve the support for internal routing of operations. Thus, if you have instructions operating on the same data, the data can be sent directly from logic element to logic element. The entire chain could then be executed as a single (albeit composite) instruction. This also eliminates the need to have a CISC-to-RISC layer in the processor, as complex instructions would be mapped by routing commands and not by multiple internal fetch/execute cycles.

By adding input/output FIFO queues to each instruction, where each entry in the queue is tagged with the "virtual" processor associated with that instruction, the number of CPUs the chip could look like would be limited only by the number of bits used in the tag. (E.g., an 8-bit tag gives you 256 virtual CPUs on a single die.)
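
To make the tagging idea concrete, here is a minimal Python sketch. It is only a toy, not a model of any real CPU or of anything AMD has described: one shared adder drains a FIFO queue, and the tag on each entry selects which virtual processor's register set the operation reads and writes. The names `TAG_BITS`, `Op`, and `run_shared_unit` are invented for this illustration.

```python
from collections import deque
from dataclasses import dataclass

TAG_BITS = 8                      # assumption: 8-bit tag, as in the example above
NUM_VIRTUAL_CPUS = 1 << TAG_BITS  # number of distinct tags (virtual CPUs)


@dataclass
class Op:
    tag: int   # which virtual CPU issued this operation
    dst: str   # destination register name
    a: str     # first source register
    b: str     # second source register


def run_shared_unit(queue, register_sets):
    """Drain the queue through one shared adder, routing each op by its tag."""
    while queue:
        op = queue.popleft()
        regs = register_sets[op.tag]           # the tag selects the register set
        regs[op.dst] = regs[op.a] + regs[op.b]


# One register set per virtual CPU; all of them share the single adder.
register_sets = [{"r0": 0, "r1": 0, "r2": 0} for _ in range(NUM_VIRTUAL_CPUS)]
register_sets[0].update(r0=1, r1=2)
register_sets[255].update(r0=10, r1=20)

# Operations from different virtual CPUs interleave in the same FIFO.
queue = deque([Op(0, "r2", "r0", "r1"), Op(255, "r2", "r0", "r1")])
run_shared_unit(queue, register_sets)
```

The only per-"CPU" state in this sketch is the register set; the execution unit itself is shared, which is the bookkeeping-level parallelism the comment is describing.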

Why is this better than "true" SMP? Because 2 CPUs can't run a single thread faster than 1 CPU. Programs are generally written with single processor systems in mind, and therefore cannot run any better when the extra resources exist.
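
One standard way to put numbers on this is Amdahl's law, which the comment does not name but which expresses the same limit. The sketch below assumes the simplest form of the law, with no overhead for synchronization or bus contention; `amdahl_speedup` and its parameter names are chosen for this illustration.

```python
def amdahl_speedup(parallel_fraction, n_cpus):
    """Upper bound on speedup when only part of the work can be spread over CPUs."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_cpus)


# A purely single-threaded program gains nothing from a second CPU.
single_threaded = amdahl_speedup(0.0, 2)

# If half the work can run in parallel, two CPUs give less than a 2x gain.
half_parallel = amdahl_speedup(0.5, 2)
```

Under these assumptions the second CPU only helps in proportion to how much of the program was written to run in parallel, which is the point being made about programs written with single-processor systems in mind.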

Sub-instruction parallelism allows you to run as fast as you can fetch the instructions. Because the parallelism is merely at the bookkeeping level, there's no overhead for extra threads.

Because the logic elements would pull off the queues, as and when they were free to do so, there's no task-switching latency.

Because the parallelism is sub-instruction, and not at the instruction block or thread level, more of the resources get used more of the time, thus increasing CPU utilization. It also means that tasks that aren't parallel at a coarse grain can likely get some benefit, as there may well be parallelizations that can be done at the element level.

Because a single, larger die can carry more useful silicon than two or more separate dies. (Which is likely why AMD are using multiple cores in their K9 CPU.)

AMD's approach is an improvement over the separate-CPU scheme, but it's nowhere near the potential an element-cluster could provide. The parallelism that can be gained is way too coarse-grained. It'll offer about the same level of improvement that the move from separate 386 and 387 chips to the 486DX did, and for much the same reason: reduced distances and reduced voltages allow faster clock rates on the same technology.

But engineering at the right level will always produce better results than cut-and-paste construction, even if it does require more thought.

--
It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)