Dual Caches for Dual-core Chips

Note: Here, Single is Better by Anonymous Coward · 2004-08-26 10:10 · Score: 5, Informative

In case it's not obvious to those who didn't read the article all the way through, it's a better thing when the memory is shared (single cache) rather than separate (dual cache). But that is harder to design, so for these first-generation dual-core chips from Intel and AMD, they are using separate caches for each core. (IBM's dual core Power4 processor has a unified cache.) At some point down the road, they will likely unify them to increase performance.

Re:Note: Here, Single is Better by skribble · 2004-08-26 10:16 · Score: 4, Funny

Thanks for pointing that out, I'm sure a number of people were thinking "Ooooo Cool two caches" when they should have been thinking "Awwww Damn, two caches!"

--
--- Nothing To See Here ---
Re:Note: Here, Single is Better by mrchaotica · 2004-08-26 10:16 · Score: 4, Interesting

Hmm... the Power4 is dual-core and unified cache? I wonder if this has implications for future Macs to compete with these new x86 processors...

--
"[Regarding the 'cloud,'] ownership was what made America different than Russia." -- Woz
Re:Note: Here, Single is Better by EvilTwinSkippy · 2004-08-26 10:29 · Score: 4, Interesting

Compete? What part of "spanked them and stole their lunch money" does x86 fail to understand?
We have a dual p4 server, the damn thing sounds like a gas turbine when it's on. Really, I've used quieter air compressors.
Our dual-G5s from Apple are quiet, sleek, and each processor gets its own block of RAM. Granted, the ASIC for the memory controller gets its own heat sink. But man, you crack it open and you wonder where the rest of the server is. It's literally 2 giant blocks for the processors, the ASIC that handles memory management, and a wee little chip on the end of the mobo that looks like a bus controller.

--
"Learning is not compulsory... neither is survival."
--Dr.W.Edwards Deming
Re:Note: Here, Single is Better by spuzzzzzzz · 2004-08-26 10:34 · Score: 5, Interesting

The dual cache simplifies things enormously, especially taking the design of the Opteron into account. Opterons are incredibly scalable--each one has three HyperTransport links that can be connected to memory, I/O or another processor. In order to make dual-core chips, all AMD has to do is take two Opterons, put them in the same package and hard-wire a HT link from one processor to the other.

Of course, they also need to worry about things like size and power consumption, but the simplified architecture really makes things a lot easier and will probably contribute to lower prices. It will also accelerate the introduction of multi-core (i.e. more than two) processors...

If they were to implement a unified cache design, they would have to make significant changes. They would need to implement cache snooping and complicated memory management. Given that the new dual-core processors (AMD ones, at least) are meant to be pin-compatible with current processors, this would be a bit much to ask. Maybe they'll have unified caches sometime, but I don't see it happening anytime soon.

--

Don't you hate meta-sigs?
Re:Note: Here, Single is Better by drinkypoo · 2004-08-26 10:49 · Score: 4, Interesting

The Hammer-core processors with dual-channel memory controllers have more memory bandwidth than the best G5, and the memory is accessed directly by the processor. Hypertransport is really quite an excellent interconnect. Hammer is NUMA-architecture and each processor gets its own block of ram. Finally, the Opteron dissipates much less energy as heat than the intel offerings - only about 46W max. I believe this is still a bit more than the G5, of course, but it's really not that bad.

So yes, the proper term is compete.

--
"You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
Re:Note: Here, Single is Better by spuzzzzzzz · 2004-08-26 10:49 · Score: 5, Informative

Are there situations where two caches might be better? For example, a multi-threaded application with two memory-intensive threads, each locked down onto a specific CPU?
Not really. The problem with 2 caches is duplication. It is quite probable that both cores will want to work on the same thing, in which case cache space will be wasted. It also creates timing complications when one core wants to write to its cache because the other core will have to be told to invalidate its relevant cache entry. On the other hand, you could create a single cache with double the size. This would make sharing memory between CPUs simpler and it wouldn't significantly increase access times (so the situation you mentioned wouldn't be affected). The argument for double caches is about cost, scalability and design simplicity, not performance.
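The duplication and invalidation points can be sketched with a toy model. This is only an illustration in Python, not how any real Intel or AMD part works: the `PrivateCache` class, the write-through policy, and the address `0x100` are all made up for the example.

```python
# Toy model of two private caches kept coherent by write-invalidate.
# Everything here (class name, write-through policy, addresses) is a
# simplifying assumption for illustration, not a real chip design.

class PrivateCache:
    def __init__(self, name):
        self.name = name
        self.lines = {}   # address -> cached value
        self.peers = []   # other caches that must stay coherent with this one

    def read(self, memory, addr):
        # On a miss, fetch from memory; this cache now holds its own copy.
        if addr not in self.lines:
            self.lines[addr] = memory[addr]
        return self.lines[addr]

    def write(self, memory, addr, value):
        # Before writing, every peer holding the line has to drop its copy.
        invalidated = 0
        for peer in self.peers:
            if addr in peer.lines:
                del peer.lines[addr]
                invalidated += 1
        self.lines[addr] = value
        memory[addr] = value  # write-through, to keep the toy simple
        return invalidated


def demo():
    memory = {0x100: 1}
    c0, c1 = PrivateCache("core0"), PrivateCache("core1")
    c0.peers, c1.peers = [c1], [c0]
    c0.read(memory, 0x100)
    c1.read(memory, 0x100)
    duplicated = sum(0x100 in c.lines for c in (c0, c1))  # copies held
    invalidated = c0.write(memory, 0x100, 2)              # core1 drops its copy
    reread = c1.read(memory, 0x100)                       # miss, then refetch
    return duplicated, invalidated, reread
```

After both cores read the same address, the line is stored twice (wasted space); the write by one core then forces the other to invalidate and refetch. A single shared cache would hold one copy and skip the invalidation step.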

--

Don't you hate meta-sigs?
Re:Note: Here, Single is Better by jackb_guppy · 2004-08-26 11:05 · Score: 4, Informative

The PPC4 does not have single cache...

There are L1 caches for both cores.

There are 3 L2 caches hooked to a crossbar switch to speed data flowing into and out of the L1.

There is a single L3 controller overseeing 2 L3 external memory banks.

Then there are 2 busses to main memory.

And 3 interconnects to 3 other dual core chips that make a single 8way processor block.

And 4 busses inter connecting 4 of these 8way to make a 32way machine, with dual IO channels to hardware!
Re:Note: Here, Single is Better by hattig · 2004-08-26 11:07 · Score: 4, Informative

No no no no.

That's all wrong.

The Opteron has always supported dual cores, and it isn't via "internal hypertransport", the internal crossbar connects to the SysReq that supports two cores attached directly. You cannot attach a shared cache dual core to this design. Each core must have its own individual L2 cache. This is why you could have an 8 processor Opteron system with dual-cores for 16 cores in total despite the fact that the current Opteron can only do 8 processors at the most glueless. Oh, and Hypertransport doesn't connect to memory either, the memory controller is something else connected to the internal crossbar.

And for the Opteron this is a good design. As the cores are on the same chip, cache coherency will be done at the speed of the processor and not be limited by inter-processor bandwidth. It really isn't a problem at all that the cores each have their own individual cache. At least they aren't competing with each other for cache bandwidth. The only bad point is that a core cannot have the option of using up to 2MB of shared cache - not as big a problem as it might sound, 1MB is doing very well for Opteron, and the on-die memory controllers negate a lot of the latency penalty for main memory access.
Re:Note: Here, Single is Better by randyest · 2004-08-26 12:45 · Score: 4, Informative

Interconnect delay (latency) is reduced. Signals propagating along traces on a die (silicon chip) are orders of magnitude faster than those on printed-circuit board (PCB) traces.

That means you can get more bandwidth with silicon than a circuit board (each of reasonable size using modern components/processes.)

Also, it takes a lot less power to run lower-voltage drivers on low loads (little resistance and capacitance on die compared to a PCB.)

So, why not stack everything on one chip? Cost of a chip rises exponentially with die size. Up to about 20mm^2, it's feasible (but pricy); bigger dice are very hard to make, result in lower yields, and hence cost a lot more.
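One common way to see why cost climbs faster than area is a simple Poisson yield model, sketched below in Python. The model choice and the defect density of 0.005 defects per mm^2 are assumptions picked purely for illustration; they are not figures from this post or from any particular fab.

```python
import math


def relative_cost_per_good_die(area_mm2, defects_per_mm2=0.005):
    """Relative cost of one working die under a Poisson yield model.

    Assumption: defects land independently, so the fraction of dice with
    zero defects is exp(-D * A). Wafer cost per mm^2 is taken as constant,
    so cost per good die scales as area / yield.
    """
    yield_fraction = math.exp(-defects_per_mm2 * area_mm2)
    return area_mm2 / yield_fraction


small = relative_cost_per_good_die(100)
large = relative_cost_per_good_die(200)
cost_ratio = large / small  # more than 2x for twice the area
```

Under these assumed numbers, doubling the die area more than triples the cost per good die, because the bigger die both uses more wafer and is more likely to catch a defect.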

--
everything in moderation

Licensing Issues? by xeon4life · 2004-08-26 10:12 · Score: 5, Interesting

What will happen to those who must pay a royalty fee per CPU? Will companies that charge for each CPU begin to charge for two, or will it still be viewed as one...?

--
Real programmers can write assembly code in any language. -- Larry Wall

Re:Licensing Issues? by Ianoo · 2004-08-26 10:17 · Score: 5, Informative

When hyperthreading was released, the industry had to cope with similar issues. Those of us using operating systems with artificial limits imposed on the number of possible processors used in a system had to wait for software updates to fix detection. I'm sure that the same thing will happen again. Undoubtedly there will be some flag in a register somewhere that identifies whether a processor is part of a dual-core chip or just a single CPU on its own. The OS or software can just read this in and work out whether there is sufficient licensing to use them.
Re:Licensing Issues? by name773 · 2004-08-26 10:28 · Score: 5, Funny

when the wind is blowing westward on odd days of the week you pay for one. when there are clouds on an even day, you pay for two. during leap year, when a west wind blows clouds away at midnight on an even day, you pay for four processors, two computers, a camel, three pci slots, and a partridge in a pear tree.

Different core models by SIGALRM · 2004-08-26 10:12 · Score: 5, Informative

The dual-core chips that Advanced Micro Devices and Intel plan to bring to market next year won't be sharing their memories

As I understand it, the rationale behind Opteron's "Direct Connect" dual-core architecture is to make it easier to place two processor cores on the same silicon die. It's also a power-consumption issue, as the two processors can run at lower clock speeds. However, unlike Intel's design, Direct Connect features an integrated memory controller and hypertransport interconnects that connect the processor to the I/O port or directly to another processor.

--
Sigs cause cancer.

"Montecito" by Mateito · 2004-08-26 10:13 · Score: 5, Funny

"Montecito", a spanish word, literally translates as "a small monte".

Thus I predict that this will be followed by a quad-core chip called the "monte", an 8-core chip called the "montote" (the big monte), and finally a 16-core chip known as "The Full Monte".

--
Norman Cook's Ode to Sl

yeah, by pb · 2004-08-26 10:14 · Score: 4, Interesting

You probably don't want to have both chips fighting over the cache, and slowing things down; I'm sure doing The Right Thing[tm] will take a while for them to work out. Until then, just pretend that they're mostly separate chips on the same silicon.

Maybe in the future they'll come up with some more advanced cache designs that can share some cache and improve performance. But until then, expect to see it in the next generation of value chips. (Overclocked dual-core Celerons? Nifty!)

--
pb Reply or e-mail; don't vaguely moderate.

Re:mmmm cores by bburton · 2004-08-26 10:15 · Score: 5, Funny

Can I have a 64bit OS too please? (no not linux)

Didn't you hear? According to SCO, Linux doesn't even exist!

--
Slashdot = ((Technology + Politics) / Trolls) % Grammar Nazis

Re:Confused by dougmc · 2004-08-26 10:16 · Score: 4, Informative

No, you're not missing the point.

The benefit is that you get two CPUs in less space. You might even be able to get two CPUs in a system designed to support only one (because it has only one slot.) And if your system already has two CPU slots, this might give you four CPUs.

It might also use less power than two CPUs, but I wouldn't hold my breath on that one.

Re:Confused by eddy · 2004-08-26 10:16 · Score: 4, Interesting

Yes. Actually, I would have thought that the reverse (shared cache) would have been news instead.

The point is that you can have very fast inter-CPU communication, the motherboard gets cheaper to produce, you don't have to double the cooling machinery... and they're probably cheaper to produce also (one package instead of two).

I assume the cores are actually produced one-by-one or it'd get big and very expensive.

--
Belief is the currency of delusion.

Re:Confused by ERJ · 2004-08-26 10:17 · Score: 5, Informative

Kinda. I could see a couple advantages though:

1) Fast interconnect between chips. Instead of having to transfer data over the bus, if the CPU needed info from the other CPU it could transfer over a high speed connection without having to involve other parts of the machine (bus). AMD already has a sort of high speed interconnect to their multi-cpu motherboards instead of splitting like intel does but I would imagine that this would still be faster.

2) Less motherboard room needed. You don't need dual cooling fans, dual power / interface lines and have more room overall on the motherboard.

Re:How is this different from a two processor syst by hawkbug · 2004-08-26 10:19 · Score: 5, Informative

It's not much different - that's the point. 2 processors in a single socket saves a lot of money production-wise, and that should pass on to the consumer. AMD has said theirs is backward compatible, and that's huge. You already got a single cpu opteron workstation? Well now you can have a dual cpu one for the price of a single cpu upgrade. That kicks ass.

Inside the dual core by spirit_fingers · 2004-08-26 10:19 · Score: 4, Funny

Actually, the left core will be verbal, creative and be really good at processing visual information, while the right core will be logical, good at number crunching and have no style sense whatsoever.

Re:Itanium? (somewhat off-topic) by Anonymous Coward · 2004-08-26 10:20 · Score: 5, Informative

Despite what Sun has to say on the matter, Itanium system and processor sales have been increasing steadily since 2H, 2000. Prior to that, there was a big lull in demand because few wanted to buy underperforming Itanium 1 machines when the Itanium 2 was expected rather soon (and announced relatively early).

Today, in contrast, there _doesn't_ appear to be a lull in demand for Itanium 2 machines, even though Montecito (Itanium 3) has been announced in a fair bit of detail. That's because for some applications (in HPC, high-end database work, certain EDA/CAD/CAE work, and ultra-high-reliability computing) Itanium 2 systems are basically unbeatable. They also run some OSes which are very important to some organizations, such as HP-UX and OpenVMS.

Long story short, the Itanium 1 was something of a flop, the Itanium 2 is really pretty decent, and everyone is expecting the Itanium 3 to offer pretty decent _price/performance_, in addition to best-bar-none performance when it is released next year.

Re:New Computer by AKAImBatman · 2004-08-26 10:25 · Score: 4, Funny

Well I would buy a computer now but I have no cash

Is that a pun?

--
Javascript + Nintendo DSi = DSiCade

Re:mmmm cores by iNiTiUM · 2004-08-26 10:26 · Score: 5, Informative

Sure you can
Oh you want one for the AMD64?
How about these?

--
When encryption is outlawed, ou++1!@(93j++js-d9298yIUH(*Y24JKB!~

Commodity hardware grows mature. by Skulker303 · 2004-08-26 10:29 · Score: 5, Insightful

Dual core microprocessors are not a new development. IBM with their POWER4 and POWER5, HP and the PA-RISC 8800, and TI with their OMAP processors are definitive proof that multi-core solutions are not just a stop gap in increasing the performance delta of modern silicon.

Dual core processors are a natural evolution in the development of general purpose and even specialized computing devices. SMT was to be a boon for the EV8, but later found its way into the Pentium4. Multiple logical processors were just a first step.

It should be interesting to see just what AMD can do with both SMT and a dual core design.

It just had better run BSD. = )

Re:mmmm cores by kennedy · 2004-08-26 10:30 · Score: 4, Informative

wrong. the ps2 has a 64bit MIPS cpu with *128bit extensions*. Think MMX or SSE.

coming this fall on Fox... by rarose · 2004-08-26 10:34 · Score: 5, Funny

It's "RISC CPI for the CISC guy"

I can't wait to see what they do to his nonorthogonal register file.

--
--Rob

The down-side to this.... by NerveGas · 2004-08-26 10:34 · Score: 4, Informative

The downside is that as the AMD chips are going to be backward-compatible with older boards, I imagine that the dual-core chip will still only have the single 128-bit memory controller.

While that will still give you twice as many available CPU iterations, that means that the two cores will be fighting for memory bandwidth. In the case of Intel's chips, that's business-as-usual: But for the Opterons, where each processor brings its own memory controller, it just doesn't feel right. : (

steve

--
Oh, you're not stuck, you're just unable to let go of the onion rings.

Sure, OS/400 by Shivetya · 2004-08-26 10:35 · Score: 4, Insightful

Been that way for many years. Is rock stable and secure.

Granted it is on a mini, but we have enjoyed 64bit computing for nearly 10 years. Even have some power5s in production.

There are great OSes other than the ones used on PC hardware... too many "geeks" forget that.

--
* Winners compare their achievements to their goals, losers compare theirs to that of others.

So can it crash twice now ? by mcraig · 2004-08-26 10:35 · Score: 5, Funny

Kernel Panic Core Dumped... Still Panicking Dumping Second Core...

It's nice but.. by leathered · 2004-08-26 10:45 · Score: 4, Insightful

I like Itanium. It's a pretty neat architecture which crushes most before it in FP intensive tasks. It is clear why it has done well in HPC. But HPC is nothing more than a niche.

Now here are the problems:

32 bit (x86) performance sucks. All those apps you've spent years developing will need re-writing (A simple recompile is often out of the question).

HP (in collusion with Intel) killed perfectly good archs. in Alpha and PA-RISC in an effort to get people to migrate to IA-64. A few may have made the move but this has mostly served to push people towards the vastly cheaper x86. HP, and to a lesser extent Intel, should provide what their customers want, not what they think is best for them.

It still uses a shared bus architecture. There are diminishing returns as you add more processors.

Itanium requires massive caches to get the best from it. Cache = Silicon = Cost. It is clear that a large scale seeding exercise is still underway with Itanium systems being provided at or below cost. Looks like it will be a long time before there will be any return on the billions invested in Itanium.

--
For all intensive porpoises your a bunch of rediculous loosers

Re:Dual core - what's the point? by drinkypoo · 2004-08-26 10:55 · Score: 4, Informative

Hyperthreading is simply a second context. It lets you run a second thread at the same time by using the unutilized capacity of existing functional units and is largely useful only when intel's branch prediction fails and the chip would otherwise be paying the ultimate penalty for its long, long, LONG pipeline.

In other words, HT is an ingenious method for making up for the fact that the pentium 4 is horribly inefficient.

It would be better to stick a whole bunch of simple cores on a single chip at a lower clock rate and have them work cooperatively, if only we used more multithreading. This is pretty much where intel is planning to go, with their multiple-core chips based on the Pentium-M. Or, so the rumors say.

--
"You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"

The G5 uses hypertransport... by Ayanami+Rei · 2004-08-26 11:07 · Score: 4, Interesting

hence the block of RAM per CPU.

--
THIS THING CAN TURN ON A DIME, MACROSSZERO STYLE ALSO FUCK BETA, ~NYORON

Re:The G5 uses hypertransport... by shawnce · 2004-08-26 11:42 · Score: 4, Informative

Actually they don't use the same bus technology.

The G5 (PPC970/970FX) has two 32-bit-wide buses, one going in each direction from the CPU, and they have a data rate half that of the CPU's clock rate. At a clock rate of 2.5GHz the bus is capable of a max theoretical throughput of 5GB/s each direction or 10GB/s in total (that is per CPU). Real world throughput is around 8 GB/s per CPU at 2.5GHz because of address/command overhead. Apple/IBM terms this the elastic bus and it is not HT based.
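The theoretical figures follow directly from bus width and data rate. A quick Python check of that arithmetic (the function and parameter names are made up for this sketch; only the 32-bit width, the half-clock data rate, and the 2.5GHz clock come from the numbers above):

```python
def elastic_bus_gb_per_s(cpu_clock_ghz, bus_width_bits=32, bus_to_cpu_ratio=0.5):
    """Peak one-way throughput in GB/s for a bus running at a fraction of CPU clock."""
    giga_transfers_per_s = cpu_clock_ghz * bus_to_cpu_ratio
    bytes_per_transfer = bus_width_bits / 8
    return giga_transfers_per_s * bytes_per_transfer


per_direction = elastic_bus_gb_per_s(2.5)  # one 32-bit bus at half of 2.5GHz
total = 2 * per_direction                  # one bus in each direction
```

With a 2.5GHz CPU this gives 1.25 giga-transfers per second times 4 bytes, i.e. the 5GB/s per direction and 10GB/s total quoted above. The roughly 8 GB/s real-world figure is lower because address and command traffic shares the same wires.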

For more information see this block diagram referenced from this hardware tech note.

Anyway, the post you are replying to is incorrect about each CPU having its own RAM. That is not true. Each CPU has its own independent bus to the memory controller (U3/U3H) and that controller has a dual-channel connection to memory capable of 6.4GB/s (DIMMs are required to be added in pairs to allow for a 128-bit-wide path to memory). The U3 chip is basically crossbar-like internally, allowing for a few point-to-point connections to be taking place between its various interfaces (CPU to CPU, AGP to memory, etc.).

HT is used as a secondary interconnect to relatively lower bandwidth devices in the IO chain.

Re:mmmm cores by shawnce · 2004-08-26 12:13 · Score: 4, Informative

Pulling in a post of mine from a completely different forum...

The G5 is a 64 bit processor and OSX Panther is a 64 bit OS. :)

Panther is not a true 64 bit OS in the traditional sense of the word. It does not support 64 bit addressing[1]. It does however support the use of 64 bit math operations and the saving of related registers on the CPU.

Tiger (Mac OS 10.4) will have the first steps towards a true 64 bit OS by allowing 64 bit addressing (virtual addressing) to be used for libSystem only based tools (command line applications, no GUIs, etc.). At least that is all that Apple has so far committed to doing in Tiger at this time (cannot say more because of NDA).

[1] Note the Panther kernel has support for 64 bit physical addressing so the system can utilize greater than 4 GBs of RAM (hardware wise supporting up to 16 GB of RAM) but it does not support 64 bit virtual addressing (what applications use) at this time.

Re:mmmm cores by Hoser+McMoose · 2004-08-26 13:26 · Score: 4, Interesting

With a 32-bit OS and 32-bit applications you can only access a maximum of 2 or 3GB of data at a time (possibly even less due to memory fragmentation). This may or may not affect what you do.
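The 2 or 3GB figure falls out of the pointer size once part of the address space is set aside for the kernel. A small Python sketch of that arithmetic (the helper name is hypothetical, and the 2GB and 1GB kernel reservations are assumed as typical 32-bit configurations rather than taken from any specific OS release):

```python
GIB = 2**30


def user_address_space_gib(pointer_bits, kernel_reserved_gib):
    """Virtual address space left for one process, in GiB."""
    total_gib = 2**pointer_bits // GIB
    return total_gib - kernel_reserved_gib


default_split = user_address_space_gib(32, kernel_reserved_gib=2)   # 2/2 split
generous_split = user_address_space_gib(32, kernel_reserved_gib=1)  # 3/1 split
```

A 32-bit pointer can name 4GiB in total, so a process sees 2GiB or 3GiB depending on the split, and fragmentation can shrink the largest usable contiguous chunk further.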

If you do indeed have files as big as DVDs, it would certainly help with editing those files. You CAN break those up into chunks, only having 2GB or less in memory at any given time, and for the most part this works ok, however it does tend to be a bit of a kludge at the best of times, and sometimes it just flat out doesn't work.

As you correctly guess, servers are the first situation where this really makes sense. If you've got a database that is more than 2GB in size, you REALLY want a 64-bit system, otherwise you'll tend to take a big performance hit. Many high-end workstations require 64-bit systems as well to process all the data.

So, where is the benefit for the end-user? Well that depends on the user. First off, having more than 2GB of physical memory on a 32-bit processor requires some really ugly hacks to make things work. They do work, but it is a really dumb idea. It was annoying and crappy when we were forced to do it back in the 16-bit days, and it hasn't gotten any better. Secondly people are using bigger and bigger data files on their home PC, editing larger pictures and videos, playing games with more graphics and sound, some even run into issues with types of databases (I know my Usenet newsreader sometimes craps out when I'm downloading too much pr0n because of database limits). Basically you might not need it, but someone else might. The best part about it though is that 64-bits is "free".

Basically you've got a 64-bit CPU that is no more expensive than competing 32-bit chips and Microsoft has said that 64-bit WinXP Pro will sell for the same price as 32-bit WinXP Pro, so really the question is not so much "Why" do we need 64-bit, but "why not?"

Re:Day late, dollar short... by kscguru · 2004-08-26 15:36 · Score: 4, Informative

These chips must share a relatively slow memory bus with other devices.

No... on AMD chips the memory bus is dedicated. Intel chips have a very different system architecture (which does saturate at ~2 CPUs), but AMD gives each chip its own memory controller and memory - scales perfectly. (By the way, this isn't new ... big iron (e.g. Sparc) has been doing this for years).

Currently, the fastest FSB to date is 1033MHz - almost 1/3 of the max clock speed of the processor. Given that Intel's integer units operate at twice the clock speed, the fastest parts of the chip operate at 6 times faster than memory.

That's why modern processors use pipelining (in x86, since 486's) and caches (since, uh, 8086s ?). FSB only comes into play in 1-2% of the memory accesses. But those memory accesses are pipelined, interleaved, with multiple outstanding requests issued by the out-of-order pipeline ... processor designers have been working around a slow bus for years, and the FSB is only the bottleneck in extreme, pathological cases.
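To put a rough number on why a 1-2% miss rate keeps the FSB off the critical path, the usual average-memory-access-time formula can be sketched in Python. The 2-cycle hit time and 200-cycle miss penalty are invented for illustration; only the 1-2% miss rate comes from the paragraph above.

```python
def average_access_cycles(hit_cycles, miss_rate, miss_penalty_cycles):
    """Average memory access time: every access pays the hit time,
    and the fraction that misses also pays the trip over the bus."""
    return hit_cycles + miss_rate * miss_penalty_cycles


# Assumed figures: 2-cycle cache hit, 200-cycle trip to main memory.
with_cache = average_access_cycles(2, 0.02, 200)
no_cache = average_access_cycles(0, 1.0, 200)
```

Under these assumptions the average access costs a handful of cycles instead of a couple of hundred, and that is before counting the overlap that pipelining and multiple outstanding requests provide.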

The monolithic, synchronous, central-processing-unit design of the architecture prohibits optimizations such as using memory controllers for block moves and having dedicated IO processors

Ever heard of DMA? A DMA controller does that memory transfer ... there are 2 DMA controllers with 8 channels on your current x86 PC. Heck, high-end PCI cards even have their own onboard DMA engines (it's called bus-mastering). I/O offload? You've obviously never written a device driver... modern drivers issue a few "start" instructions, then sleep; eventually the device completes the I/O and issues an interrupt to inform the CPU it's done. The last computer I had that stalled on disk I/O was running MS-DOS - nine years ago.

In all fairness, I thought exactly the same things four years ago. Then I learned about modern computer architecture. And in today's world (and, in fact, all PCs for the past ten years), your points are completely - and utterly - irrelevant.

--

A witty [sig] proves nothing. --Voltaire

Slashdot Mirror
