Dual Caches for Dual-core Chips

mmmm cores by zaqattack911 · 2004-08-26 10:10 · Score: 3, Insightful

Can I have a 64bit OS too please? (no not linux)

Re:New Computer by Izago909 · 2004-08-26 10:13 · Score: 2, Insightful

With that logic, you'll always be holding off for some new development.

Re:Confused by benjamindees · 2004-08-26 10:18 · Score: 1, Insightful

Of course you're missing the point! You're concentrating on the technical value of such a design.

You should be concentrating on the marketing bullshit value instead.

--
"I assumed blithely that there were no elves out there in the darkness"

Commodity hardware grows mature. by Skulker303 · 2004-08-26 10:29 · Score: 5, Insightful

Daul core microprocessors are not a new development. IBM with their POWER4 and POWER5, HP and the PA-RISC 8800, and TI with their OMAP processors are definitive proof that multi-core solutions are not just a stop gap in increasing the performance delta of modern silicon.

Daul core processors are a natural evolution in the development of general purpose and even specialized computing devices. SMT was to be a boon for the EV8, but later found its way into the Pentium4. Multiple logical processors were just a first step.

It should be interesting to see just what AMD can do with both SMT and a daul core design.

It just had better run BSD. = )

Re:Commodity hardware grows mature. by Jhan · 2004-08-27 04:39 · Score: 2, Insightful

Daul core microprocessors are not...
Daul core processors are...
It should be interesting to see just what AMD can do with both SMT and a daul core design.

You keep using that word. I do not think it means what you think it means.

--
I choose to remain celibate, like my father and his father before him.

Sure, OS/400 by Shivetya · 2004-08-26 10:35 · Score: 4, Insightful

Been that way for many years. Is rock stable and secure.

Granted it is on a mini, but we have enjoyed 64bit computing for nearly 10 years. Even have some POWER5s in production.

There are great OSes other than the ones used on PC hardware... too many "geeks" forget that.

--
* Winners compare their achievements to their goals, losers compare theirs to that of others.

Re:yeah, by laudney · 2004-08-26 10:37 · Score: 2, Insightful

The cause for cache conflict is not a hardware but a software one. Suppose there is one process/thread running on each core. When the two processes have incompatible instruction/data streams that evict each other out of the cache, performance is seriously reduced. This requires an intelligent enough OS scheduler.

It's nice but.. by leathered · 2004-08-26 10:45 · Score: 4, Insightful

I like Itanium. It's a pretty neat architecture which crushes most before it in FP intensive tasks. It is clear why it has done well in HPC. But HPC is nothing more than a niche.

Now here are the problems:

32 bit (x86) performance sucks. All those apps you've spent years developing will need re-writing (a simple recompile is often out of the question).

HP (in collusion with Intel) killed perfectly good archs in Alpha and PA-RISC in an effort to get people to migrate to IA-64. A few may have made the move but this has mostly served to push people towards the vastly cheaper x86. HP, and to a lesser extent Intel, should provide what their customers want, not what they think is best for them.

It still uses a shared bus architecture. There are diminishing returns as you add more processors.

Itanium requires massive caches to get the best from it. Cache = Silicon = Cost. It is clear that a large scale seeding exercise is still underway with Itanium systems being provided at or below cost. Looks like it will be a long time before there will be any return on the billions invested in Itanium.

--
For all intensive porpoises your a bunch of rediculous loosers

Re:yeah, by Anonymous Coward · 2004-08-26 10:54 · Score: 2, Insightful

You probably don't want to have both chips fighting over the cache, and slowing things down

As a rule of thumb, if both cores are running threads from the same process (or two processes using shared memory) then shared cache is good because it increases inter-thread bandwidth and decreases inter-thread latency.

But if it is just two random processes doing their own thing with little to no interprocess communication, then independent caches are better because you need not worry (as much) about them fighting over the same cache lines, each mapped to their own different memory spaces. N-way caches help with that kind of problem, and for a large N, might be seen as a good compromise, but the larger the N, the slower and/or more expensive($$$) the cache becomes.
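To make the N-way point concrete, here is a minimal sketch of a set-associative cache with LRU replacement inside each set. It assumes two unrelated processes whose addresses happen to map to the same sets; the class name, the addresses, and the cache sizes are all made up for illustration and do not describe any real chip.

```python
from collections import OrderedDict

class SetAssocCache:
    """Tiny N-way set-associative cache, LRU replacement within each set."""

    def __init__(self, num_sets, ways, line_size=64):
        self.num_sets = num_sets
        self.ways = ways
        self.line_size = line_size
        self.sets = [OrderedDict() for _ in range(num_sets)]

    def access(self, addr):
        """Return True on a hit; on a miss, fill the line and return False."""
        line = addr // self.line_size
        cache_set = self.sets[line % self.num_sets]
        if line in cache_set:
            cache_set.move_to_end(line)      # mark as most recently used
            return True
        if len(cache_set) >= self.ways:
            cache_set.popitem(last=False)    # evict least recently used line
        cache_set[line] = True
        return False

def miss_rate(cache, trace):
    misses = sum(0 if cache.access(addr) else 1 for addr in trace)
    return misses / len(trace)

def interleaved_trace(base_a, base_b, lines=4, rounds=100, line_size=64):
    """Two processes take turns, each looping over its own few cache lines."""
    trace = []
    for _ in range(rounds):
        for i in range(lines):
            trace.append(base_a + i * line_size)
            trace.append(base_b + i * line_size)
    return trace

# Hypothetical addresses chosen so both processes alias to the same sets.
trace = interleaved_trace(base_a=0, base_b=8192)

direct_mapped = SetAssocCache(num_sets=8, ways=1)   # 8 lines total
two_way = SetAssocCache(num_sets=4, ways=2)         # same 8 lines total

print("direct-mapped miss rate:", miss_rate(direct_mapped, trace))
print("2-way miss rate:", miss_rate(two_way, trace))
```

With the same total capacity, the direct-mapped cache has the two processes evicting each other on every access, while the 2-way cache only misses while warming up. The sketch ignores the extra lookup cost that makes high associativity slower or more expensive.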

Re:Note: Here, Single is Better by riptide_dot · 2004-08-26 10:55 · Score: 2, Insightful

FTA: Keeping the cache as one single unit theoretically allows each processor core to access more data in a rapid fashion. Dividing the cache, however, also cuts down on some design work.

In case it's not obvious to those who didn't read the article all the way through, it's a better thing when the memory is shared (single cache) rather than separate (dual cache).

Yes, it's better to have a single cache for performance reasons (cache "hit" rates would theoretically be higher with a single larger cache). But it's also better for other reasons too - more L1 and L2 cache (which is made using SRAM, not DRAM) is really expensive. Two cache modules mean more pricey chip$.

--
I was in the park the other day wondering why frisbees get bigger and bigger the closer they get - and then it hit me.

Re: unified (single) is not always better by mepperpint · 2004-08-26 10:58 · Score: 3, Insightful

It's not entirely true that single is better. It depends on what the system is used for. If both cores are accessing the same memory (likely the case in a multi-threaded webserver for instance), then they can benefit from sharing a cache and effectively doubling the cache size. However, if both cores are accessing different memory (almost any situation where different applications are running on the different cores), then sharing a cache could have devastating effects on performance. As each process running on each of the cores would likely be evicting the other process's cached memory, there would be a plethora of cache misses. In the worst case, this could effectively make the system as slow as if there were no cache at all. In the average case there would likely be a significant performance hit.

A better strategy than unified or separate caches would be to have a read/write cache for each core and allow each core to read the other core's cache. This would allow the benefits of the shared cache in the case where both cores were accessing the same memory without having the major performance hit when each process is accessing different memory. Unfortunately the hardware for this would be even more complicated than for the unified or separate cache techniques.
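Both cases described above can be reproduced with a toy model. The following is only a sketch: it assumes fully associative LRU caches, an 8-line total budget, and two invented access patterns (both cores looping over the same lines, and one core looping over a small working set while the other streams through fresh lines). None of these numbers come from a real processor.

```python
from collections import OrderedDict

class LRUCache:
    """Fully associative cache holding `capacity` lines, LRU replacement."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()

    def access(self, line):
        if line in self.lines:
            self.lines.move_to_end(line)
            return True
        if len(self.lines) >= self.capacity:
            self.lines.popitem(last=False)
        self.lines[line] = True
        return False

def hit_rates(trace, caches):
    """trace: list of (core, line). caches: core -> LRUCache (may be shared)."""
    hits = {core: 0 for core in caches}
    total = {core: 0 for core in caches}
    for core, line in trace:
        total[core] += 1
        hits[core] += caches[core].access(line)
    return {core: hits[core] / total[core] for core in caches}

def shared(capacity):
    cache = LRUCache(capacity)            # one cache used by both cores
    return {0: cache, 1: cache}

def split(capacity):
    return {0: LRUCache(capacity // 2), 1: LRUCache(capacity // 2)}

def same_memory_trace(working_set=6, rounds=200):
    """Both cores loop over the same lines (e.g. threads of one server)."""
    return [(core, line)
            for _ in range(rounds)
            for line in range(working_set)
            for core in (0, 1)]

def different_memory_trace(working_set=4, rounds=200):
    """Core 0 loops over a small working set; core 1 streams new lines."""
    trace, stream = [], 1000
    for _ in range(rounds):
        for line in range(working_set):
            trace.append((0, line))
            trace.append((1, stream))
            trace.append((1, stream + 1))
            stream += 2
    return trace

for name, trace in (("same memory", same_memory_trace()),
                    ("different memory", different_memory_trace())):
    print(name, "shared:", hit_rates(trace, shared(8)),
          "split:", hit_rates(trace, split(8)))
```

In this model the shared cache wins when both cores touch the same memory (the whole working set fits only in the combined cache), and the split caches win when a streaming neighbour would otherwise flush core 0's working set.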

Day late, dollar short... by gillbates · 2004-08-26 11:15 · Score: 3, Insightful

While dual cores on a chip might be nice, they won't produce any serious performance increases.

The underlying problem with Intel and AMD's processors is that they are at the mercy of the architecture:

These chips must share a relatively slow memory bus with other devices.
Currently, the fastest FSB to date is 1033MHz - almost 1/3 of the max clock speed of the processor. Given that Intel's integer units operate at twice the clock speed, the fastest parts of the chip operate about 6 times faster than memory.
The monolithic, synchronous, central-processing-unit design of the architecture prohibits optimizations such as using memory controllers for block moves and having dedicated IO processors. Contrast this with mainframes, in which the CPU passes off IO instructions to ancillary processors and continues to work. In PC-land, when the IDE controller seizes the bus for a transfer from disk into memory, the CPU has to execute out of its cache for ~256 instruction cycles, or risk stalling.
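The ratios quoted above can be checked with a few lines of arithmetic. The bus figure and the "twice the clock" factor are taken from the comment; the 3200 MHz core clock is an assumed value, picked only because it makes the bus "almost 1/3" of the core speed.

```python
# Bus speed and the 2x integer-unit factor are from the comment above.
# core_clock_mhz is an assumption for illustration, not a quoted figure.
fsb_mhz = 1033
core_clock_mhz = 3200
integer_unit_mhz = 2 * core_clock_mhz

core_to_bus = core_clock_mhz / fsb_mhz
alu_to_bus = integer_unit_mhz / fsb_mhz

print(f"core clock vs bus:    {core_to_bus:.1f}x")
print(f"integer units vs bus: {alu_to_bus:.1f}x")
```

Under that assumption the core runs at roughly 3 times the bus speed and the integer units at roughly 6 times, which matches the comment's figures.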

The ironic thing is that even though AMD and Intel are out-clocking mainframe processors by factors of 2 and 3, mainframes still get more work done simply because they aren't choked by a slow and overcrowded system bus.

--
The society for a thought-free internet welcomes you.

Re:Note: Here, Single is Better by Anonymous Coward · 2004-08-26 11:17 · Score: 1, Insightful

You have to be kidding. The noise has bugger all to do with the processor inside the box. It's the cooler obviously. You pay good money for a P4 cooler and you'll get some really good examples of fine engineering - take Zalman for example. If you stick with the shitty stock cooler you get for free with the chip then expect noise.

If Apple gave up their design principles and just stuck any old cooler on their G5s would you still complain? My spidey-senses tell me no.

Bus architectures are the key by Shapemaker · 2004-08-26 11:45 · Score: 2, Insightful

From what I know of the current architectures, AMD's solution to main memory access woes (point-to-point bus) seems more sane as soon as more than a couple of processors are installed in the system. Shared bus (as in Intel's solution) seems to require huge caches to operate efficiently, and as we all know, Pentium 4 really does not like pipeline stalls or branch mispredictions.

Let's take a hypothetical example: quad processor systems utilising dual core processors from Intel and AMD.

AMD: each processor (core) talks directly to its local memory block, and via HT links to adjacent processors' memories. Processors do not have to contend for access to the bus and thus memory access is always low-latency, even when accessing remote memory. If built today, HT links would operate at 1 GHz.

Intel: processors share the same bus with each other and the memory controller. Any time a processor needs to access memory, it has to wait until the bus is free before it can ask the memory controller for access to main memory. Pipeline stalls happen here if the bus is not free when needed. This is compensated for with huge L3 caches. As far as I know, current quad processor systems from Intel have bus speeds of around 533 MHz.

So in a nutshell, Intel competes with AMD on a quite level field when the system has 1-2 processors, but as soon as processor count goes up, bus bandwidth becomes an issue with Intel. It shall be interesting to see how Intel attempts to counter that.
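The scaling argument can be sketched with a deliberately crude model. It assumes every processor is hammering main memory all the time, ignores caches and remote accesses over HT links, and uses an arbitrary bandwidth unit of 1.0; the function names are invented for this example.

```python
def shared_bus_bandwidth_per_cpu(bus_bandwidth, cpus):
    """All CPUs contend for one bus, so each gets an equal slice at best."""
    return bus_bandwidth / cpus

def local_memory_bandwidth_per_cpu(controller_bandwidth, cpus):
    """Each CPU has its own memory controller, so the per-CPU figure does
    not depend on the CPU count (remote accesses are ignored here)."""
    return controller_bandwidth

for cpus in (1, 2, 4, 8):
    shared = shared_bus_bandwidth_per_cpu(1.0, cpus)
    local = local_memory_bandwidth_per_cpu(1.0, cpus)
    print(f"{cpus} CPUs: shared bus {shared:.3f}, local memory {local:.3f}")
```

With one or two processors the two designs look similar in this model; by four or eight, each processor on the shared bus sees a quarter or an eighth of the bandwidth, which is the gap the large L3 caches are meant to hide.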

What am I getting at with this? Well, those huge 12 MB L3 caches in Intel's future processors sure aren't cheap. They take up lots of silicon and WILL decrease core yields since they've got lots and lots of points of failure. So manufacturing processes really have to be ramped up to allow that at reasonable cost.

--
"Intellectual Property" should be an affront to anyone capable of independent thought.

Re:Non-news event by Wesley+Felter · 2004-08-26 13:22 · Score: 2, Insightful

Er, cache coherency works exactly the same way in a multicore chip as in an old-fashioned SMP. Opteron, Xeon, and small Itanic systems use the time-tested broadcast snooping algorithms that are taught in undergraduate courses.

Re:what's the diff: dual core and hyperthreading? by RockyMountain · 2004-08-26 15:06 · Score: 2, Insightful

It's almost as if they've reached some kind of break even point, where the probability of a defect on a dual core die falls to the point where the gains outweigh the losses.

The gains definitely outweigh the losses, or they wouldn't do it. But the gains don't only come from CPU cost-per-core. There are lots of other factors, such as density, power efficiency, potential for core-to-core lockstep, etc.

I have no first-hand knowledge of AMD, but for Itanium, smaller process geometries do not increase yield through smaller die size. As they've shrunk to smaller geometries, they have not shrunk die size at all. All the extra real estate goes into larger caches, and the die size, and thus the (raw) yield, remains about the same. They have improved yield, but it's not through shrinking the die.

They have dramatically improved yield in other ways. As your execution units shrink and cache dominates an ever-increasing percentage of the die area, it becomes easier to use redundancy to make the chip tolerant of defects.

Intel calls it Pellston Technology (I hate marketing speak). And it is this, more than anything, that makes such massive dies as Montecito even possible. In the old days, one defect trashed the die. With this sort of technology, most defects are worked around through redundancy. And, if you have too many defects to allow that, you may still salvage the die by selling it at a lower price with reduced-capacity caches. Most chips shipped to customers have several completely corrected defects.
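The effect of redundancy on yield can be illustrated with the textbook Poisson defect model. This is a sketch under stated assumptions, not Intel's actual method: defects are assumed to be randomly scattered, every defect is assumed to land somewhere repairable, and the die area and defect density below are hypothetical.

```python
import math

def poisson_pmf(k, lam):
    """Probability of exactly k defects when the expected count is lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def yield_with_repair(die_area_cm2, defects_per_cm2, repairable):
    """Fraction of dies with at most `repairable` defects."""
    lam = die_area_cm2 * defects_per_cm2
    return sum(poisson_pmf(k, lam) for k in range(repairable + 1))

# Hypothetical large die: 4 cm^2 at 0.5 defects per cm^2 (2 expected defects).
for repairable in (0, 1, 2, 3):
    y = yield_with_repair(4.0, 0.5, repairable)
    print(f"tolerating {repairable} defect(s): yield {y:.1%}")
```

With no repair, only dies with zero defects survive; tolerating even a few defects recovers most of the dies in this model, which is why redundancy matters more as cache takes over the die area.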

You're wrong.

Probably. Wouldn't be the first time.

If you take into account the overall _system_ cost, dual-core is definitely far cheaper than dual-socket. System cost also includes cost of board area, power delivery, cooling, etc. OEMs will happily pay well over double for a 2-core vs 1-core because of the savings they will make elsewhere in the system. Dual-core also gives system vendors much more flexibility by allowing the same board design to support twice as wide a range of CPU counts.

I was comparing CPU+package cost only. Having been both a CPU designer and a system designer at different times in my career, I know how to look at it both ways.
