Dual Caches for Dual-core Chips
DominoTree writes "The dual-core chips that AMD and Intel plan to bring to market next year won't be sharing their memories. A version of Opteron coming in 2005 and Montecito, a future member of Intel's Itanium family also slated for next year, will both have two processor cores, the actual unit inside a processor that performs the calculations, and each core will have separate caches."
Can I have a 64bit OS too please? (no not linux)
In case it's not obvious to those who didn't read the article all the way through, it's a better thing when the memory is shared (single cache) rather than separate (dual cache). But that is harder to design, so for these first-generation dual-core chips from Intel and AMD, they are using separate caches for each core. (IBM's dual core Power4 processor has a unified cache.) At some point down the road, they will likely unify them to increase performance.
Hopefully I'm going to be able to build a new computer by Christmas (if I ever get a job), but maybe I should save my money and hold off until new chipsets and motherboards come around?
but will it make coffee? I didn't think so.
I say we just grow up, be adults and die.
I'm not a hardware pro, but is this basically the same as having two separate chips, or am I missing the point here?
Is Intel still developing the Itanium? I thought that it was a flop? Are they hoping that future sales will be stronger, or is the Itanium not the Titanic everyone makes it out to be?
What will happen to those who must pay a royalty fee per CPU? Will companies that charge for each CPU begin to charge for two, or will it still be viewed as one...?
Real programmers can write assembly code in any language. -- Larry Wall
Sigs cause cancer.
"Montecito", a spanish word, literally translates as "a small monte".
Thus I predict that this will be followed by a quad-core chip called the "monte", an 8-core chip called the "montote" (the big monte), and finally a 16-core chip known as "The Full Monte".
Norman Cook's Ode to Sl
You probably don't want to have both chips fighting over the cache, and slowing things down; I'm sure doing The Right Thing[tm] will take a while for them to work out. Until then, just pretend that they're mostly separate chips on the same silicon.
Maybe in the future they'll come up with some more advanced cache designs that can share some cache and improve performance. But until then, expect to see it in the next generation of value chips. (Overclocked dual-core Celerons? Nifty!)
pb Reply or e-mail; don't vaguely moderate.
Reading this, it sounds like the two cores will be separate in terms of cache and internal registers, so I'm wondering how this is different from two separate processors (except, of course, for the form factor and the higher-speed internal bus between the two cores).
I bought my girlfriend has her own dual cache for her dual-core chips last christmas!
I saw this article on another website earlier today, and I thought this wasn't really important. Each core should have its own cache; that's exactly what a dual-core chip is. Not twice as many execution units crammed into the same space, or some other funny configuration: it's two separate chips on the same die, perhaps with some modifications for inter-processor communication, but that's about it. With AMD's core design, you have only the physical layer of the HyperTransport bus to connect the chips, and the integrated memory controller has one or two ports to talk to memory (single/dual channel) and two ports to talk to the two separate chips. It will be interesting to see if AMD couples dual-core chips with DDR2-667 or DDR2-800. That would make the most sense, so as to keep the memory controller from being the bottleneck, as opposed to the system bus on the Intel side.
The Doormat
If you're not outraged, then you're not paying attention.
Since when was that NOT the case in the computer world if you wanted to buy something new? Too bad second-hand stuff is ridiculously overpriced...
It's not much different; that's the point. Two processors in a single socket saves a lot of money production-wise, and that should pass on to the consumer. AMD has said theirs is backward compatible, and that's huge. You already got a single-CPU Opteron workstation? Well, now you can have a dual-CPU one for the price of a single CPU upgrade. That kicks ass.
Actually, the left core will be verbal, creative and be really good at processing visual information, while the right core will be logical, good at number crunching and have no style sense whatsoever.
I don't understand all the hype around dual core. Maybe I'm being stupid. Two cores on one chip seems like a great idea, and I'm sure it will improve performance.
But Intel has already demonstrated there is surely a better solution - something like SMT, hyperthreading.
Wouldn't it be saner to build a chip with double the number of execution units and double the number of instruction fetch/decode units and a larger reorder buffer that would appear, say, as four logical processors to a system? Surely you could get higher utilisation of your arithmetic logic units from such an arrangement than you could with two entirely separate processors?
Or is the simple advantage with dual core that you don't have to distribute the same clock over the entire silicon die? I know this is becoming a big problem with complex VLSI, and I guess this might be a half-way solution until clockless designs arrive.
Can anyone "in the know" answer this question?
to see if they can market this as a consumer machine someday. As long as Windows isn't bogged down by spyware, a $500 Dell machine can browse the internet, play music, and have a document or two open in Word without slowing down considerably.
It's obvious they will be helpful in games (even if games don't take advantage of the dual core, having all your OS threads running separately will help) and scientific computing, but it seems to me that so small an audience makes it harder and harder to rationalize spending big money on chip R&D. Those markets will always have an unquenchable thirst for power, but most of the cost of chips, I think, is fixed costs (R&D, fabs, etc.), so you can only get cheap if you go for volume...
Oh wait, Longhorn... damn, never mind the above post.
Monstar L
as subject
http://slashdot.org/~GuyFawkes/journal
Luckily for AMD, the Opteron/A64 was designed with dual-core in mind. As I understand it, both cores will talk to each other via an internal HyperTransport link and, together with the internal memory controller (as with current Opterons), will eliminate the need for an external northbridge. It is also expected that upon release they will drop directly into existing motherboards with nothing more than a BIOS upgrade.
Intel will find things more challenging. Both cores will have to contend for the GTL bus, currently the Achilles heel of their MP solutions, and communicate via an external northbridge.
For all intensive porpoises your a bunch of rediculous loosers
If you add an extra execution unit to a CPU, you have to add all sorts of logic to decide what is pairable with what else (i.e. allocation of execution units such that they don't collide in terms of input or output registers).
At some point you reach a limit where you can't use extra execution units, because you don't know the input values to an instruction: the previous instructions it depends on are still in the pipeline of other execution units.
Dual core avoids that... plus if you validate one core, you can cookie-cutter another one in and have minimal new validation. Especially if the communication between cores is a HT link just like what you're using in a single core design to talk to the outside world already.
--Rob
Dual-core microprocessors are not a new development. IBM with their POWER4 and POWER5, HP with the PA-RISC 8800, and TI with their OMAP processors are definitive proof that multi-core solutions are not just a stopgap in increasing the performance delta of modern silicon.
Dual-core processors are a natural evolution in the development of general-purpose and even specialized computing devices. SMT was to be a boon for the EV8, but later found its way into the Pentium 4. Multiple logical processors were just a first step.
It should be interesting to see just what AMD can do with both SMT and a dual-core design.
It just had better run BSD. = )
It's "RISC CPI for the CISC guy"
I can't wait to see what they do to his nonorthogonal register file.
--Rob
The downside is that as the AMD chips are going to be backward-compatible with older boards, I imagine that the dual-core chip will still only have the single 128-bit memory controller.
While that will still give you twice as many available CPU cycles, it means that the two cores will be fighting for memory bandwidth. In the case of Intel's chips, that's business as usual. But for the Opterons, where each processor brings its own memory controller, it just doesn't feel right. :(
steve
Oh, you're not stuck, you're just unable to let go of the onion rings.
Been that way for many years. It's rock stable and secure.
Granted, it is on a mini, but we have enjoyed 64-bit computing for nearly 10 years. Even have some POWER5s in production.
There are great OSes other than the ones used on PC hardware... too many "geeks" forget that.
* Winners compare their achievements to their goals, losers compare theirs to that of others.
Kernel Panic Core Dumped... Still Panicking Dumping Second Core...
A dual cache will NEVER fix the usual double cell problem that often comes up, right?
I like Itanium. It's a pretty neat architecture which crushes most before it in FP intensive tasks. It is clear why it has done well in HPC. But HPC is nothing more than a niche.
Now here are the problems:
32-bit (x86) performance sucks. All those apps you've spent years developing will need rewriting (a simple recompile is often out of the question).
HP (in collusion with Intel) killed perfectly good archs in Alpha and PA-RISC in an effort to get people to migrate to IA-64. A few may have made the move, but this has mostly served to push people towards the vastly cheaper x86. HP, and to a lesser extent Intel, should provide what their customers want, not what they think is best for them.
It still uses a shared bus architecture. There are diminishing returns as you add more processors.
Itanium requires massive caches to get the best from it. Cache = Silicon = Cost. It is clear that a large scale seeding exercise is still underway with Itanium systems being provided at or below cost. Looks like it will be a long time before there will be any return on the billions invested in Itanium.
For all intensive porpoises your a bunch of rediculous loosers
Can we get a courtesy (cache) flush?
--Rob
And of course you can have a full 64 bit OS using Mac OS X "Tiger".
http://www.thinksecret.com/news/antares.html
What about cache sync? Educate me here, but I would have thought that a double-sized shared cache would be faster than two separate caches that have to be synced all the time. Am I an idiot?
A friend purchased a 3 GHz (yes, 3) Intel Pentium 4 with HyperThreading a few months back. I asked why he didn't purchase an AMD CPU and he said he needed x86 compatibility... So much for informed hardware engineers. Anyway, I recently asked him about the system, since I just built an AMD 2600+ based system and wanted to know if he had some code he wanted to compare/test. Well, he told me that his 3 GHz CPU really only runs most applications at 1.5 GHz unless they are multi-threaded or hyperthread-aware.
Is this true? Does Intel put a 3 GHz label on 1.5 GHz dual-core CPUs, or whatever this hyperthreading is? Sounds dual-core-ish to me...
It's funny how that 1.5 GHz number shows up again in an Intel product. I remember when they could not build anything faster than 7xx MHz and then, all of a sudden, they had a "new technology" that got them 1.5 GHz (2 x 750 MHz), and it was found out later that only PART of the CPU was running at 2x. This all happened when AMD beat Intel past the 1 GHz barrier. Are they again playing "tricks" to get a big GHz label on their parts?
So any of you people up on this dual-core and hyperthreading thing and feel like explaining to the rest of us what's going on? TIA.
LoB
"Anyone who stands out in the middle of a road looks like roadkill to me." --Linus
VMS went 64-bit at least a decade ago.
Great OS for English-speaking folk, despite Linus's hatred for it.
I can't wait for Zalman to make a gigantic copper spread for this generation of CPUs. This could be the end of the acrylic window cases, since the only thing visible will be copper fins and pipes.
If you think
But will probably require a heat sink that is twice as large - or perhaps just use the entire side of the case.
You know, if AMD would actually give money to game developers to multi-thread Unreal Tournament 2005, Doom 4, and Jedi Knight 4, they would see their sales climb through the roof. Gamers would be buying dual-core systems (and Opteron/MP/modded-XP), and loading up with GeForce 6800s and X800s.
A couple million to the Grand Theft Auto: San Andreas team, and maybe a little blurb in the readme.txt about how well the game works on multi-processor AMD systems, and people would take notice.
Just a little FYI...
1 slot = 2 CPUs
2 slots = 4 CPUs
4 slots = 8 CPUs
...
Someone flips on a Beowulf cluster built this way and it's game over, man:
Neo@Beowulf>./Whoah
*cluster flies off*
It's not entirely true that single is better. It depends on what the system is used for. If both cores are accessing the same memory (likely the case in a multi-threaded webserver, for instance), then they can benefit from sharing a cache and effectively doubling the cache size. However, if both cores are accessing different memory (almost any situation where different applications are running on the different cores), then sharing a cache could have devastating effects on performance. As each process running on each of the cores would be likely to evict the other process's cached memory, there would be a plethora of cache misses. In the worst case, this could effectively make the system as slow as if there were no cache at all. In the average case there would likely be a significant performance hit. A better strategy than unified or separate caches would be to have a read/write cache for each core and allow each core to read the other core's cache. This would allow the benefits of the shared cache in the case where both cores are accessing the same memory, without the major performance hit when each process is accessing different memory. Unfortunately, the hardware for this would be even more complicated than for the unified or separate cache techniques.
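To make the "evicting each other's cached memory" case concrete, here is a toy simulation. It is only a sketch under my own assumptions, not how any real chip behaves: a fully associative LRU cache, 64 lines per core, core A looping over a 48-line working set while core B streams through memory it never reuses, three accesses for every one of A's.

```python
from collections import OrderedDict
from itertools import count


class LRUCache:
    """Toy fully associative cache with least-recently-used eviction."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()

    def access(self, address):
        """Return True on a hit; on a miss, load the line and return False."""
        if address in self.lines:
            self.lines.move_to_end(address)  # mark as most recently used
            return True
        self.lines[address] = None
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)  # evict the least recently used line
        return False


def core_a_hit_rate(shared, lines_per_core=64, a_working_set=48,
                    b_accesses_per_step=3, passes=10):
    """Hit rate seen by core A while core B streams through memory."""
    if shared:
        # One unified cache with the combined capacity of both cores.
        cache_a = cache_b = LRUCache(2 * lines_per_core)
    else:
        cache_a, cache_b = LRUCache(lines_per_core), LRUCache(lines_per_core)
    stream = count()  # core B never touches the same line twice
    hits = total = 0
    for _pass in range(passes):
        for line in range(a_working_set):
            hits += cache_a.access(("A", line))
            total += 1
            for _step in range(b_accesses_per_step):
                cache_b.access(("B", next(stream)))
    return hits / total


separate = core_a_hit_rate(shared=False)
unified = core_a_hit_rate(shared=True)
print(f"core A hit rate, separate caches: {separate:.2f}")
print(f"core A hit rate, unified cache:   {unified:.2f}")
```

With these made-up parameters core A gets a 0.90 hit rate with its own cache and 0.00 with the unified one, because B's stream pushes every one of A's lines out before A comes back to it. Change the ratio or the sizes and the picture changes, so treat it as an illustration of the worst case, not a prediction.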
Is it going to hurt Microsoft that they aren't releasing a real 64-bit operating system for another year and a half?
I tend to see the possibility that people who buy 64-bit computers will try to take advantage of their capabilities by choosing 64-bit capable operating systems to run on them. Even Itanium users would have little or no interest in running a beta version of Windows when they could have a real, tested, released operating system like Linux for cheaper. This may potentially even help Linux with hardware support, and encourage 64bit vendors to use more capable hardware.
On the other hand, the 80386 processor (the first to have 32-bit capabilities) was released in 1985. It wasn't until 1993, 8 years later, that Microsoft came along with an OS (Windows NT) that took advantage of the 80386's 32-bit architecture. It was 2 more years after that before Windows 95 brought those features to the consumer market, which Microsoft promptly dominated.
Will Microsoft's delay in releasing a 64bit operating system hurt them? Will it make a difference?
Nope, read the spec sheets on the projected AMD CPUs: 90-100 watts maximum, and that means their high-end stuff. The slower chips will be around 80, I bet. You're forgetting that these chips will be on the 90nm process at first, and then as speeds ramp, you'll see them on the 65nm process by 2006. That simply means no more heat than a single P4 puts out right now. AMD chips are currently very cool compared to Intel's. Of course, the clock speed has a lot to do with that. SOI is the other part.
2 - Dual Double Rate Dynamic Processors
hence the block of RAM per CPU.
THIS THING CAN TURN ON A DIME, MACROSSZERO STYLE ALSO FUCK BETA, ~NYORON
Yeah I was right. The only reason I might be an idiot is because I didn't read comments posted above mine.
While dual cores on a chip might be nice, it won't produce any serious performance increases.
The underlying problem with Intel and AMD's processors is that they are at the mercy of the architecture:
The ironic thing is that even though AMD and Intel are out-clocking mainframe processors by factors of 2 and 3, mainframes still get more work done simply because they aren't choked by a slow and overcrowded system bus.
The society for a thought-free internet welcomes you.
Are the dual cores on the same piece of silicon? This would require both cores to be defect free. If only one core is defect free, is it possible to disable the dud and sell it as a single core CPU? This would make it a much more attractive proposition for the manufacturers.
E.g. if a single core has a yield (probability of being defect free) of 80%, then the dual core chips will have a yield of 0.8^2 = 64%. (Actually slightly lower, because whatever interconnect they have also has to be free of defects.) 64% will have two good cores, 4% will have two bad cores, the remaining 32% will have one good core. The manufacturer would obviously like to make use of that 32% if they can.
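For anyone who wants to plug in a different yield, the arithmetic is just a two-trial binomial. A quick sketch, assuming defects on the two cores are independent and ignoring the interconnect (both simplifications; the function name is mine):

```python
def dual_core_outcomes(core_yield):
    """Probabilities for a two-core die, assuming independent core defects."""
    both_good = core_yield ** 2
    one_good = 2 * core_yield * (1 - core_yield)  # either core may be the dud
    both_bad = (1 - core_yield) ** 2
    return both_good, one_good, both_bad


both_good, one_good, both_bad = dual_core_outcomes(0.80)
print(f"two good cores: {both_good:.0%}")
print(f"one good core:  {one_good:.0%}")
print(f"no good cores:  {both_bad:.0%}")
```

At 80% this prints 64%, 32% and 4%, matching the numbers above.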
Quattuor res in hoc mundo sanctae sunt: libri, liberi, libertas et liberalitas.
For goodness sake, stop it!! Slashdot posts are not meant to take wit and humor above the hot grits baseline! :-)
From what I know of the current architectures, AMD's solution to main memory access woes (point-to-point bus) seems more sane as soon as more than a couple of processors are installed in the system. Shared bus (as in Intel's solution) seems to require huge caches to operate efficiently, and as we all know, Pentium 4 really does not like pipeline stalls or branch mispredictions.
Let's take a hypothetical example: quad processor systems utilising dual core processors from Intel and AMD.
AMD: each processor (core) talks directly to its local memory block, and via HT links to adjacent processors' memories. Processors do not have to contend for access to the bus, and thus memory access is always low-latency, even when accessing remote memory. If built today, HT links would operate at 1 GHz.
Intel: processors share the same bus with each other and the memory controller. Any time a processor needs to access memory, it has to wait until the bus is free to ask the memory controller for access to main memory. Pipeline stalls happen here if the bus is not free when needed. This is compensated for with huge L3 caches. As far as I know, current quad-processor systems from Intel have bus speeds of around 533 MHz.
So in a nutshell, Intel competes with AMD on a quite level field when the system has 1-2 processors, but as soon as processor count goes up, bus bandwidth becomes an issue with Intel. It shall be interesting to see how Intel attempts to counter that.
What am I getting at with this? Well, those huge 12 MB L3 caches in Intel's future processors sure aren't cheap. They take up lots of silicon and WILL decrease core yields, since they've got lots and lots of points of failure. So manufacturing processes really have to be ramped up to allow that at reasonable cost.
"Intellectual Property" should be an affront to anyone capable of independent thought.
...will Apple's G5 ever catch up with the tricks Intel and AMD have up their sleeves? Just when Apple got to be nearly as fast as Intel and a little behind AMD, this will leave them out in the cold.
Best Buy can have you arrested
It's not a question of if there will be 64-bit OS's to go with these things. Eventually, it's sure to happen in multiple flavors.
The real question is what ELSE will be on the motherboards and in the chip by the time these things hit the market? Specifically, what DRM hardware will come with these things? What will the BIOS look like?
That's why I think that the current generation of 64-bit desktops are probably one of the best values for a machine you might be using 4 years from now. It's risky to wait 6 months or a year with the current views of the US Congress and FCC. This generation of 64-bit machines might be one of the last to be multi-purpose Turing/Von Neumann devices.
Don't wait for dual-cores if you have the cash and want to be the one in control of your 64-bit machine. Eventually the OS's will catch up.
"Let him go, Ralph. He knows what he's doing." --Otto Mann (simpsons)
Last time I checked, the Opteron line had the memory controller on the chip, hence the lightning-fast FSB support at almost native clock rates.
In the worst case, this could effectively make the system as slow as if there were no cache at all.
...;
Or worse... this can be realized even on a single-CPU system. Just have a number of variables that all map to the same cache set, where the number of variables is many times the X-way associativity of the cache. Each time an access happens, one of the other lines will have to be evicted. If they are only read, it isn't a big deal, but if there's a write each time, then it can be quite painful.
The easy way to do this is to have a number of large arrays that are powers of two in size and each array is as large as the L2 cache. The number of arrays you use should be a multiple of the X-way of the cache (for an 8-way set associative cache, choose 16 arrays for example). Now, do calculations based on entries in each array.
for (int i = 0; i < ARRAY_SIZE; i++)
    Array1[i] = Array1[i] + Array2[i] + /* ... one term per array ... */ Array16[i];
Absolute cache destruction.
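If you want to see why this hurts, work out which cache set each array's element i lands in. The sketch below assumes a 512 KB, 8-way set associative cache with 64-byte lines, 4-byte elements, and 16 arrays laid out back to back from address 0; those numbers are mine, picked for illustration, and it ignores virtual-to-physical mapping.

```python
LINE_SIZE = 64                  # bytes per cache line (assumed)
WAYS = 8                        # associativity (assumed)
CACHE_SIZE = 512 * 1024         # total cache size in bytes (assumed)
NUM_SETS = CACHE_SIZE // (WAYS * LINE_SIZE)

NUM_ARRAYS = 16
ARRAY_BYTES = CACHE_SIZE        # each array is as large as the cache
bases = [k * ARRAY_BYTES for k in range(NUM_ARRAYS)]


def set_index(address):
    """Cache set selected by an address."""
    return (address // LINE_SIZE) % NUM_SETS


def sets_touched(i, element_size=4):
    """Distinct sets hit when reading element i of every array."""
    return {set_index(base + i * element_size) for base in bases}


print("sets in the cache:", NUM_SETS)
print("sets touched by element 0 of all arrays:", len(sets_touched(0)))
print("lines competing for one set:", NUM_ARRAYS, "ways available:", WAYS)
```

Every array's element i falls into the same set, so 16 lines fight over 8 ways on every iteration of the loop.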
Would it be possible at such small scales to make the processor interconnect very, very fast and very low latency? I'm sure HT (HyperTransport) is designed as a very robust error-checking protocol; surely it must be possible on this scale to get 3 GHz or more out of what is meant to be a chip-to-chip interconnect. Also, what are the problems with allowing either processor to access memory as and when it is needed? I.e., wiring the memory controller of both processors to the pins and making a realtime decision about which core has access to memory? This would reduce latency a lot for the second core.
Let's walk through the sentence slowly and try to figure out what the Idiot was trying to say.
"I bought my girlfriend has her own dual cache for her dual-core chips last christmas!"
First of all, I think he meant to say "I bought my girlfriend her own dual cache for her dual-core chips last christmas!"
Ok. Now, first we have to identify what "her dual-core chips" are. In my years of Idiot translating experience, I am confident that he means "her boobs". That leaves us with "dual cache". Since a cache is used to temporarily store something, and we are talking about boobs, I think "dual cache" is "bra". So, we are left with "I bought my girlfriend her own bra for her boobs last christmas!"
However, we have double pluralities here. "chips" implies more than one chip, and "dual" implies two. Therefore, I think we have to modify our translation. Since "dual boobs" doesn't make sense, I think that "chip" really means "nipple". So, accounting for this in our new translation, we are left with "I bought my girlfriend her own bra for her two-nippled boobs last christmas!"
Or, in more proper English, "Last Christmas I bought a bra for my girlfriend, who has two nipples on each breast."
You are entirely correct that you can craft code to negate the usefulness of the cache on any system. For the average application, however, the cache is extremely effective and this special case does not arise. If it were to arise accidentally, it would not be difficult to rewrite the code in a cache-friendly manner.
The interesting difference between the single core and multiple cores sharing a unified cache is that two memory-intensive, cache-friendly programs could trample each other's cache and result in "Absolute cache destruction". The result could be that the two programs running on separate cores and sharing a cache would run slower than the same programs taking turns on a single core: every access on the single core might be a cache hit (data is in the cache), but when moved to the dual-core with unified cache it could become a cache miss (data is not in the cache). This would be a huge performance hit, because memory access is orders of magnitude slower than accessing the cache (resulting in worse performance on the dual-core system).
What do you think the chances are that these dual core chips will be made available for s939 motherboards? :/ I'm definitely upgrading at the end of this year (939, AMD64, DDR2, yadda yadda) no matter what, but am dying to get my hands on this without having to upgrade to an entirely new platform (939 will only be ~1 year old when this arrives)!
Why can't they just make run of the mill CPUs at least dual-processor capable, and not force us to buy their upmarket SMP CPUs?
Was there ever a real technical reason behind this, or was it purely marketing? PII/PIII could go dual, but now it's Xeon land and way more expensive motherboards to boot. I don't remember dual P2/P3 motherboards being a whole lot more expensive than their single CPU counterparts.
Dual Caches for Dual-core Chips
Ahha... I misread that as "Dual Crashes..." thinking Windows found a way of really boning up a system when running on dual processors/cores.
It's not just 64 bits that we're talking about here; it's a larger register pool with the AMD64 architecture. This WILL affect many things end-users do.
:-)
(Not to mention that an Athlon64 in 32-bit mode seems to stack up rather nicely against the P4 clocked half again faster... Go figure...)
I am not merely a "consumer" or a "taxpayer". I am a Citizen of the State of Texas
I thought the future was all about NUMA and process affinity (keeping processes "affine" to a specific CPU and memory pool).
There are some consumer motherboards that are NUMA right now, with Opterons... each CPU has its own memory.
Setting the process affinity keeps the scheduler from ping-ponging the processes across the CPUs.
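On Linux you can try this from a script. A minimal sketch using Python's os.sched_setaffinity, which is only available on platforms that support it (Linux, basically); the helper name is mine, and note this only pins the CPU, it does not by itself control which memory pool the process allocates from.

```python
import os


def pin_to_one_cpu():
    """Pin the calling process to a single CPU and return the new CPU set.

    Returns None on platforms without sched_setaffinity.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    allowed = os.sched_getaffinity(0)   # 0 means "the calling process"
    target = min(allowed)               # arbitrary choice: lowest allowed CPU
    os.sched_setaffinity(0, {target})
    return os.sched_getaffinity(0)


has_affinity = hasattr(os, "sched_getaffinity")
original = os.sched_getaffinity(0) if has_affinity else None
pinned = pin_to_one_cpu()
print("pinned to CPUs:", pinned)
if original is not None:
    os.sched_setaffinity(0, original)   # restore the original mask
```

Once pinned, the scheduler will not migrate the process to another CPU until the mask is widened again.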
...will both have two processor cores, the actual unit inside a processor that performs the calculations...
Oh, so that's what a processor does! Can you remind me again what "RAM" is?
This isn't the first time I made that assertion only to be corrected in this same vein.
(hits self)
THIS THING CAN TURN ON A DIME, MACROSSZERO STYLE ALSO FUCK BETA, ~NYORON
Suggest you look more into AMD's implementation of separate caches. Assuming the two cores with independent caches work more or less as two separate K8 CPUs do now, the two parts actually can snoop each other's caches before reaching for main memory-- from what I've read, the net effect of having two 1MB caches on separate cores should approach the performance you would see from two 2MB cache CPUs without this capability.
See http://chip-architect.com/news/2003_09_21_Detailed_Architecture_of_AMDs_64bit_Core.html#3.18 for more details than I'm qualified to rehash (I have a CS and computer architecture degree, but it's from 15 years ago and I'm not that active in the biz these days... I can follow the discussion, more or less).
AMD has patents that make this possible using Hypertransport across CPUs, but I'm not sure how that relates to how they'd do it within a dual-core CPU, and I'm not sure that they'd rule out all the ways Intel could achieve a similar result.
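Roughly, the lookup order being described is: own cache first, then the other core's cache, then main memory. A toy sketch of just that ordering follows; the cycle counts are hypothetical placeholders, and it leaves out everything a real coherence protocol has to deal with (line states, writes, capacity).

```python
# Hypothetical access costs in CPU cycles, chosen only to make the ordering visible.
COSTS = {"local": 10, "peer": 40, "memory": 200}


def load(address, local_cache, peer_cache):
    """Return (where the data was found, cost in cycles) and fill the local cache."""
    if address in local_cache:
        return "local", COSTS["local"]
    # Snoop the other core's cache before going out to main memory.
    source = "peer" if address in peer_cache else "memory"
    local_cache.add(address)
    return source, COSTS[source]


core0, core1 = set(), {0x1000}       # core 1 already holds line 0x1000
first = load(0x1000, core0, core1)   # found by snooping core 1
second = load(0x1000, core0, core1)  # now a local hit
third = load(0x2000, core0, core1)   # nobody has it: main memory
print(first, second, third)
```

The point is only that a miss in your own cache does not have to cost a full trip to RAM if the other core happens to hold the line.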
I can only speak for myself. I bought an HP AMD64 3400+ computer at CompUSA with my hard-earned $949 and fired it up as soon as I could get it home. Was I disappointed. It didn't seem to be any faster than my 750 MHz Pentium, and I was swamped with popup messages urging me to buy this and that to make the menu buttons actually work, or else warning me that my trial software would expire in xy days unless I bought the full version. Fdisk -all got rid of the salesmen. I installed beta Windows XP 64-bit and could not get virus software or drivers for my scanner, sound card, etc. I put Fedora Core 2 x86_64 on the second partition. Absolutely everything worked, even my 8-in-1 card reader. Now my box screams; I can compile a kernel faster than I can pop and eat a bag of microwave popcorn. Never mind the fact that the 64-bit arch gives me enough highway to run the Gimp while the kernel is compiling on desktop 1. Me wait on Microsoft? Nahhh... I'm on 64-bit now and ain't lookin' back, and I have not bothered to dual boot XP just to see it slow down my 64-bit box. Maybe Longhorn... someday... show me, like they say in Missouri.
Practically, a dual core with shared cache will probably have less cache than a dual core with two caches, and thus will be slower, but there is no reason why the cache controller has to be designed in such a way that two memory-intensive programs can trample each other's cache elements.
I appear to have a blog. Odd.
You mean, like this guy ? BlueGene
I have programmed (both assembly and higher level) back in the 16 bit days (and in the 8 bit days for that matter), and the problem was entirely different.
The big problem wasn't just having to address more than 64k of data, it was having to address _one_ _chunk_ of more than 64k. (E.g., a 640x480 pixel bitmap was already over the limit, even in 4 bit colour.)
Having 50,000 chunks of 10-30k each wouldn't really even start to be a bother. You'd just load the segment register at the beginning
What made one large chunk be special was that you had to do segment arithmetic in the middle of addressing its contents.
E.g., if you wanted to apply a Gaussian blur (or any other matrix filter) to a bitmap, having to compute the segment and offset for each pixel was a huge performance hit. If you applied something as trivial as a 3x3 filter to a 640x480 bitmap, you'd end up doing segment arithmetic 9x640x480 = 2,764,800 times in the process, instead of just adding/subtracting a constant value to an int.
Of course, you could and did optimize it better than that. (E.g., it's trivial to reduce that to computing the segment/offset only once per row. I.e., only 480 times.) But that was something you had to do. That was the kludge and extra work of those times.
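For those who never had the pleasure, here is what that bookkeeping looks like. This is a sketch of 8086 real-mode addressing (linear address = segment * 16 + offset) with the pointer normalized so the offset stays small; the helper names, the base address, and the one-byte-per-pixel layout in the example are my own choices for illustration.

```python
WIDTH, HEIGHT = 640, 480
SEGMENT_LIMIT = 64 * 1024  # bytes reachable through one 16-bit offset


def to_segment_offset(linear):
    """Normalize a real-mode linear address to (segment, offset < 16)."""
    return linear >> 4, linear & 0xF


def to_linear(segment, offset):
    """Real-mode address translation: segment * 16 + offset."""
    return (segment << 4) + offset


# A 4-bit-colour 640x480 bitmap packs two pixels per byte.
bitmap_bytes_4bpp = WIDTH * HEIGHT // 2
print("4bpp bitmap size:", bitmap_bytes_4bpp, "bytes; segment limit:", SEGMENT_LIMIT)

# Address of pixel (x=100, y=200) in a 1-byte-per-pixel bitmap at linear 0x10000.
base = 0x10000
linear = base + 200 * WIDTH + 100
segment, offset = to_segment_offset(linear)
print(f"pixel lives at {segment:04X}:{offset:04X}")

# Naive 3x3 filter: one conversion per tap per pixel, versus once per row.
per_pixel_conversions = 9 * WIDTH * HEIGHT
per_row_conversions = HEIGHT
print("naive conversions:", per_pixel_conversions, "per-row:", per_row_conversions)
```

On a flat 32-bit address space all of this collapses to adding a constant to a pointer, which is the whole point of the comparison.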
Frankly, I don't think we have the same problem nowadays. To have the same problem with the 4 GB limit of 32 bit addressing, you'd need one 4 GB chunk of allocated memory, which you can't possibly break into smaller chunks.
E.g., if you process a DVD movie (as per your example), does it really need to be a single chunk? Well, no. It's divided into frames, which are a lot smaller than that. You also don't have to hold the whole movie in RAM, and indeed you don't, since few people have 4 GB RAM on their computers even at work.
Even on the server side, I'd wager a guess that 99% of server side stuff doesn't allocate over 4 GB to a single process. (None of our application servers do, and we're talking a rather big corporation.) And even if someone did, I doubt they'd get 4 GB as a single malloc().
So again, it's nowhere near being as big a problem as the old 16 bit problem.
Don't get me wrong, I do have an Athlon 64, and they're nice chips. The extra registers and high IPC are reason enough to have one anyway.
But the whole "we need 64 bit now!!!" thing is IMHO just marketing hype and bullshit. The majority of computers need 64 bit registers like a fish needs a bicycle.
A polar bear is a cartesian bear after a coordinate transform.
and if we still don't pay attention, will they announce again this announcement about their last announcement?
Another thing to remember is that using 64-bit in userland can actually slow down some applications. You're pushing around twice as much data for every single operation. I have a 64-bit UltraSPARC running Linux (Aurora). The OS is compiled as 64-bit, but most of the applications still run in 32-bit mode. Many applications break when you compile them in 64-bit mode. Others work, but slower. I don't think I've seen any actually run faster. Unfortunately, I don't have anywhere near 2 GB of RAM on that box, so I can't really take advantage of the larger address space.
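To put a rough number on the "pushing around more data" point, here is a small Python sketch. The node layout (two pointers plus a 32-bit payload) and the 64-byte cache line are assumptions picked for illustration, not measurements from any real application, and alignment padding is ignored.

```python
import struct

# Hypothetical linked-list node: next pointer, prev pointer, 32-bit payload.
# "<" selects standard sizes with no padding, so the result does not
# depend on the machine running the script.
NODE_32 = struct.calcsize("<IIi")  # 4-byte pointers
NODE_64 = struct.calcsize("<QQi")  # 8-byte pointers

def nodes_per_cache_line(node_size, line_size=64):
    """How many whole nodes fit in one cache line (line size is assumed)."""
    return line_size // node_size

per_line_32 = nodes_per_cache_line(NODE_32)
per_line_64 = nodes_per_cache_line(NODE_64)
```

Under these assumptions the node grows from 12 to 20 bytes, so fewer of them fit in each cache line. Pointer-heavy code is where that hurts; code that mostly handles 32-bit ints and bytes sees much less of a difference.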
Your Servant, B. Baggins
Do you have any references on this? It sounds counter-intuitive to me. I would have thought that as long as both cores were constantly accessing memory, the cache would effectively be split between them roughly 50/50. Actually, if one process was using more memory than the other, the split might end up being proportional, I think -- and this ought to improve performance, as it effectively means that the cache (which in total is presumably twice as large as each unshared cache would be) is split in an adaptive fashion between the two processes.
Now, I'm not an expert on cache behaviour or memory access patterns, so I'll accept I could easily be wrong -- but I think the way I see it is the more likely scenario.
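The adaptive-split intuition can be played with in a toy model. The Python sketch below assumes a fully associative LRU cache, two cores that each loop over a fixed working set, and strictly alternating accesses. All sizes are invented, and it ignores access latency and port contention entirely, so it says nothing about real chips.

```python
from collections import OrderedDict

class LRUCache:
    """Fully associative cache with least-recently-used replacement."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()
        self.hits = 0

    def access(self, key):
        if key in self.lines:
            self.lines.move_to_end(key)         # mark as most recently used
            self.hits += 1
        else:
            if len(self.lines) >= self.capacity:
                self.lines.popitem(last=False)  # evict least recently used
            self.lines[key] = True

def trace(working_set, rounds):
    """A core looping over `working_set` distinct lines, `rounds` times."""
    return [i for _ in range(rounds) for i in range(working_set)]

def run_private(traces, capacity):
    """One cache of `capacity` lines per core; returns total hits."""
    caches = [LRUCache(capacity) for _ in traces]
    for step in zip(*traces):
        for core, line in enumerate(step):
            caches[core].access(line)
    return sum(c.hits for c in caches)

def run_shared(traces, capacity):
    """One cache of `capacity` lines used by all cores; returns total hits."""
    cache = LRUCache(capacity)
    for step in zip(*traces):
        for core, line in enumerate(step):
            cache.access((core, line))  # tag lines so cores never alias
    return cache.hits

big = trace(96, 10)    # core 0: working set larger than one private cache
small = trace(16, 60)  # core 1: small working set, same number of accesses

private_hits = run_private([big, small], capacity=64)  # 944 of 1920
shared_hits = run_shared([big, small], capacity=128)   # 1808 of 1920
```

In this contrived case the shared cache wins because the big working set borrows the space the small one doesn't need. It is easy to build the opposite case, where one core streams through memory and evicts the other core's lines, so treat it as an illustration of the argument and not as evidence.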
Actually the FSB is the bottleneck ALMOST ALL THE TIME. It may only be 1-2% of the instructions, but a RAM load takes hundreds or thousands of CPU cycles. That's the very reason for speculative loads on the Itanium: to start the load as far in advance as you can. Modern processor architectures are built around trying to minimize the necessity of RAM loads. This is, of course, a problem of latency and not of bandwidth, though you need that too.
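A back-of-the-envelope version of that claim, in Python. It assumes a simple in-order model where every instruction costs one cycle and each RAM load stalls the core for the full penalty; the 1% rate and 200-cycle penalty are just values picked from the ranges above. Out-of-order execution and prefetching overlap some of this in practice, so read it as an upper bound.

```python
def stall_fraction(ram_load_rate, penalty_cycles, base_cpi=1.0):
    """Fraction of all cycles spent waiting on RAM in a simple in-order model."""
    stall_per_instruction = ram_load_rate * penalty_cycles
    return stall_per_instruction / (base_cpi + stall_per_instruction)

# 1% of instructions going to RAM at 200 cycles each adds 2 stall cycles
# per instruction on top of 1 cycle of work.
fraction = stall_fraction(0.01, 200)
```

With those assumed numbers the core spends about two thirds of its time waiting, even though 99% of instructions never touch RAM.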
That said, mainframes don't have any real solution to the latency problem either (except vector CPUs from NEC or Cray, but that only works for a very limited set of programs).
In x86 land, the 486 was also the first to have an on-chip cache (8 KB).
Obviously you want dual caches. Even if a single shared cache would give you a slightly higher hit rate, you have two 3 GHz-plus cores bashing at it more than 3 billion times a second. In that case you don't want the cores tying each other up on ALL the cache accesses they do.
:-(
It is certainly worth it to push the point where memory hierarchies meet as far as possible from the core. That's why you get big caches in the Xeons meant for multiprocessing.
Now, if a dual (triple?) ported cache were "free", then of course you'd go for the dual ported version: double the cache size, and your hit ratio goes up a little. But they are not free. You pay in speed, area, transistor count etc. In the end, I'm pretty sure Intel and AMD will have evaluated the tradeoffs and made the right decision.
With a bit of luck, they have a "fastpath" between the caches for stuff that needs to bounce around between the two caches/cores.
Oh. I haven't read the article. Sorry.
I . . . Who took the money?
Who took the money away?
I . . . It's always showtime
Here at the edge of the stage
I, I, I, wake up and wonder
What was the place, what was the name?
We wanna wait, but here we go again...
I . . . takes over slowly
But doesn't last very long
I . . . no need to worry
Evr'ything's under control
O - U - T But no hard feelings
What do you know? Take you away
We're being taken for a ride again
I got a girlfriend that's better than that
She has the smoke in her eyes
She's moving up, going right through my house
She's ginna give me surprise
Better than this, know that It's right
I think you can if you like
I git a girlfriend with bows in her hair
And nothing is better than that
Down, down in the basement
We hear the sound of machines
I, I, I'm driving in circles
Come to my senses sometimes
Why, why, why, why start it over?
Nothing was lost, everthing's free
I don't care how impossible it seems
Somebody calls you but you cannot hear
Get closer to be far away
Only one look Maybe that's all that it takes
that's all that we need
All that it takes, all that it takes
All that it takes, all that it takes
I got a girlfriend that's betther than that
And she goes wherever she likes. (there she goes...)
I got a girlfriend that's better than that
Now everyone's getting involved
She's moving up going right through my heart
We might not ever get caught
Going right through (try to stay cool) going through, staying cool
I got a girlfriend that's better than that
And nothing is better than you
I got a girlfriend thats better that this
And you don't remember at all
As we get older and stop making sense
You won't find her waiting long
Stop making sense, stop making sense...stop making sense, making sense
I got a girlfriend that's better than that
And nothing is better that this
( is it? )
I hate Liberals and Conservatives.
If you are a Liberal or a Conservative, then HAVE A NICE DAY!
Courage.