Slashdot Mirror


Dual Caches for Dual-core Chips

DominoTree writes "The dual-core chips that AMD and Intel plan to bring to market next year won't be sharing their memories. A version of Opteron coming in 2005 and Montecito, a future member of Intel's Itanium family also slated for next year, will both have two processor cores, the actual unit inside a processor that performs the calculations, and each core will have separate caches."

342 comments

  1. mmmm cores by zaqattack911 · · Score: 3, Insightful

    Can I have a 64bit OS too please? (no not linux)

    1. Re:mmmm cores by Anonymous Coward · · Score: 2, Informative

      Here you go. Works on dual-core, separate cache chips already. (HP PA-8800)

    2. Re:mmmm cores by bburton · · Score: 5, Funny

      Can I have a 64bit OS too please? (no not linux)

      Didn't you hear? According to SCO, Linux doesn't even exist!

      --
      Slashdot = ((Technology + Politics) / Trolls) % Grammar Nazis
    3. Re:mmmm cores by Anonymous Coward · · Score: 0

      Not Linux? How about Solaris then?

    4. Re:mmmm cores by EvilTwinSkippy · · Score: 3, Informative
      OS X, or if you hate Apple, NetBSD.

      Solaris.

      The Playstation 2 is actually 128 bit. But that doesn't really count as an OS...

      --
      "Learning is not compulsory... neither is survival."
      --Dr.W.Edwards Deming
    5. Re:mmmm cores by Lullabye_Muse · · Score: 1

      Hey so was the dreamcast :( .

    6. Re:mmmm cores by Anonymous Coward · · Score: 0

      I don't think the current version of OS X is true 64 bit, but the next one will be: http://www.apple.com/macosx/tiger/64bit.html

    7. Re:mmmm cores by iNiTiUM · · Score: 5, Informative

      Sure you can
      Oh you want one for the AMD64?
      How about these?

      --
      When encryption is outlawed, ou++1!@(93j++js-d9298yIUH(*Y24JKB!~
    8. Re:mmmm cores by DrLZRDMN · · Score: 1

      http://www.us.playstation.com/peripherals.aspx?id=SCPH-97047

    9. Re:mmmm cores by kennedy · · Score: 4, Informative

      wrong. the ps2 has a 64bit MIPS cpu with *128bit extensions*. Think MMX or SSE.

    10. Re:mmmm cores by yamla · · Score: 3, Interesting

      Apple isn't scheduled to release the first 64-bit version of OS X until the first half of next year and even then, it is not guaranteed to be fully 64-bit (though this is what most people, including me, believe).

      --

      Oceania has always been at war with Eastasia.
    11. Re:mmmm cores by puddpunk · · Score: 2, Interesting

      Can I have a 64bit OS too please? (no not linux)

      Why not Linux? Most 64-bit ready OS's these days are Linux (SUSE 9.1, FC2, Gentoo) or Unix-ey (MacOS X).

      So it's pretty much tough shit for you then. Microsoft has abandoned you, their 64-bit OS will not be out until late 2005 (but you can have their crummy beta for free). Bahahahaha.

    12. Re:mmmm cores by DrEldarion · · Score: 1

      ... because OSX will run on AMD or Intel chips?

    13. Re:mmmm cores by iNiTiUM · · Score: 1

      Close, but the PA-8800 is still technically shared.

      --
      When encryption is outlawed, ou++1!@(93j++js-d9298yIUH(*Y24JKB!~
    14. Re:mmmm cores by wolenczak · · Score: 1

      What would be the difference between a 32bit vs a 64bit MS Windows OS??? I think it's all hype

      As end user I already have partitions as large as several TB's, files as big as DVD's, gigabytes of RAM, ultra fast internal buses, high speed network, high speed hard drives. All that to play DoomIII, type in Word, browse the internet, play music....

      I only see a 64bit windows based operating system being used as a server OS, and in that case, I certainly wouldn't be using windows, I'd be using sparc/solaris or any other unix flavor.

      Maybe in 10 years when we get to play DoomIV and a holographic screen would be needed I could understand using 64bit operating systems for end users.

      IMHO.

    15. Re:mmmm cores by Tanktalus · · Score: 1

      All this talk about a 64-bit Apple core is making me hungry.

      How many bits does it take to get to the core of an Apple?

    16. Re:mmmm cores by Anonymous Coward · · Score: 0

      sounds like you lack even a basic understanding of processor architecture and its software interface. Pls fx thx.

    17. Re:mmmm cores by kabloom · · Score: 0

      Would you be willing to explain his flaws, rather than just shoot him down? I haven't spent that much time thinking about what the relative advantages/disadvantages would be except in very high memory situations, and scientific situations (where a larger word size could store much larger values).

    18. Re:mmmm cores by NoMercy · · Score: 1

      What you mean is, can you have 64bit Windows... since most other operating systems have had a 64bit version for a long time. HP-UX, Solaris, Linux, MacOS X,...

    19. Re:mmmm cores by LWATCDR · · Score: 1

      NetBSD, Solaris, Mac OS/X... What you mean is can you have a 64Bit version of Windows and the answer is sure a beta version.
      Windows is just one of many OSs and in this case it is lagging behind.

      --
      See my blog http://ilovecookes.blogspot.com/ for light hearted technical information.
    20. Re:mmmm cores by Anonymous Coward · · Score: 0

      No, I'm mostly just here to criticize. Information on the topic is widely available. Go google for it.

      Well ok... Here's a quickie to get you started. When you have 32 bits available per instruction, you can usually pass in an operation and maybe some value to use in that instruction. Other 32 bit instructions require multiple clock cycles because you have to send multiple parameters for a single operation. If you have twice the bits to describe what operation you want to perform, you'll be able to perform many of these formerly multiple cycle operations in a single clock cycle. This can speed EVERYTHING up.

    21. Re:mmmm cores by shawnce · · Score: 4, Informative

      Pulling in a post of mine from a completely different forum...

      The G5 is a 64 bit processor and OSX Panther is a 64 bit OS. :)

      Panther is not a true 64 bit OS in the traditional sense of the word. It does not support 64 bit addressing[1]. It does however support the use of 64 bit math operations and the saving of related registers on the CPU.

      Tiger (Mac OS 10.4) will have the first steps towards a true 64 bit OS by allowing 64 bit addressing (virtual addressing) to be used for libSystem only based tools (command line applications, no GUIs, etc.). At least that is all that Apple has so far committed to doing in Tiger at this time (cannot say more because of NDA).

      [1] Note the Panther kernel has support for 64 bit physical addressing so the system can utilize greater than 4 GB of RAM (hardware wise supporting up to 16 GB of RAM) but it does not support 64 bit virtual addressing (what applications use) at this time.

    22. Re:mmmm cores by Anonymous Coward · · Score: 0

      Don't worry. With SCO Unix (and Windows XP), you're not missing out.

      You too can have 'Dual Crashes for Dual-core Chips'. ;-)

    23. Re:mmmm cores by cfuse · · Score: 3, Interesting
      Didn't you hear? According to SCO, Linux doesn't even exist!

      No doubt a dual core processor will incur a dual cpu license fee as well.

    24. Re:mmmm cores by SonicBurst · · Score: 1

      Windows has had production 64-bit OSes since Windows 2000, only they called it the DataCenter version, and you had to get it with your hardware from an OEM. So yes, you can have 64 bit Windows, though you'll pay dearly for it.

      --

      Geek used to be a four letter word. Now it's a six-figure one.
    25. Re:mmmm cores by Hoser+McMoose · · Score: 4, Interesting

      With a 32-bit OS and 32-bit applications you can only access a maximum of 2 or 3GB of data at a time (possibly even less due to memory fragmentation). This may or may not affect what you do.
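
      To put rough numbers on that limit, here is a small Python sketch of the address-space arithmetic. The 1 GiB and 2 GiB kernel reservations are assumptions about typical 32-bit OS configurations, not measurements, and the helper name is made up for illustration.

```python
# Sketch: address-space sizes implied by pointer width.
GIB = 2 ** 30


def address_space_bytes(pointer_bits: int) -> int:
    """Number of distinct byte addresses a pointer of this width can name."""
    return 2 ** pointer_bits


total_32 = address_space_bytes(32)

# Assumption: the OS keeps 1 GiB or 2 GiB of each process's 4 GiB
# virtual address space for itself, leaving the rest to the application.
user_3g = total_32 - 1 * GIB
user_2g = total_32 - 2 * GIB

print(total_32 // GIB)                  # 4
print(user_3g // GIB, user_2g // GIB)   # 3 2
print(address_space_bytes(64) // GIB)   # 17179869184
```

      Fragmentation can shrink the largest usable contiguous chunk further, which this arithmetic does not model.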

      If you do indeed have files as big as DVDs, it would certainly help with editing those files. You CAN break those up into chunks, only having 2GB or less in memory at any given time, and for the most part this works ok, however it does tend to be a bit of a kludge at the best of times, and sometimes it just flat out doesn't work.

      As you correctly guess, servers are the first situation where this really makes sense. If you've got a database that is more than 2GB in size, you REALLY want a 64-bit system, otherwise you'll tend to take a big performance hit. Many high-end workstations require 64-bit systems as well to process all the data.

      So, where is the benefit for the end-user? Well that depends on the user. First off, having more than 2GB of physical memory on a 32-bit processor requires some really ugly hacks to make things work. They do work, but it is a really dumb idea. It was annoying and crappy when we were forced to do it back in the 16-bit days, and it hasn't gotten any better. Secondly people are using bigger and bigger data files on their home PC, editing larger pictures and videos, playing games with more graphics and sound, some even run into issues with types of databases (I know my Usenet newsreader sometimes craps out when I'm downloading too much pr0n because of database limits). Basically you might not need it, but someone else might. The best part about it though is that 64-bits is "free".

      Basically you've got a 64-bit CPU that is no more expensive than competing 32-bit chips and Microsoft has said that 64-bit WinXP Pro will sell for the same price as 32-bit WinXP Pro, so really the question is not so much "Why" do we need 64-bit, but "why not?"

    26. Re:mmmm cores by Anonymous Coward · · Score: 0

      You (they?) are wrong. The PA-8800 has split L1 caches, and an OPTIONAL, unified, 32MB L2 cache. You _can_ buy PA-8800s without the L2 cache.

    27. Re:mmmm cores by Neuroelectronic · · Score: 1

      ..or AMD 64

    28. Re:mmmm cores by Short+Circuit · · Score: 1

      For most people, the advantages aren't here yet. The biggest advantages are those that were included in the architecture specifications. Things like more registers and more physical and virtual address space.

      Those would be useful even without 64-bit instructions. By including them, 64-bit instructions will be available at the point when people need them.

    29. Re:mmmm cores by DominoTree · · Score: 1

      http://www.apple.com/osx

      'nuff said.

      hey you didn't specify x86, and there have been dual-core ppcs for a while now (although none in macs)

    30. Re:mmmm cores by Anonymous Coward · · Score: 0

      Feh. What OS matters other than Linux and the *BSDs? I surely hope you aren't talking about Windows. If so you will be waiting a while for a production 64 bit version, and a lot longer for any significant number of applications that take advantage of it.

    31. Re:mmmm cores by Anonymous Coward · · Score: 0

      Can I have a 64bit OS too please? (no not linux)

      Could somebody burn this infidel for me? thanks.

    32. Re:mmmm cores by Anonymous Coward · · Score: 0

      Solaris 10 will support amd64 architecture machines. We're debugging this now. 32 bit Solaris x86 apps do of course just work alongside 64 bit apps, just like Solaris for UltraSPARC supports both. - Bart http://blogs.sun.com/barts

    33. Re:mmmm cores by Paladin128 · · Score: 1
      Windows has had production 64-bit OSes since Windows 2000, only they called it the DataCenter version, and you had to get it with your hardware from an OEM. So yes, you can have 64 bit Windows, though you'll pay dearly for it.
      Not for AMD64/x86-64/EM64T -- generally available for only Itanium and MIPS (I think -- not too sure)
      --
      Lex orandi, lex credendi.
    34. Re:mmmm cores by e2ka · · Score: 1
      [1] Note the Panther kernel has support for 64 bit physical addressing so the system can utilize greater than 4 GB of RAM (hardware wise supporting up to 16 GB of RAM) but it does not support 64 bit virtual addressing (what applications use) at this time.

      Really? So the kernel can use 16 GB of RAM but the applications can't? Is there ever a situation where this is useful? When my mac boots up, the OS+whatever else is only using ~128/1024 MB of RAM. This is an honest question of what can possibly use "kernel" memory, if it is not an application!

    35. Re:mmmm cores by Anonymous Coward · · Score: 0

      It isn't healthy to eat the cores. Too much prussic acid.

    36. Re:mmmm cores by Anonymous Coward · · Score: 0

      new era of schizophrenic computing will come soon.

    37. Re:mmmm cores by pohl · · Score: 1

      A quick obvious example would be having four instances (development, staging, qa, production) of a very hungry application server running at the same time (weblogic, jboss, etc) each with a 4GB heap, none of them should ever be paged-out at any given time on hardware that had 16GB of real memory. You could generalize this, of course, to any n hungry applications (apps that edit digital video come to mind, as do relational databases). It's actually very rare, in practice, for someone to want to devote all of real memory to a single application, in my experience, so this first step into the 64 bit world was a wise move on Apple's part.

      --

      The "cue the foo posts in 3, 2, 1..." posts will commence with no subsequent foo posts in 3, 2, 1...

    38. Re:mmmm cores by julesh · · Score: 1

      First off, having more than 2GB of physical memory on a 32-bit processor requires some really ugly hacks to make things work.

      Actually:

      1. Full memory on a 32 bit machine is 4GB, not 2. You don't need to do any hacks to break this barrier.

      2. If your architecture supports it, you can have processes using a 4GB paged section of a larger physical memory. This isn't really a hack, it works pretty flawlessly and allows a single computer to deal with more than 4GB of data, just not a single process. IA32 processors have had this ability since (I believe) the Pentium III (not sure when AMD implemented it).

      3. If all you're doing with the memory is caching data (which is what most applications using that much memory are doing with it), you can use the paging facilities to implement a "sliding window" into a larger buffer. This slightly complicates access to the data, slowing it down a little, but it works and is easy enough to implement.

      I think it'll be about 5 years before most end users would benefit from a 64 bit system, maybe even a little longer.

    39. Re:mmmm cores by shawnce · · Score: 1

      So the kernel can use 16 GB of RAM but the applications can't?

      Well yes and no (my wording wasn't the clearest). The virtual memory system can manage 64 bits worth of physical memory (in current G5s the system can support up to 16GB of physical RAM; however, Apple only supports 8GB at the moment given DIMM densities). The kernel itself isn't able to use all of that memory directly since it is still using 32-bit addresses and will remain 32-bit for the foreseeable future.

      Is there ever a situation where this is useful?

      Yes.

      For one each process, including the kernel itself, has its own virtual memory space that can grow up to 4GB in size (32 bit addressing). So say you have 4 processes each with a 2GB working set (they use 2GB of RAM at a time), 4 * 2 = 8 GB of RAM. Since the VM system (an aspect of the kernel) can manage that much RAM all of those processes will be able to get the physical RAM they need and avoid swapping.

      Additionally on Mac OS X the VM system shares its page pool with the file caching aspect of the IO sub-system; this is called the Unified Buffer Cache (UBC). The UBC intentionally caches file data that has been read in, using as-yet-unused pages of physical memory. This has the effect (given enough time or file data loading) of using all free physical memory in the system to cache file data. Doing this can greatly reduce load times of applications or files that have been cached in by the UBC (memory access is many, many times faster than disk access).

      Now if a process or the UBC needs a physical page and no more free pages are available then (in general) the least recently used cache page is reused (the cached file data is forgotten; no page-out is needed because the page holds only cached data, not changed data). This is as fast as using a free page.
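
      As a toy illustration of that reuse policy (not Apple's actual implementation), here is a least-recently-used page cache in Python. The class name, the capacity of 2 and the fake "disk read" are all assumptions made up for the sketch.

```python
from collections import OrderedDict


class PageCache:
    """Toy file cache: holds at most `capacity` pages, reuses the LRU one."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()  # key -> data, least recently used first
        self.hits = 0
        self.misses = 0

    def read(self, key):
        if key in self.pages:
            self.pages.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self.pages[key]
        self.misses += 1
        data = f"contents of {key}"  # stand-in for a real disk read
        if len(self.pages) >= self.capacity:
            # The page is clean (cached file data only), so it is simply
            # dropped; nothing has to be written back to disk.
            self.pages.popitem(last=False)
        self.pages[key] = data
        return data


cache = PageCache(capacity=2)
for key in ["a", "b", "a", "c", "b"]:
    cache.read(key)

print(list(cache.pages), cache.hits, cache.misses)  # ['c', 'b'] 1 4
```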

      Anyway, between the ability to run more memory-hungry applications at once before swapping (assuming you have larger amounts of physical RAM installed) and the ability of the UBC to leverage all available free RAM, you can get a lot of gain out of this capability of Mac OS X's VM system without the overhead of having to go all 64 bit.

      In fact using mapping tricks and the behavior of the UBC one can simulate an application with an address space larger than 32 bits in size (using a sliding window of mapped in memory from a map file for example).
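
      A minimal sketch of the sliding-window idea, using Python's mmap module on a small temporary file that stands in for the large backing file. Only one window is mapped at a time. The function name, window size and file layout are assumptions for illustration; a real application would keep a window mapped across many accesses rather than remapping per byte.

```python
import mmap
import os
import tempfile

# Map offsets must be a multiple of the platform's allocation granularity.
WINDOW = mmap.ALLOCATIONGRANULARITY


def read_byte(path, position):
    """Read one byte by mapping only the window that contains it."""
    window_start = (position // WINDOW) * WINDOW
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        length = min(WINDOW, size - window_start)
        with mmap.mmap(f.fileno(), length, access=mmap.ACCESS_READ,
                       offset=window_start) as view:
            return view[position - window_start]


# Build the stand-in "large" file: four windows, each filled with its index.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    for i in range(4):
        tmp.write(bytes([i]) * WINDOW)
    path = tmp.name

samples = [read_byte(path, i * WINDOW + 5) for i in range(4)]
os.remove(path)

print(samples)  # [0, 1, 2, 3]
```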

    40. Re:mmmm cores by SonicBurst · · Score: 1

      I didn't say anything about those architectures. And you're right, I don't think MIPS made the cut.

      --

      Geek used to be a four letter word. Now it's a six-figure one.
    41. Re:mmmm cores by Anonymous Coward · · Score: 0

      Solaris?
      Windows 64 bit?
      What do you want?

    42. Re:mmmm cores by Jozer99 · · Score: 1

      Mac OS X is not 64 bit, did you not hear? It just runs on G5s in 32 bit mode, with 64 bit memory addressing somewhat enabled. Apple has announced that there will not be a native 64 bit OS for several more years. Longhorn is "supposed" to be 64 bit native. It was also supposed to have new security functions, revolutionize GUIs, have the first practical database based filesystem, and dust your furniture while you are out of the house. It was also supposed to be out in 2003... ;)

    43. Re:mmmm cores by th4tGuy() · · Score: 1

      Very Interesting. I have a couple G5s with 8GB of ram each. They are running some large dbs. This now makes sense why I see only 90MB free in top and VM taking up so much RAM, yet the db process only takes ~ 3GB max.

      I wonder if the page in / out counter only counts pages which weren't RAM cached?...

      Thanks!

      --
      -- As soon as I have an interesting sig, you'll be among the first to know!
    44. Re:mmmm cores by shawnce · · Score: 1

      I wonder if the page in / out counter only counts pages which weren't RAM cached?...

      Yeah those only count page faults to and from disk as I understand it and cache pages don't need to be faulted to disk.

      On Mac OS X consider "free" + "inactive" to be your total available memory (cache pages are counted as part of inactive).

    45. Re:mmmm cores by Fancia · · Score: 1
      Hey so was the dreamcast :( .
      No, it wasn't. It uses a 64-bit Hitachi SH-4.
      --

      Bít, zabít, jen proto, ze su liska!
    46. Re:mmmm cores by iNiTiUM · · Score: 1

      Cripes...as an owner of an O2...I can't believe I overlooked them!

      --
      When encryption is outlawed, ou++1!@(93j++js-d9298yIUH(*Y24JKB!~
  2. Note: Here, Single is Better by Anonymous Coward · · Score: 5, Informative

    In case it's not obvious to those who didn't read the article all the way through, it's a better thing when the memory is shared (single cache) rather than separate (dual cache). But that is harder to design, so for these first-generation dual-core chips from Intel and AMD, they are using separate caches for each core. (IBM's dual core Power4 processor has a unified cache.) At some point down the road, they will likely unify them to increase performance.

    1. Re:Note: Here, Single is Better by mothz · · Score: 2, Funny

      At some point down the road, they will likely unify them to increase performance.

      In the meantime, they should just put a bright red sticker on the box that says "DUAL CACHE!" It is documented, so it's a feature, not a bug.

    2. Re:Note: Here, Single is Better by skribble · · Score: 4, Funny

      Thanks for pointing that out, I'm sure a number of people were thinking "Ooooo Cool two caches" when they should have been thinking "Awwww Damn, two caches!"

      --
      --- Nothing To See Here ---
    3. Re:Note: Here, Single is Better by mrchaotica · · Score: 4, Interesting

      Hmm... the Power4 is dual-core and unified cache? I wonder if this has implications for future Macs to compete with these new x86 processors...

      --

      "[Regarding the 'cloud,'] ownership was what made America different than Russia." -- Woz

    4. Re:Note: Here, Single is Better by EvilTwinSkippy · · Score: 4, Interesting
      Compete? What part of spank them and stole their lunch money does x86 fail to understand.

      We have a dual p4 server, the damn thing sounds like a gas turbine when it's on. Really, I've used quieter air compressors.

      Our dual-G5s from Apple are quiet, sleek, and each processor gets its own block of RAM. Granted, the ASIC for the memory controller gets its own heat sink. But man, you crack it open and you wonder where the rest of the server is. It's literally 2 giant blocks for the processors, the ASIC that handles memory management, and a wee little chip on the end of the mobo that looks like a bus controller.

      --
      "Learning is not compulsory... neither is survival."
      --Dr.W.Edwards Deming
    5. Re:Note: Here, Single is Better by goofy183 · · Score: 1

      Wish I had some mod points to bump this up higher :-)

    6. Re:Note: Here, Single is Better by Rufus88 · · Score: 1

      Are there situations where two caches might be better? For example, a multi-threaded application with two memory-intensive threads, each locked down onto a specific CPU?
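
      For anyone wondering what "locked down onto a specific CPU" looks like in practice, here is a minimal sketch using Python's os.sched_setaffinity, which exists only on Linux. It pins the caller rather than two separate threads, the function name is hypothetical, and it returns None where the call is unavailable or refused.

```python
import os


def pin_to_one_cpu():
    """Restrict the caller to a single CPU it is already allowed to run on.

    Returns the resulting CPU set, or None if affinity control is not
    available on this platform or the request is refused.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    try:
        allowed = os.sched_getaffinity(0)    # 0 means "the caller"
        os.sched_setaffinity(0, {min(allowed)})
        return os.sched_getaffinity(0)
    except OSError:
        return None


pinned = pin_to_one_cpu()
print(pinned)
```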

    7. Re:Note: Here, Single is Better by poot_rootbeer · · Score: 1

      What does IBM's Power4 chip have to do with Macs?

      Even the G5 PowerPC chips only implement a fraction of the full POWER architecture. I wouldn't expect to see dual-core/single-cache CPUs in Apple Desktops any time soon. Maybe in 8 or 10 years...

    8. Re:Note: Here, Single is Better by spuzzzzzzz · · Score: 5, Interesting

      The dual cache simplifies things enormously, especially taking the design of the Opteron into account. Opterons are incredibly scalable--each one has three HyperTransport links that can be connected to memory, I/O or another processor. In order to make dual-core chips, all AMD has to do is take two Opterons, put them in the same package and hard-wire a HT link from one processor to the other.

      Of course, they also need to worry about things like size and power consumption but the simplified architecture really makes things a lot easier and will probably contribute to lower prices. It will also accelerate the introduction of multi-core (ie more than two) processors...

      If they were to implement a unified cache design, they would have to make significant changes. They would need to implement cache snooping and complicated memory management. Given that the new dual-core processors (AMD ones, at least) are meant to be pin-compatible with current processors, this would be a bit much to ask. Maybe they'll have unified caches sometime, but I don't see it happening anytime soon.

      --

      Don't you hate meta-sigs?
    9. Re:Note: Here, Single is Better by drinkypoo · · Score: 4, Interesting

      The Hammer-core processors with dual-channel memory controllers have more memory bandwidth than the best G5, and the memory is accessed directly by the processor. Hypertransport is really quite an excellent interconnect. Hammer is NUMA-architecture and each processor gets its own block of ram. Finally, the Opteron dissipates much less energy as heat than the Intel offerings - only about 46W max. I believe this is still a bit more than the G5, of course, but it's really not that bad.

      So yes, the proper term is compete.

      --
      "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
    10. Re:Note: Here, Single is Better by spuzzzzzzz · · Score: 5, Informative
      Are there situations where two caches might be better? For example, a multi-threaded application with two memory-intensive threads, each locked down onto a specific CPU?

      Not really. The problem with 2 caches is duplication. It is quite probable that both cores will want to work on the same thing, in which case cache space will be wasted. It also creates timing complications when one core wants to write to its cache because the other core will have to be told to invalidate its relevant cache entry. On the other hand, you could create a single cache with double the size. This would make sharing memory between CPUs simpler and it wouldn't significantly increase access times (so the situation you mentioned wouldn't be affected). The argument for double caches is about cost, scalability and design simplicity, not performance.
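
      A toy simulation of the duplication argument, assuming simple LRU replacement and made-up sizes (4 lines per core, so 8 lines when unified). It compares two private caches against one cache of double the size, first with both cores walking the same data and then with disjoint data. This is a contrived sketch, not a benchmark of any real chip.

```python
from collections import OrderedDict


class LRU:
    """Tiny LRU cache that only tracks which addresses are resident."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()

    def access(self, addr):
        """Return True on a hit; on a miss, load the line and return False."""
        if addr in self.lines:
            self.lines.move_to_end(addr)
            return True
        if len(self.lines) >= self.capacity:
            self.lines.popitem(last=False)  # evict least recently used
        self.lines[addr] = True
        return False


def count_hits(streams, caches):
    """Interleave the per-core access streams; caches[i] serves core i."""
    hits = 0
    for step in zip(*streams):
        for core, addr in enumerate(step):
            hits += caches[core].access(addr)
    return hits


def compare(streams, per_core_lines):
    """Return (hits with split caches, hits with one unified cache)."""
    split = count_hits(streams, [LRU(per_core_lines) for _ in streams])
    unified = LRU(per_core_lines * len(streams))
    shared = count_hits(streams, [unified] * len(streams))
    return split, shared


same_data = [list(range(6)) * 3, list(range(6)) * 3]
disjoint = [list(range(4)) * 3, list(range(100, 104)) * 3]

print(compare(same_data, 4))  # (0, 30)
print(compare(disjoint, 4))   # (16, 16)
```

      In this model the unified cache wins heavily when the cores share a working set and merely ties when they do not, which matches the claim that the pinned-threads case is not hurt.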

      --

      Don't you hate meta-sigs?
    11. Re:Note: Here, Single is Better by Anonymous Coward · · Score: 0
      What part of spank them and stole their lunch money does x86 fail to understand.


      How about the part where x86 owns 99 percent of the market? Doesn't mean they're better, technically, but it sure does mean that the market has chosen and has not chosen a Mac (again).

    12. Re:Note: Here, Single is Better by Anonymous Coward · · Score: 0

      umm... why would it be a bug? Oh I get it, you just wanted to use an old worn out joke. ok then, move along.

    13. Re:Note: Here, Single is Better by riptide_dot · · Score: 2, Insightful

      FTA: Keeping the cache as one single unit theoretically allows each processor core to access more data in a rapid fashion. Dividing the cache, however, also cuts down on some design work.

      In case it's not obvious to those who didn't read the article all the way through, it's a better thing when the memory is shared (single cache) rather than separate (dual cache).

      Yes, it's better to have a single cache for performance reasons (cache "hit" rates would theoretically be higher with a single larger cache). But it's also better for other reasons too - more L1 and L2 cache (which is made using SRAM, not DRAM) is really expensive. Two cache modules mean more pricey chip$.

      --
      I was in the park the other day wondering why frisbees get bigger and bigger the closer they get - and then it hit me.
    14. Re:Note: Here, Single is Better by Anonymous Coward · · Score: 0

      You've never used an air compressor, obviously.

    15. Re:Note: Here, Single is Better by jackb_guppy · · Score: 4, Informative

      The Power4 does not have a single cache...

      There are L1 caches for both cores.

      There are 3 L2 caches hooked to cross bar switch for speed flowing data into and out of the L1

      There is a single L3 controller overseeing 2 L3 external memory banks.

      Then there are two busses to 2 main memory.

      And 3 interconnects to 3 other dual core chips that make a single 8way processor block.

      And 4 busses inter connecting 4 of these 8way to make a 32way machine, with dual IO channels to hardware!

    16. Re:Note: Here, Single is Better by hattig · · Score: 4, Informative

      No no no no.

      That's all wrong.

      The Opteron has always supported dual cores, and it isn't via "internal hypertransport", the internal crossbar connects to the SysReq that supports two cores attached directly. You cannot attach a shared cache dual core to this design. Each core must have its own individual L2 cache. This is why you could have an 8 processor Opteron system with dual-cores for 16 cores in total despite the fact that the current Opteron can only do 8 processors at the most glueless. Oh, and Hypertransport doesn't connect to memory either, the memory controller is something else connected to the internal crossbar.

      And for the Opteron this is a good design. As the cores are on the same chip, cache coherency will be done at the speed of the processor and not be limited by inter-processor bandwidth. It really isn't a problem at all that the cores each have their own individual cache. At least they aren't competing with each other for cache bandwidth. The only bad point is that a core cannot have the option of using up to 2MB of shared cache - not as big a problem as it might sound, 1MB is doing very well for Opteron, and the on-die memory controllers negate a lot of the latency penalty for main memory access.

    17. Re:Note: Here, Single is Better by Anonymous Coward · · Score: 0

      I'm pretty sure P4's can't be run in dual mode. You are probably talking about Xeons right?

    18. Re:Note: Here, Single is Better by Paladine97 · · Score: 1

      It's a good thing the G5 stole the x86's lunch money, because it's going to need it when it comes time to buy one!

    19. Re:Note: Here, Single is Better by Anonymous Coward · · Score: 1, Insightful

      You have to be kidding. The noise has bugger all to do with the processor inside the box. It's the cooler obviously. You pay good money for a P4 cooler and you'll get some really good examples of fine engineering - take Zalman for example. If you stick with the shitty stock cooler you get for free with the chip then expect noise.

      If Apple gave up their design principles and just stuck any old cooler on their G5's would you still complain? My spidey-senses tell me no.

    20. Re:Note: Here, Single is Better by Moridineas · · Score: 1

      I wonder what those new watercooled G5's sound like...

    21. Re:Note: Here, Single is Better by Anonymous Coward · · Score: 1, Funny

      Two cache modules mean more pricey chip$.

      You know, once you reach the grand old age of, oh I don't know, 14 or so, you'll stop thinking it's cool to put '$' instead of 's' for every word remotely associated with money.

    22. Re:Note: Here, Single is Better by tupps · · Score: 3, Interesting

      Motorola (Freescale) have already announced that they will have dual core g4s available this autumn (I assume as engineering samples).

      They are aiming this at Mac Notebooks.

      I believe IBM have already planned a roadmap for the g5 that includes dual core.

      --
      Go out and get sailing!
    23. Re:Note: Here, Single is Better by Ivan+the+Terrible · · Score: 1

      My kernel (linux-2.6.x) does a very good job of keeping processes running on the same CPU they last ran on. Not perfect, but good. So, wouldn't separate caches be as good as a single cache, and perhaps better?

    24. Re:Note: Here, Single is Better by shawnce · · Score: 1

      My dad just got his and it is quieter than a little 80GB hard drive he had sitting running on the corner of his desk, his words almost to the T.

      My first generation G5 (dual 2GHz) is also almost as quiet.

    25. Re:Note: Here, Single is Better by Anonymous Coward · · Score: 0

      So I guess it's SOOOOooo hard to design a single cache system that it'll be years before we see it.

      At some point down the road, they will likely unify them to increase performance.

      Yeah, like after they milk the lame brained design for more than it's worth. C'mon, it's not cache science (which is harder than rocket science, apparently)!

    26. Re:Note: Here, Single is Better by mrchaotica · · Score: 2, Interesting

      So all those people waiting for G5 Powerbooks are going to end up with dual-core G4 ones instead?

      --

      "[Regarding the 'cloud,'] ownership was what made America different than Russia." -- Woz

    27. Re:Note: Here, Single is Better by spuzzzzzzz · · Score: 2, Informative

      If (1) your kernel did a perfect job of keeping a single process/thread confined to a single CPU and (2) none of your processes/threads were sharing the same memory then a dual cache would perform about the same as a single cache. The main problem here is number (2). When you're running a multi-threaded program, even if the kernel manages the CPUs perfectly, the threads will usually want to share some memory. They may need to pass information to each other or they may just be working on the same data set. In either case, a dual cache makes things worse.

      There are plenty of situations where a single cache would perform better than a dual cache, but there are (almost?) no situations where a dual cache would perform better than a single cache. Hence single cache is better performance-wise.

      Bear in mind, of course, that this is speculation. I have not carried out any benchmarks of single cache versus dual cache and I don't think there are any publicly available benchmarks comparing them. From a purely academic point of view, however, single cache wins.

      --

      Don't you hate meta-sigs?
    28. Re:Note: Here, Single is Better by fish+waffle · · Score: 1

      If (1) your kernel did a perfect job of keeping a single process/thread confined to a single CPU and (2) none of your processes/threads were sharing the same memory then a dual cache would perform about the same as a single cache.

      Surely there's a synchronization cost if both cpus are accessing the same resource.

    29. Re:Note: Here, Single is Better by asdfghjklqwertyuiop · · Score: 1

      Well, you seem to know what you're talking about, and I've always wondered this: what are the advantages of having multiple cores in one chip instead of having multiple discrete CPUs like traditional SMP systems?

    30. Re:Note: Here, Single is Better by spuzzzzzzz · · Score: 1
      Surely there's a synchronization cost if both cpus are accessing the same resource.

      A minor one. Since this is all being done on-die, the bandwidth is huge and the latency is tiny, so synchronisation is pretty painless. Even so, the cost of synchronising two caches is potentially greater than the cost of synchronising two cores using the same cache. In the dual-cache case, one cache entry will need to be invalidated each time a core writes to a shared entry. Then the core with the invalid entry will need to re-read the data from the other core's cache (this adds a temporal consistency issue to the spatial one because the cache being read from needs to be up-to-date). With single-cache, you only need to ensure temporal consistency.
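
      The invalidate-then-re-read cycle described above can be sketched with a toy model. This is only an illustration under simplifying assumptions (one cache line, a bare write-invalidate rule, hypothetical function names), not a description of how the Opteron or Montecito actually keep their caches coherent.

```python
# Toy write-invalidate model: count coherence events when two cores
# take turns writing one shared cache line. Illustrative only.

def private_caches(writes):
    """Two private caches. `writes` is a sequence of core ids (0 or 1)."""
    valid = [False, False]      # does each core's cache hold a valid copy?
    invalidations = 0
    transfers = 0
    for core in writes:
        other = 1 - core
        if not valid[core]:
            # Miss: fetch the up-to-date line (from the other cache or memory).
            transfers += 1
            valid[core] = True
        if valid[other]:
            # The write makes the other core's copy stale.
            invalidations += 1
            valid[other] = False
    return invalidations, transfers


def shared_cache(writes):
    """One shared cache: a single copy, so nothing to invalidate."""
    transfers = 0
    present = False
    for _core in writes:
        if not present:
            transfers += 1      # first touch brings the line in from memory
            present = True
    return 0, transfers


ping_pong = [0, 1] * 4          # cores alternate writes to the same line
print("private (invalidations, transfers):", private_caches(ping_pong))
print("shared  (invalidations, transfers):", shared_cache(ping_pong))
```

      In this model, alternating writes force a transfer on every write with private caches, while the shared cache pays only for the first touch. It says nothing about how expensive each on-die transfer really is.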

      --

      Don't you hate meta-sigs?
    31. Re:Note: Here, Single is Better by Nahor · · Score: 1
      Surely there's a synchronization cost if both cpus are accessing the same resource.
      You have synchronization in dual cache too because there must be a way for them to tell each other when it is being modified.

      More, with dual cache, synchronization may actually be worse because it's two "distant" units that need to be synchronized instead of one cache telling itself to not give the data to the other processor yet.
      You can tell something to yourself at least as fast as (and hopefully faster than) you can tell someone else.
    32. Re:Note: Here, Single is Better by randyest · · Score: 4, Informative

      Interconnect delay (latency) is reduced. Signals propagating along traces on a die (silicon chip) are orders of magnitude faster than along printed-circuit board (PCB) traces.

      That means you can get more bandwidth with silicon than a circuit board (each of reasonable size using modern components/processes.)

      Also, it takes a lot less power to run lower-voltage drivers on low loads (little resistance and capacitance on die compared to a PCB.)

      So, why not stack everything on one chip? Cost of a chip rises exponentially with die size. Up to about 20mm^2, it's feasible (but pricey); bigger dice are very hard to make, result in lower yields, and hence cost a lot more.
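
      The link between die size, yield and cost can be illustrated with the simple Poisson yield model, where yield falls off as exp(-defect density x die area). The wafer size, wafer cost and defect density below are made-up placeholder values, and edge loss is ignored, so only the trend is meaningful.

```python
import math

def cost_per_good_die(die_area_mm2, wafer_diameter_mm=300.0,
                      wafer_cost=5000.0, defects_per_cm2=0.5):
    """Rough cost of one working die under a Poisson yield model.

    All default parameters are hypothetical illustration values.
    """
    wafer_area = math.pi * (wafer_diameter_mm / 2) ** 2
    dies_per_wafer = wafer_area / die_area_mm2          # ignores edge loss
    # Poisson model: probability that a die has zero defects.
    yield_fraction = math.exp(-defects_per_cm2 * die_area_mm2 / 100.0)
    return wafer_cost / (dies_per_wafer * yield_fraction)

for area in (100, 200, 400):
    print(f"{area} mm^2 die: ~${cost_per_good_die(area):.2f} per good die")
```

      Doubling the area more than doubles the cost per good die in this model, because fewer dice fit on the wafer and a larger share of them contain a defect.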

      --
      everything in moderation
    33. Re:Note: Here, Single is Better by Moridineas · · Score: 1

      Nice to hear! I've been eying them covetously ;)

      Dual G4 at the office is loud as hell.

    34. Re:Note: Here, Single is Better by timeOday · · Score: 2, Informative
      it's a better thing when the memory is shared (single cache) rather than separate (dual cache).
      Yeah, if the dual cache could be shared and still run without added latency or decreased bandwidth. That doesn't mean a different chip with a unified cache would be faster though.

      Also, the same is true of dual cores in the first place. It would be better to have a single processor (without dual cores) if it could be twice as fast. Unfortunately, chip designers seem to be running out of ways to usefully employ all the transistors Moore's law is giving them. Now they're resorting to designs that employ parallelism, which is relatively easy to do, but harder to exploit in software, and sometimes hardly useful at all.

    35. Re:Note: Here, Single is Better by shawnce · · Score: 1

      Yeah my old Dual G4 wind-tunnel sits off in the corner silent with no power attached... I only power it up to do 2 machine debugging and only when I must.

    36. Re:Note: Here, Single is Better by Wesley+Felter · · Score: 1

      In Power4, the L2 and everything beyond is shared by the two cores. (The fact that the L2 has three slices isn't relevant here.) Many of the upcoming chips have multiple, private L2s, hence the point of the article.

    37. Re:Note: Here, Single is Better by High+Hat · · Score: 1

      Isn't the Opteron like, 100m^2? Not really sure about the Opteron, but I recall my last Athlon XP (Thoroughbred) getting a lot hotter because die size was reduced from 120mm^2 to 80mm^2 from the previous version. So 20mm^2 sounds a little wrong to me...

    38. Re:Note: Here, Single is Better by randyest · · Score: 1

      You're exactly right. Sorry, I meant 20mm x 20mm, or 400mm^2.

      --
      everything in moderation
    39. Re:Note: Here, Single is Better by Nahor · · Score: 1
      Isn't the Opteron like, 100m^2?
      100m^2? The ENIAC maybe, certainly not the Opteron. ;)
    40. Re:Note: Here, Single is Better by asdfghjklqwertyuiop · · Score: 1

      Interconnect delay (latency) is reduced. Signals propagating along traces on a die (silicon chip) are orders of magnitude faster than along printed-circuit board (PCB) traces.


      But isn't most of the communication cpu to/from memory, not cpu to cpu?
    41. Re:Note: Here, Single is Better by vpupkind · · Score: 1

      IMHO, single is worse... You have two distinct processes running simultaneously on two different cores. Now if there is a single cache, one would be able to get into a cache war for a given line. On the other hand, the only serious gain from a single cache is not needing to synchronize the writes.

    42. Re:Note: Here, Single is Better by shawnce · · Score: 1
      Did you bother to look at the link the parent provided?

      Each core of the Power4 shares the L2 cache bank and L3 cache on a given die. This is exactly what folks are talking about when they say shared or independent caches (of course the L1 caches are not shared since they are really part of the core).

      To quote...

      The components of the POWER4 chip are shown in Figure 1. The chip has two processors on board. Included in what we are referring to as the processor are the various execution units and the split first level instruction and data caches. The two processors share a unified second level cache, also onboard the chip, through a Core Interface Unit (CIU) in Figure 1. The CIU is a crossbar switch between the L2, implemented as three separate, autonomous cache controllers, and the two processors. Each L2 cache controller can operate concurrently and feed 32 bytes of data per cycle. The CIU connects each of the three L2 controllers to either the data cache or the instruction cache in either of the two processors. Additionally, the CIU accepts stores from the processors across 8-byte wide buses and sequences them to the L2 controllers. Each processor has associated with it a Noncacheable (NC) Unit, the NC Unit in Figure 1, responsible for handling instruction serializing functions and performing any noncacheable operations in the storage hierarchy. Logically, this is part of the L2.

      Figure 1: POWER4 Chip Logical View

      The directory for a third level cache, L3, and logically its controller are also located on the POWER4 chip. The actual L3 is on a separate chip. A separate functional unit, referred to as the Fabric Controller, is responsible for controlling data flow between the L2 and L3 controller for the chip and for POWER4 communication. The GX controller is responsible for controlling the flow of information in and out of the system. Typically, this would be the interface to an I/O drawer attached to the system. But, with the POWER4 architecture, this is also where we would natively attach an interface to a switch for clustering multiple POWER4 nodes together.


      Also note that not all POWER4 chips are packaged into an 8-chip module.
    43. Re:Note: Here, Single is Better by spuzzzzzzz · · Score: 1
      Now if there is a single cache, one would be able to get into a cache war for a given line.

      How? If the cache has decent bandwidth, it is easy for two cores to read the same line. On the other hand, if one or both of the cores wants to write to the line then the situation for single-cache is much simpler than the double-cache case.

      Unless you mean that the cores could keep overwriting each other's line with data read from memory. This is a possibility, but it is offset by the fact that the unified cache could be twice the size of each private cache. And it corresponds to the case where the working set is larger than the cache size, which is an issue even for single processor configurations.
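
      The point about a unified cache being twice the size of each private cache can be shown with a small LRU simulation. This is a sketch under stated assumptions: fully associative caches, a made-up access trace in which core 0 loops over six lines and core 1 over two, and hypothetical sizes of 4 lines per private cache versus 8 lines shared.

```python
from collections import OrderedDict

class LRUCache:
    """Fully associative cache with least-recently-used replacement."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()
        self.misses = 0

    def access(self, line):
        if line in self.lines:
            self.lines.move_to_end(line)        # mark as most recently used
        else:
            self.misses += 1
            self.lines[line] = True
            if len(self.lines) > self.capacity:
                self.lines.popitem(last=False)  # evict least recently used


def replay(trace, cache_for_core):
    """Feed each (core, line) access to the cache that serves that core."""
    for core, line in trace:
        cache_for_core[core].access(line)


# Core 0 cycles through 6 lines, core 1 through 2 lines, interleaved.
trace = []
for step in range(60):
    trace.append((0, ("a", step % 6)))
    trace.append((1, ("b", step % 2)))

private = [LRUCache(4), LRUCache(4)]
replay(trace, private)
private_total = private[0].misses + private[1].misses

shared = LRUCache(8)
replay(trace, [shared, shared])
shared_total = shared.misses

print("private caches, total misses:", private_total)
print("shared cache, total misses:  ", shared_total)
```

      With this lopsided trace the shared cache lets the busy core borrow capacity the other core is not using. A trace where both working sets exceed the shared capacity would show the cores evicting each other's lines instead.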

      --

      Don't you hate meta-sigs?
    44. Re:Note: Here, Single is Better by pompous+windbag · · Score: 1

      We have a dual p4 server, the damn thing sounds like a gas turbine when it's on. Really, I've used quieter air compressors.

      ...and I've worked with numerous P4 servers that were whisper quiet -- so what's your point?
      Oh, and I think all those dual-P4 servers were bought for about the same amount as one of your dual-G5s.

    45. Re:Note: Here, Single is Better by Jeff+DeMaagd · · Score: 1


      But isn't most of the communication cpu to/from memory, not cpu to cpu?

      I think you might have a point, although it might depend on the task. At the moment, an Opteron running on one RAM channel isn't that much slower than a dual channel Opteron. What scares me is that when a single Opteron can effectively use dual RAM channels, then wouldn't a dual core Opteron benefit from four channels? What kind of package would that be? That makes me wonder if the on-die memory channel wasn't a viable long-term solution, meaning whether it would prove to eventually be a restraint before the AMD K9 is released.

    46. Re:Note: Here, Single is Better by Anonymous Coward · · Score: 0

      Compete? What part of spank them and stole their lunch money does x86 fail to understand.

      What part of "Got their ass kicked on SPEC" do Mac Fanboys fail to understand. (Real SPEC, not Steve Jobs' bogoSPECmarks.)

    47. Re:Note: Here, Single is Better by randyest · · Score: 1

      Maybe, depending on what your CPUs are doing (one thread split between CPUs sharing all data, two threads sharing some data, or two totally independent threads with no common data, or something in between those three points.)

      But he asked why two cores on one die might be desirable over two cores on separate dice with some interconnect in between. And I answered ;)

      --
      everything in moderation
    48. Re:Note: Here, Single is Better by psetzer · · Score: 1

      One can always add registers and L1 cache when they've got transistors to burn. I won't be surprised when in the future we can run an entire emulator of some retro system simply using L1. However, being able to do multiple things at once always comes in handy. Note that since Quicksort can be written recursively, you can split it between processors, and each one has a disjoint working set. Furthermore, with multiple cores, you can do things that were computationally infeasible before. AI typically involves a whole bunch of searching for an optimal solution, and anything that speeds up searching makes your life a hundred times easier. Finally, remember that anything that speeds up Gnome or KDE is worth your money.
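
      The Quicksort remark can be sketched as follows: after one partition step, the two halves occupy disjoint index ranges, so two workers can sort them without touching each other's data. The function names are hypothetical, and CPython threads are used only to show the structure; because of the interpreter lock they would not give a real two-core speedup here.

```python
from concurrent.futures import ThreadPoolExecutor

def partition(data, lo, hi):
    """Lomuto partition of data[lo..hi]; returns the pivot's final index."""
    pivot = data[hi]
    i = lo
    for j in range(lo, hi):
        if data[j] < pivot:
            data[i], data[j] = data[j], data[i]
            i += 1
    data[i], data[hi] = data[hi], data[i]
    return i

def quicksort(data, lo, hi):
    """Plain recursive in-place quicksort on data[lo..hi]."""
    if lo < hi:
        p = partition(data, lo, hi)
        quicksort(data, lo, p - 1)
        quicksort(data, p + 1, hi)

def two_worker_quicksort(data):
    """Partition once, then sort the two disjoint halves concurrently."""
    if len(data) < 2:
        return
    p = partition(data, 0, len(data) - 1)
    with ThreadPoolExecutor(max_workers=2) as pool:
        left = pool.submit(quicksort, data, 0, p - 1)
        right = pool.submit(quicksort, data, p + 1, len(data) - 1)
        left.result()    # re-raise any exception from the workers
        right.result()

values = [(i * 37) % 101 for i in range(200)]
two_worker_quicksort(values)
print(values[:10])
```

      Each worker only reads and writes indices on its own side of the pivot, which is the "disjoint working set" property that makes separate caches a reasonable fit for this kind of job.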

      --
      "Anyone who attempts to generate random numbers by deterministic means is living in a state of sin." -- John von Neumann
    49. Re:Note: Here, Single is Better by TheRaven64 · · Score: 1

      A dual core solution would be ideal for a laptop. Power (and hence heat) scales roughly in quadratic terms relative to clock speed, so adding a second CPU would be much more power efficient than doubling the CPU speed. Currently, the only things I do on my PowerBook that really tax it are compilation and video editing, both of which are highly parallelisable and would benefit immensely from a second core.
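
      Taking the quadratic scaling above at face value, the comparison is quick arithmetic. The exponent is the comment's own rough assumption and is left as a parameter; the function name is hypothetical.

```python
def relative_power(cores, clock_ratio, exponent=2.0):
    """Power relative to one core at the base clock.

    Assumes power per core scales as clock_ratio ** exponent.
    """
    return cores * clock_ratio ** exponent

one_fast_core = relative_power(cores=1, clock_ratio=2.0)   # double the clock
two_base_cores = relative_power(cores=2, clock_ratio=1.0)  # add a second core

print("one core at 2x clock:", one_fast_core, "x base power")
print("two cores at 1x clock:", two_base_cores, "x base power")
```

      Under that assumption, doubling the clock costs twice the power of adding a second core, for the same nominal peak throughput on work that parallelises well.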

      --
      I am TheRaven on Soylent News
    50. Re:Note: Here, Single is Better by Cynikal · · Score: 1

      it$ true, and if you keep doing it your keyboard will $tay that way... $eriously

      li$ten to your elder$

    51. Re:Note: Here, Single is Better by EvilTwinSkippy · · Score: 1

      (Cough) Dell rackmount (cough)

      --
      "Learning is not compulsory... neither is survival."
      --Dr.W.Edwards Deming
    52. Re:Note: Here, Single is Better by jackb_guppy · · Score: 1

      Did you understand what you read?

      Each core owns independent L1s, so there is no common cache as the original post talked about.

      The L2 - there are 3 "three separate, autonomous cache controllers" again not a common single 1

      The L3 - external controller - there is one of these but there are 2 banks.

      About 8-way clusters... The chip is designed for that maximum package. It is also used as a single 2-way processor on low-end machines, and in 4-way and 8-way configurations. Also, there are single-core versions that can be clustered into a max of a 16-way machine.

    53. Re:Note: Here, Single is Better by essreenim · · Score: 1

      Yay, isn't technology wonderful. Instead of making things smaller we are making them bigger -pfff,
      Intel and AMD both suck for this.
      I can just see the Opteron now - a big clunky 1878 pin monster with 2 fans. And dual cache - even worse. Come on - pathetic. I expect a lot more. I'm going to get a tablet laptop with a Via mobo and proc next - in a year (max) they will be small and powerful enough for ALL my needs, with low power consumption and low noise to boot, and probably 1/8th the size of those monsters.

    54. Re:Note: Here, Single is Better by GeckoX · · Score: 1

      Just a note:

      Dual p4 and dual G5 do NOT equal any sort of dual-core. Totally offtopic and unrelated, and totally unnoticed by anyone. Interesting.

      --
      No Comment.
    55. Re:Note: Here, Single is Better by shawnce · · Score: 1

      Did you understand what you read?

      Yes I did and thanks for the concern about my reading comprehension abilities.

      Each core owns independent L1s, so there is no common cache as the original post talked about.

      The original post made no mention of L1 you are assuming that. Yes the L1 is NOT shared (as I stated before). No multi-core CPU will likely ever share L1 since that is often rather integrated/intertwined with the core itself.

      The L2 - there are 3 "three separate, autonomous cache controllers" again not a common single 1

      The L2 is shared by the cores (note "The two processors share a unified second level cache, also onboard the chip") and this is what folks have been generally talking about in this topic. The fact that the L2 is implemented as "three separate, autonomous cache controllers" doesn't change this fact. Each core connects to a crossbar that interfaces it with the L2 (or out to L3, etc.) and each core can access any one of the three L2 cache banks. This is done to allow them to share the L2 cache more easily, since while one is pulling a cache line from one L2 bank/controller another can be doing the same with a different controller... think of it as something like dual ported RAM. They can also directly share data in the cache this way.

      The L3 - external controller - there is one of these but there are 2 banks

      The same applies as above to the L3 and so on. Look at the picture if you are confused.

      In fact if you go read the article you may see the context that folks are discussing better, to quote...

      Similarly, each core of the dual-core Opteron will have separate caches, said Marius Evers, a researcher at AMD. Putting two cores on one chip increases computing performance while controlling power consumption, a major problem facing designers.

      Splitting the cache differs from the approach taken by IBM, which came out with the first dual-core server chip with the Power 4, according to Kevin Krewell, editor in chief of the Microprocessor Report.


      So if folks had been talking about L1 then the above would be incorrect but since they are talking about L2 things make sense.

    56. Re:Note: Here, Single is Better by shawnce · · Score: 1

      Just so you know the G5s don't have separate banks of RAM as you imply... see my other post [slashdot.org].

    57. Re:Note: Here, Single is Better by shawnce · · Score: 1

      Oops... this post

    58. Re:Note: Here, Single is Better by Anonymous Coward · · Score: 0

      No it most certainly is not.

      Having a single shared cache means that both cores contend for access to it, slowing them down. L2 caches usually run at half the core speed as it is, so having to supply 2 cores would really bog it down.

      You seem to think that both cores are often going to be working on the same data at the same time and that just isn't true. Even in a multithreaded application where both threads (and hence cores) would share data, they don't both manipulate the same chunk of it at the same time. Because of this each core is working with different data sets 99% of the time, and thus each wants to cache different stuff, so they are better off with separate caches.

  3. New Computer by Lullabye_Muse · · Score: 1

    Hopefully gonna be able to build a new computer by Christmas (if I ever get a job) but maybe I should save my money and hold off until new chipsets and motherboards come around?

    1. Re:New Computer by Izago909 · · Score: 2, Insightful

      With that logic, you'll always be holding off for some new development.

    2. Re:New Computer by raquelita · · Score: 1

      Yeah! I said the same last year... but I preferred to hold off until the PIV... and after that until hyperthreading... and now until the dual processor...
      It will never end!!
      So, I'm going to get my new PC now!!

      --
      Yes, I am a /.er girl http://raquelms-travel.blogspot.com
    3. Re:New Computer by Lullabye_Muse · · Score: 1

      Well I would buy a computer now but I have no cash, but seems like I'm gonna get a job soon. So probably gonna be around Christmas til I can, so I could wait til Q1 05 and get something better than what'll be out in December.

    4. Re:New Computer by AKAImBatman · · Score: 4, Funny

      Well I would buy a computer now but I have no cash

      Is that a pun?

    5. Re:New Computer by Anonymous Coward · · Score: 0

      Ah, to be a kid again, when necessity was the motherhood of prudence. Now that I am all grown up with lots of money and buy just about whatever I want whenever I want, the cool stuff just isn't as cool anymore. The cute girls aren't as interesting either.

    6. Re:New Computer by mikael · · Score: 1

      Maybe you should consider a laptop. You can easily get laptops that can run Linux and have 3D graphics acceleration built in. Have a look at the Sony PCG-GRT916 range; 512/1024 Mb,80 Gbyte Disk,Nvidia Go5600, dual core Pentium 4, bright 16" LCD display, DVD RW. All working under RH/FC2.

      --
      Vintage computer adverts: http://www.vintageadbrowser.com/computers-and-software-ads
    7. Re:New Computer by ckaminski · · Score: 1

      Usually because they're bubbling morons and too interested in preening than paying attention to the world around them. The rest are psychos, glorious in bed, and frightful when angered.

    8. Re:New Computer by johnw · · Score: 1
      A better plan is to wait until something new comes out and then buy whatever it replaced. Buying the latest greatest thing will always give you poor value for money, whatever the man in the computer shop says.

      If you don't have money to burn (and it sounds like you don't) think in terms of something like:

      • 1GHz / 1.5GHz processor
      • 512M / 1G RAM (it's cheap)
      • Fast HD
      • Good video card
      • Good sound card
      You'll pay less than half the money for a machine which gives you 99% of the functionality and life.

      HTH
      John

  4. Yeah... by spidereyes · · Score: 1, Funny

    but will it make coffee? I didn't think so.

    --

    I say we just grow up, be adults and die.
    1. Re:Yeah... by Anonymous Coward · · Score: 0

      >but will it make coffee? I didn't think so.

      The Intel CPU will at least be able to keep it at a comfortable 80c for you.

    2. Re:Yeah... by chill · · Score: 1

      but will it make coffee? I didn't think so.

      You know, if X-10 did get their asses sued off and all those popups stopped, you wouldn't be asking this question.

      --
      Learning HOW to think is more important than learning WHAT to think.
    3. Re:Yeah... by Carnildo · · Score: 2, Informative

      but will it make coffee? I didn't think so.

      Given that the power output of a single-core Prescott is 100 watts or more, a dual-core with separate caches will put out 200+ watts. Clock up the speed a bit more, and you'll be at about 300 watts.

      I figure that's probably enough to boil a cup of coffee.
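
      For what it's worth, the coffee estimate can be checked with back-of-the-envelope arithmetic. This assumes a 250 mL cup starting at 20 C, water's specific heat of about 4.186 J per gram per degree, and (generously) that every watt ends up in the cup; the function name is made up.

```python
def seconds_to_heat(watts, millilitres=250.0, start_c=20.0, end_c=100.0):
    """Time to heat water, assuming no heat is lost to the surroundings."""
    specific_heat = 4.186            # joules per gram per degree C (water)
    grams = millilitres              # water is roughly 1 g per mL
    joules = grams * specific_heat * (end_c - start_c)
    return joules / watts

minutes = seconds_to_heat(300) / 60
print(f"300 W brings a cup to the boil in about {minutes:.1f} minutes")
```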

      --
      "They redundantly repeated themselves over and over again incessantly without end ad infinitum" -- ibid.
    4. Re:Yeah... by DrLZRDMN · · Score: 1

      we just went over this.

    5. Re:Yeah... by chill · · Score: 1

      Thanks, I didn't know all the details. I figured the hardware was still available under different OEMs, but since all those obnoxious ads stopped...

      --
      Learning HOW to think is more important than learning WHAT to think.
  5. Confused by Shard013 · · Score: 3, Interesting

    I'm not a hardware pro, but is this basically the same as having two separate chips, or am I missing the point here?

    1. Re:Confused by dougmc · · Score: 4, Informative
      No, you're not missing the point.

      The benefit is that you get two CPUs in less space. You might even be able to get two CPUs in a system designed to support only one (because it has only one slot.) And if your system already has two CPU slots, this might give you four CPUs.

      It might also use less power than two CPUs, but I wouldn't hold my breath on that one.

    2. Re:Confused by eddy · · Score: 4, Interesting

      Yes. Actually, I would have thought that the reverse (shared cache) would have been news instead.

      The point is that you can have very fast inter-CPU communication, the motherboard gets cheaper to produce, you don't have to double the cooling machinery... and they're probably cheaper to produce also (one package instead of two).

      I assume the cores are actually produced one-by-one or it'd get big and very expensive.

      --
      Belief is the currency of delusion.
    3. Re:Confused by ERJ · · Score: 5, Informative

      Kinda. I could see a couple advantages though:

      1) Fast interconnect between chips. Instead of having to transfer data over the bus, if the CPU needed info from the other CPU it could transfer over a high speed connection without having to involve other parts of the machine (bus). AMD already has a sort of high speed interconnect to their multi-cpu motherboards instead of splitting like intel does but I would imagine that this would still be faster.

      2) Less motherboard room needed. You don't need dual cooling fans, dual power / interface lines and have more room overall on the motherboard.

    4. Re:Confused by benjamindees · · Score: 1, Insightful

      Of course you're missing the point! You're concentrating on the technical value of such a design.

      You should be concentrating on the marketing bullshit value instead.

      --
      "I assumed blithely that there were no elves out there in the darkness"
    5. Re:Confused by Lord+Kano · · Score: 3, Informative

      I'm not a hardware pro, but is this basically the same as having two separate chips, or am I missing the point here?

      Pretty much the same thing as having two processors, but once things are running at proper capacity, it will be cheaper to put two cores on one chip. In part because you won't have to reproduce the underlying electronics. The motherboards will also be cheaper. One socket means less money spent on R&D. If and when someone releases a dual socket/quad core motherboard it will be cheaper to design and build than a quad socket board.

      LK

      --
      "Hi. This is my friend, Jack Shit, and you don't know him." - Lord Kano
    6. Re:Confused by Anonymous Coward · · Score: 2, Informative

      I doubt the dual core processors will be socket compatible with existing single core processors, so you will be unlikely to be able to upgrade an existing motherboard to dual processor just by dropping in a different CPU. It is possible they will come out with new socket designs which can accommodate either dual or single core CPUs, but I wouldn't bet heavily on it.

      The benefit, as you say, is in space, with possibly a small amount in power consumption, but I'd agree not to hold your breath, and even if it did, probably not a lot. Space savings isn't that big an issue for desktop systems in most cases, but it is a huge issue in things like blade servers and even 1U/2U rackmount servers. Fitting two huge CPU sockets, as well as all of the heat sinks and fans necessary into a 1U case is a real challenge, so only needing 1 socket and one heat sink is a huge win there.

      The other benefit you don't mention is likely to be in cost, as a dual core CPU will probably be cheaper to manufacture than two single core CPUs. A significant portion of the cost of a CPU is the packaging (not the cardboard box, the ceramic casing, pins, heat spreader, etc), and the labor costs to put the chip into that package, if you only have to do that once, it will save money. Likely AMD and Intel won't pass all those savings along to us, so their margin on dual core CPUs will probably be higher, which is undoubtedly the reason they are pushing so hard in that direction.

    7. Re:Confused by Chuckstar · · Score: 1

      Disadvantage: Shared memory and I/O bandwidth. Each processor has the same bandwidth as before, but that bandwidth is used by two cores.

    8. Re:Confused by Wesley+Felter · · Score: 1

      Perhaps this is why the Opteron has lots of bandwidth to start with.

    9. Re:Confused by hattig · · Score: 1

      IIRC AMD have stated that their dual-core processors will be socket compatible. As AMD has the important stuff on the northbridge inside their processor (memory controller, interconnect, crossbar and dual-core interface) this is possible as long as the dual core processor operates within the power envelope that AMD have specified for current systems (89W for current ones, maybe a higher ~105W for the future, probably to accommodate dual core).

      Intel will probably introduce a new socket, their dual-core implementation is rumoured (and I think this was from the inquirer, so get your salt licks ready) to literally be two processors on the same die with all the limitations of Intel's Shared Bus architecture (i.e., the shared FSB is all that is connecting two separate processors on the same slab of silicon). OTOH that might mean that they can use the same old socket for the processor, but you won't be able to run dual dual-core processors because the chipset won't support it, it will see it as 4 processors connected, whereas with AMD it will see it as 2 processing devices connected via hypertransport.

    10. Re:Confused by pla · · Score: 1

      I'm not a hardware pro, but is this basically the same as having two separate chips, or am I missing the point here?

      Yes and no...

      For most purposes, this will have the same effect as having two CPUs. It will take up a bit less room, probably the same amount of power (for which two reasons, cooling will become a huge problem in the near future), but from outside-the-beige-box, it will look like just having two CPUs (and unlike Intel's pathetic "hyperthreading", will perform like two CPUs as well).

      To see why this really matters, you need to look into the not-so-distant future, perhaps 5 years away. CPUs have started speeding up less quickly... Perhaps Moore's law hasn't stopped yet, but we've reached the point where, to keep pushing it, Intel and AMD need to build a new chip fab (a multi-BILLION dollar factory) literally every year. That means either finding a new way to get quick-and-dirty speed boosts, or charging a lot more for CPUs.

      Now, consider the humble start of the "IC"... First a single transistor. Then they managed to put two per chip, then a quad-pack, and now we have millions of transistors on a single chip.

      I believe this will represent the next big leap in CPU power. We'll see two, then four, then probably sixteen (not eight, and although nine counts as the next square, it just sounds too strange to use). Then some company (possibly neither Intel nor AMD) will figure out a way to deal with heat in a 3 dimensional CPU packing (Personally, I like the idea of "silicon on copper pipe", and just let me use the damn computer instead of spending money on a hot water heater), and the number of cores will skyrocket the same way transistor counts did.

    11. Re:Confused by Epistax · · Score: 1

      Did you mention the fact that without putting two cores on one chip (thus "one processor"), the speed increases in processors would quickly almost come to a halt? Even if you can match the performance of all the logic (Intel has a 10 GHz adder), the L1 cache just isn't fast enough. You've created a ripping fast latency device.

      Personally what I think will eventually happen is the number of cores will increase and eventually they'll develop a program-on-demand chip that creates physical logic on the fly for what needs to be done. Need a Fourier transform? Boom- it's hardwired. Need it to do image compression now? Boom- there you go. Running a memory intensive program? Boom- additional 128 kb of L2 cache, etc, etc. Not that I know of any way to do this, but that's never stopped anyone before.

    12. Re:Confused by chipace · · Score: 1

      They have the choice of adding more cache or another core. Keeping the core and cache size the same when you go to a new process only reduces the die area. They would compete against their existing inventories of the old process (like P4 and prescott). That's not a recipe for profit.

      Dual cores share common resources, and therefore must have less single/independent thread performance compared to separate CPUs. Additionally, without sharing the cache, each core needs to access the common main memory to communicate (variables), and that's much slower than a common cache.

      All my threads run independently... I'd rather have the extra cache on a single, improved core.

    13. Re:Confused by mr+i+want+to+go+home · · Score: 1

      Hello, are you Steve "Boom!" Jobs by any chance? Boom-!

  6. Itanium? (somewhat off-topic) by Anonymous Coward · · Score: 0

    Is Intel still developing the Itanium? I thought that it was a flop? Are they hoping that future sales will be stronger, or is the Itanium not the Titanic everyone plays it out to be?

    1. Re:Itanium? (somewhat off-topic) by Anonymous Coward · · Score: 5, Informative

      Despite what Sun has to say on the matter, Itanium system and processor sales have been increasing steadily since 2H 2000; prior to that, there was a big lull in demand because few wanted to buy underperforming Itanium 1 machines when the Itanium 2 was expected rather soon (and announced relatively early).

      Today, in contrast, there _doesn't_ appear to be a lull in demand for Itanium 2 machines, even though Montecito (Itanium 3) has been announced in a fair bit of detail. That's because for some applications (in HPC, high-end database work, certain EDA/CAD/CAE work, and ultra-high-reliability computing) Itanium 2 systems are basically unbeatable. They also run some OSes which are very important to some organizations, such as HP-UX and OpenVMS.

      Long story short, the Itanium 1 was something of a flop, the Itanium 2 is really pretty decent, and everyone is expecting the Itanium 3 to offer pretty decent _price/performance_, in addition to best-bar-none performance when it is released next year.

    2. Re:Itanium? (somewhat off-topic) by csimpkin · · Score: 2, Informative

      The problem with the Itanium was that Intel didn't release an optimizing compiler with or before the Itanium. I believe (corrections welcome) that instructions are grouped in 'packets' (I forget the term used) that the Itanium can run in parallel. The problem is that only certain instructions can be bundled together. When older compilers are used, the instructions are generated in a way that only a few or even just one instruction is in a 'packet'. So, the problem was that the processor wasn't being used to its fullest potential. I have never compared the Itanium 1 and 2. But I would guess that the Itanium 2 was primarily released to give the Itanium line a fresh start with an optimizing compiler.

    3. Re:Itanium? (somewhat off-topic) by Carnildo · · Score: 1

      The term Intel used was EPIC (Explicitly Parallel Instruction-set Computing). The idea was to let the compiler select which instructions should be executed at the same time, rather than have the CPU decide. Supposedly, letting the compiler take all the time it needs to select the best packets gives a much faster CPU than having the CPU decide on the fly.

      --
      "They redundantly repeated themselves over and over again incessantly without end ad infinitum" -- ibid.
    4. Re:Itanium? (somewhat off-topic) by sheddd · · Score: 1

      IMO the real problem with Itanium is the compilers...

      x86 decides how to do things in parallel in hardware (more or less)...

      Itanium, while simplistic and clean, an architecture many academics would praise, requires a compiler that is uber smart (for many tasks)... Itanium shines on ops where it's a 'no brainer' for a compiler to tell the chip how to do things in parallel.

      Itanium 3 may have decent price/performance, but only if Intel sells them at a huge loss.

      You say Itanium's been 'somewhat of a flop'... I'd call ~10billion spent for little revenue the biggest disaster ever in the semiconductor industry... I think Intel assumed they would be able to push everyone to a new (flawed IMO) architecture with their near monopoly.

      Intel's sunk billions into Itanium; I hope they continue to push it (because I think it'll be a waste of cash and I have AMD stock!)

    5. Re:Itanium? (somewhat off-topic) by TheLink · · Score: 1

      Then the prob would be if Itanium 3 or Itanium n+1 comes out, all your binaries would be rather suboptimal if the Itaniums don't do the fancy rescheduling themselves (which is the whole point of EPIC).

      Solution would be recompilation (whether dynamic/automatic or manual), or sticking to the same architecture all the time.

      --
    6. Re:Itanium? (somewhat off-topic) by Anonymous Coward · · Score: 0

      Oh..., ~10billion is not true. FUD as usual.

    7. Re:Itanium? (somewhat off-topic) by JonAnderson · · Score: 1
      Despite what Sun has to say on the matter, Itanium system and processor sales have been increasing steadily since 2H 2000; prior to that, there was a big lull in demand because few wanted to buy underperforming Itanium 1 machines when the Itanium 2 was expected rather soon (and announced relatively early).
      Sorry, do you have any numbers to back this up? It's my understanding that Itanium shipments are still insignificant wrt SPARC/POWER shipments.
    8. Re:Itanium? (somewhat off-topic) by TheRaven64 · · Score: 1
      With traditional Very Long Instruction Word (VLIW) chips, the compiler has to group instructions into blocks that can be executed at the same point in the pipeline. The CPU then reads these long instructions and executes them. If you had a simple VLIW chip with an integer unit, a branch unit and a floating point unit then each of these instruction words will contain at most one branch instruction, one integer instruction and one floating point instruction. This means that when you add (for example) another FPU to the architecture, old code will simply ignore it. Explicitly Parallel Instruction Computing (EPIC), as used by the Itanium, builds on this idea slightly. Instructions are bundled together on the assumption that the target machine is infinitely parallel. The CPU then executes these instruction bundles in order, making no guarantees about the order of instruction execution within a bundle.

      A traditional superscalar architecture does not have these hints, and so it attempts to parallelise everything, and then simply ignores the results of instructions that should not have been executed. In the long run, EPIC could produce chips with an incredibly good price / performance ratio. A large amount of silicon on an x86 chip is used to translate x86 instruction into RISC instructions and then to determine how they should be parallelised, whether they were parallelised correctly etc. With EPIC, none of this is needed, meaning you have a lot more silicon left for actual computation.

      The problem with EPIC is that the compiler needs to be a lot more clever, and clever in a different way, than current compilers.
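
      To make the compiler's job a bit more concrete, here is a minimal Python sketch of the grouping step: walk the instruction stream and start a new group whenever the next instruction has a register dependency on something already in the current group. The `Instr` type, the register names and the greedy strategy are all assumptions for illustration, not how any real Itanium compiler works. (As far as I know, official IA-64 terminology calls the variable-length unit an "instruction group" and reserves "bundle" for a fixed 128-bit container of three instructions, so "group" is used here.)

```python
from typing import List, NamedTuple, Tuple


class Instr(NamedTuple):
    """Hypothetical three-address instruction: dest = op(srcs)."""
    name: str
    dest: str
    srcs: Tuple[str, ...]


def form_groups(program: List[Instr]) -> List[List[Instr]]:
    """Greedily split a program into groups with no internal dependencies.

    Inside one group no instruction reads or writes a register written by
    another member, and none overwrites a register another member reads,
    so the hardware may run the members in any order or all at once.
    """
    groups: List[List[Instr]] = []
    current: List[Instr] = []
    written = set()  # registers written inside the current group
    read = set()     # registers read inside the current group

    for ins in program:
        conflict = (
            any(src in written for src in ins.srcs)  # read-after-write
            or ins.dest in written                   # write-after-write
            or ins.dest in read                      # write-after-read
        )
        if conflict and current:
            groups.append(current)
            current, written, read = [], set(), set()
        current.append(ins)
        written.add(ins.dest)
        read.update(ins.srcs)

    if current:
        groups.append(current)
    return groups


PROGRAM = [
    Instr("load_a", "r1", ("r10",)),
    Instr("load_b", "r2", ("r11",)),
    Instr("add", "r3", ("r1", "r2")),
    Instr("mul", "r4", ("r1", "r2")),
    Instr("sub", "r5", ("r3", "r4")),
]

GROUP_NAMES = [[ins.name for ins in group] for group in form_groups(PROGRAM)]
print(GROUP_NAMES)
```

      The two loads land in one group, the add and mul in a second, and the sub in a third. A real compiler would also reorder instructions to make the groups bigger, which is where the "needs to be a lot more clever" part comes in.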

      --
      I am TheRaven on Soylent News
    9. Re:Itanium? (somewhat off-topic) by TheLink · · Score: 1

      "Instructions are bundled together on the assumption that the target machine is infinitely parallel."

      Yeah, but bandwidth isn't infinite, and there's always latency. You also say there's no guarantee on the execution order in a bundle - do they have to keep padding some bundles with NOPs, or can bundles be variable length? So I don't see the problem going away- e.g. old code needing to be recompiled.

      Still it's a matter of percentages - if most old code doesn't need to be recompiled for improved performance (e.g. opteron, P4) then that's great. But so far it seems that corollary of EPIC is recompilation is needed.

      The Itanium seems to need super huge caches. So it's using large amounts of silicon anyway. Maybe that's coz of all the duds in the bundles? Itanium code appears to take up more space than Alpha and x86 code.

      --
    10. Re:Itanium? (somewhat off-topic) by TheRaven64 · · Score: 1
      Yeah, but bandwidth isn't infinite, and there's always latency.

      True, but I'm not sure of the relevance...

      You also say there's no guarantee on the execution order in a bundle

      Correct. That's the point of a bundle. It signifies a block of instructions that can be executed in parallel, assuming that the hardware has the capacity to do so, or serially if it does not.

      do they have to keep padding some bundles with NOPs, or can bundles be variable length?

      Bundles are variable length. If they weren't, then it would be no different from a standard VLIW implementation (which does require NOPs for each execution unit not doing anything that cycle, since the instructions are a fixed length).

      So I don't see the problem going away- e.g. old code needing to be recompiled.

      Old (Itanium) code does not need to be recompiled on a new Itanium. The machine code already assumes an infinite degree of parallelism is possible. The new CPU will simply execute more instructions from a particular bundle than the old one (assuming the size of the bundle is greater than the amount of parallelism present in the old CPU).

      Still it's a matter of percentages - if most old code doesn't need to be recompiled for improved performance (e.g. opteron, P4) then that's great. But so far it seems that corollary of EPIC is recompilation is needed.

      No. The whole point of EPIC Vs VLIW is that EPIC does not require recompilation for maximum performance on a new CPU.
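
      A back-of-the-envelope way to see the argument, as a Python sketch: if the binary only records which instructions are independent, then a CPU that can issue `width` instructions per cycle needs roughly ceil(n / width) cycles for a group of n instructions. The group sizes below are made up, and the model ignores latencies, cache misses and which functional units are free, so treat it as an illustration of the claim rather than a performance model.

```python
import math
from typing import List


def cycles(group_sizes: List[int], width: int) -> int:
    """Cycles to issue every group on a machine that issues `width` per cycle.

    Groups are executed in order; within a group the instructions are
    independent, so a group of n instructions takes ceil(n / width) cycles.
    """
    if width < 1:
        raise ValueError("width must be at least 1")
    return sum(math.ceil(n / width) for n in group_sizes)


# Hypothetical binary: three groups of 6, 2 and 4 independent instructions.
GROUP_SIZES = [6, 2, 4]

for width in (2, 4, 8):
    print(f"issue width {width}: {cycles(GROUP_SIZES, width)} cycles")
```

      The same unmodified group list takes 6 cycles at width 2, 4 at width 4 and 3 at width 8. Past the size of the largest group, extra width buys nothing, which is the limit on how far old code can scale.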

      --
      I am TheRaven on Soylent News
    11. Re:Itanium? (somewhat off-topic) by TheLink · · Score: 1

      OK. Now I think I get it.

      Curious tho - why the poorer code density then (compared to say Alpha or x86)? Compilers not good enough? That's going to hurt performance.

      If a bunch of instructions can be executed in any order, and can be variable length, then is it possible to compress them in a packet so they take up less space and bandwidth, and decompress them on-the-fly in the CPU?

      Would probably be easier if they started with dense code, but is code compression possible?

      --
  7. Licensing Issues? by xeon4life · · Score: 5, Interesting

    What will happen to those who must pay a royalty fee per CPU? Will companies that charge for each CPU begin to charge for two, or will it still be viewed as one...?

    --
    Real programmers can write assembly code in any language. -- Larry Wall
    1. Re:Licensing Issues? by Grishnakh · · Score: 1

      I sure hope they charge for each core. It'll help extract more money from stupid customers who refuse to leave vendors that treat them poorly.

    2. Re:Licensing Issues? by Ianoo · · Score: 5, Informative

      When hyperthreading was released, the industry had to cope with similar issues. Those of us using operating systems with artificial limits imposed on the number of possible processors used in a system had to wait for software updates to fix detection. I'm sure that the same thing will happen again. Undoubtedly there will be some flag in a register somewhere that identifies whether a processor is part of a dual-core chip or just a single CPU on its own. The OS or software can just read this in and work out whether there is sufficient licensing to use them.

    3. Re:Licensing Issues? by dougmc · · Score: 1
      What will happen to those who must pay a royalty fee per CPU?
      You'll have to ask those who charge such a royalty fee, or read through your contract carefully. Having two CPUs in one chip is nothing new (I think there are some IBM and maybe HP boxes using chips like that already), so you should be able to get an answer now -- ask what they're charging the users of those chips.
    4. Re:Licensing Issues? by Wesley+Felter · · Score: 1

      Two cores are two CPUs and have the same performance as two separate CPUs. Thus you will be charged for two CPUs.

    5. Re:Licensing Issues? by elmegil · · Score: 2, Informative

      A typical vendor, Oracle, when talking about a different chip (the newest SPARC chips) says "yes you must pay for each core". I would be surprised if many vendors with such licensing schemes have any other answer.

      --
      7 November 2006: The day Americans realized corruption and incompetence weren't addressing 11 September 2001
    6. Re:Licensing Issues? by dougmc · · Score: 1
      When hyperthreading was released, the industry had to cope with similar issues.
      Not really. Hyperthreading just `sort of' works like another CPU -- it's not really another CPU, and certainly it doesn't perform like a complete other CPU. So they really shouldn't charge extra for it.

      But having two CPUs on one die, that is a second *real* CPU, and therefore something that they could legitimately charge `two CPU' prices for. But even these aren't brand new, so it's not a new question, and it's probably already been answered. My guess is that most vendors charge for the extra cpu, even though both cpus are only in one chip.

    7. Re:Licensing Issues? by name773 · · Score: 5, Funny

      when the wind is blowing westward on odd days of the week you pay for one. when there are clouds on an even day, you pay for two. during leap year, when a west wind blows clouds away at midnight on an even day, you pay for four processors, two computers, a camel, three pci slots, and a partridge in a pear tree.

    8. Re:Licensing Issues? by Anonymous Coward · · Score: 2, Interesting

      The theory behind charging per cpu is that you pay for the value, or at least the work (valuable or not) that the software does. With hyperthreading, it really isn't doing any more work, in theory you could get similar speed-ups (if you are getting any) by improving the memory subsystem, and similar architectural changes for a single-threaded system. So it doesn't make sense to pay a per cpu licensing fee for those "virtual" cpus because they are not actual cpus.

      With a multi-core system, you really do have two independent fully functional processors. So, it would make sense to pay per core because they are actually real cpus.

      Of course the above is predicated on your acceptance that per processor licensing is reasonable in the first place. It is easy to pick holes in, but I don't think you will find the pay-for-real-cpus-but-not-virtual-ones to be out of line with the justifications and rationale for paying by the CPU for anything.
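
      For what it's worth, the three ways of counting being argued about in this thread differ only in what you deduplicate on. A small Python sketch, assuming the OS can already tell you which package, core and hardware thread each logical processor belongs to (how it finds that out is platform-specific and not shown); the `LogicalCPU` type and the policy names are made up for illustration and do not reflect any vendor's actual license terms.

```python
from typing import List, NamedTuple


class LogicalCPU(NamedTuple):
    """One processor as the OS sees it (hypothetical topology record)."""
    package: int  # physical chip / socket
    core: int     # core within that package
    thread: int   # hardware thread within that core


def count_units(cpus: List[LogicalCPU], policy: str) -> int:
    """Count billable units under one of three illustrative policies."""
    if policy == "per_package":
        return len({cpu.package for cpu in cpus})
    if policy == "per_core":
        return len({(cpu.package, cpu.core) for cpu in cpus})
    if policy == "per_logical":
        return len(cpus)
    raise ValueError(f"unknown policy: {policy}")


# One dual-core chip with two hardware threads per core.
ONE_CHIP = [LogicalCPU(0, core, thread)
            for core in range(2) for thread in range(2)]

for policy in ("per_package", "per_core", "per_logical"):
    print(policy, count_units(ONE_CHIP, policy))
```

      The same chip counts as 1, 2 or 4 "CPUs" depending on the policy, which is the whole disagreement in one line of output each.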

    9. Re:Licensing Issues? by Carnildo · · Score: 1

      But a hyperthreaded CPU reports to the OS as two CPUs, which caused the problems.

      --
      "They redundantly repeated themselves over and over again incessantly without end ad infinitum" -- ibid.
    10. Re:Licensing Issues? by iNiTiUM · · Score: 1

      The general consensus has been that HT does not technically count as an SMP system since... well, since it's not. A dual-core system, however, actually has 2 logical CPU cores embedded onto one package. Like HP's PA-8800, it's literally 2 PA-8700s in one package.

      --
      When encryption is outlawed, ou++1!@(93j++js-d9298yIUH(*Y24JKB!~
    11. Re:Licensing Issues? by Kris_J · · Score: 1

      You're absolutely right. My next PC will be a dual-dual core, with a pair of video cards. Whichever OS supports it best (drivers vs licensing) will be the one I use.

    12. Re:Licensing Issues? by skink1100 · · Score: 1

      Good question -- I'll get on the line with ScoSource and find out right away.

      S

    13. Re:Licensing Issues? by Zeever · · Score: 1

      As far as I know, there's one difference: SMT (hyperthreading) is a way to use more efficiently the resources of a CPU, while Dual Core CPU's are like two real CPU's, with some small differences (cache, for example).

      If that's the case, maybe the companies will license their products as if two CPU's (or one plus fraction) are used.

      --
      -- Who, you?
    14. Re:Licensing Issues? by TheLink · · Score: 2

      "The theory behind charging per cpu is that you pay for the value, or at least the work (valuable or not) that the software does"

      I disagree. The theory behind charging per CPU is much closer to the "how much milk you can squeeze from the cow before you get kicked" theory.

      --
    15. Re:Licensing Issues? by Anonymous Coward · · Score: 0

      ...and of course, the value-of-work theory is how you justify the squeezing to the cow, so it can rationalize its humiliation and won't kick you just yet.

  8. Different core models by SIGALRM · · Score: 5, Informative
    The dual-core chips that Advanced Micro Devices and Intel plan to bring to market next year won't be sharing their memories
    As I understand it, the rationale behind Opteron's "Direct Connect" dual-core architecture is to make it easier to place two processor cores on the same silicon die. It's also a power-consumption issue, as the two processors can run at lower clock speeds. However, unlike Intel's design, Direct Connect features an integrated memory controller and HyperTransport interconnects that connect the processor to the I/O port or directly to another processor.
    --
    Sigs cause cancer.
    1. Re:Different core models by Proc6 · · Score: 1
      You said "Direct Connect"! You're getting a subpoena buster!

      RIAA

      --

      I'm Rick James with mod points biatch!

  9. "Montecito" by Mateito · · Score: 5, Funny

    "Montecito", a spanish word, literally translates as "a small monte".

    Thus I predict that this will be followed by a quad-core chip called the "monte", an 8-core chip called the "montote" (the big monte), and finally a 16-core chip known as "The Full Monte".

    1. Re:"Montecito" by EvilTwinSkippy · · Score: 1
      and finally a 16-core chip known as "The Full Monte".

      The naked truth, as we know it.

      Though I think Monty Python would be cooler. Though, maybe a bit too constricting.

      --
      "Learning is not compulsory... neither is survival."
      --Dr.W.Edwards Deming
    2. Re:"Montecito" by laudney · · Score: 1

      Actually this is not just a wild guess. I have inside information that Intel is going to release 64-core CPUs in 5 years. Those multi-core CPUs target three latest emerging workloads: (1) pattern recognition; (2) data mining; (3) data synthesizing. And it's more a software problem than a hardware challenge. Better figure out how to use all those cores in the most efficient way.

    3. Re:"Montecito" by murr · · Score: 2, Funny

      Thus I predict that this will be followed by a quad-core chip called the "monte", an 8-core chip called the "montote" (the big monte), and finally a 16-core chip known as "The Full Monte".

      You forgot to mention the low power edition for portables: The "three core monte".

    4. Re:"Montecito" by shfted! · · Score: 1

      Ahahaha... that was good. Now if only I had mod points ;)

      --
      He who laughs last is stuck in a time dilation bubble.
    5. Re:"Montecito" by name773 · · Score: 1

      well done :) thanks

  10. yeah, by pb · · Score: 4, Interesting

    You probably don't want to have both chips fighting over the cache, and slowing things down; I'm sure doing The Right Thing[tm] will take a while for them to work out. Until then, just pretend that they're mostly separate chips on the same silicon.

    Maybe in the future they'll come up with some more advanced cache designs that can share some cache and improve performance. But until then, expect to see it in the next generation of value chips. (Overclocked dual-core Celerons? Nifty!)

    --
    pb Reply or e-mail; don't vaguely moderate.
    1. Re:yeah, by laudney · · Score: 2, Insightful

      The cause for cache conflict is not a hardware but a software one. Suppose there is one process/thread running on each core. When the two processes have incompatible instruction/data streams that evict each other out of the cache, performance is seriously reduced. This requires an intelligent enough OS scheduler.

    2. Re:yeah, by Anonymous Coward · · Score: 2, Insightful

      You probably don't want to have both chips fighting over the cache, and slowing things down

      As a rule of thumb, if both cores are running threads from the same process (or two processes using shared memory) then shared cache is good because it increases inter-thread bandwidth and decreases inter-thread latency.

      But, if it is just two random processes doing their own thing with little to no interprocess communication, then independent caches are better because you need not worry (as much) about them fighting over the same cache-lines, each mapped to their own different memory spaces. N-way caches help with that kind of problem, and for a large N, might be seen as a good compromise, but the larger the N, the slower and/or more expensive($$$) the cache becomes.
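
      Here is a toy Python simulation of that fighting, as a sketch only: a set-associative cache with LRU replacement, hit by two "processes" whose hot addresses happen to map to the same set. The class name, the 64-byte line size, the tiny 8-line capacity and the two addresses are all assumptions picked to make the conflict obvious; real caches and real address streams are far messier.

```python
from collections import OrderedDict
from typing import Tuple


class SetAssocCache:
    """Tiny set-associative cache model with LRU replacement."""

    def __init__(self, num_sets: int, ways: int, line_size: int = 64) -> None:
        self.num_sets = num_sets
        self.ways = ways
        self.line_size = line_size
        # One ordered dict per set; the oldest entry is least recently used.
        self.sets = [OrderedDict() for _ in range(num_sets)]
        self.hits = 0
        self.misses = 0

    def access(self, addr: int) -> bool:
        """Touch an address; return True on a hit, False on a miss."""
        line = addr // self.line_size
        index = line % self.num_sets
        tag = line // self.num_sets
        entries = self.sets[index]
        if tag in entries:
            entries.move_to_end(tag)      # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        if len(entries) >= self.ways:
            entries.popitem(last=False)   # evict the least recently used
        entries[tag] = True
        return False


def interleave(cache: SetAssocCache, rounds: int = 10) -> Tuple[int, int]:
    """Alternate between process A's hot line and process B's hot line."""
    for _ in range(rounds):
        cache.access(0)     # process A
        cache.access(512)   # process B: same set as A in both configurations
    return cache.hits, cache.misses


# Same total capacity (8 lines), organised two different ways.
DIRECT_MAPPED = interleave(SetAssocCache(num_sets=8, ways=1))
TWO_WAY = interleave(SetAssocCache(num_sets=4, ways=2))
print("direct-mapped (hits, misses):", DIRECT_MAPPED)
print("2-way         (hits, misses):", TWO_WAY)
```

      With one way per set the two processes evict each other on every access (0 hits, 20 misses); with two ways both lines fit and only the first touch of each misses (18 hits, 2 misses). Add a third process landing on the same set and the 2-way cache starts thrashing too, which is why a larger N helps and why it costs more.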

    3. Re:yeah, by drinkypoo · · Score: 1

      It's worth mentioning that since almost nothing is multithreaded, separate caches make much more sense today. Threading is such a pain in the ass that most people just don't bother. Of course that does somewhat limit the usefulness of SMP...

      --
      "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
    4. Re:yeah, by slash-tard · · Score: 1

      Actually you have more threading than you might think, check task manager, OS X activity monitor, or ps and you will see a lot of threads for most processes.

  11. How is this different from a two processor system? by pigscanfly.ca · · Score: 0

    Reading this, it sounds like the two cores will be separate in terms of cache and internal registers, so I'm wondering how this is different from two different processors (except of course the form factor and the higher speed internal bus between the two cores).

  12. You Insensitive Clod by Anonymous Coward · · Score: 0

    I bought my girlfriend has her own dual cache for her dual-core chips last christmas!

    1. Re:You Insensitive Clod by Anonymous Coward · · Score: 0

      Can someone who speaks Idiot please translate for me?

    2. Re:You Insensitive Clod by neuro.slug · · Score: 2, Funny

      Can someone who speaks Idiot please translate for me?

      Translated: Please mod me +5 funny.

      -- n

    3. Re:You Insensitive Clod by Anonymous Coward · · Score: 0

      Maybe it was funny, I just don't understand the language. Is it a sentence of some sort?

  13. Non-news event by doormat · · Score: 3, Informative

    I saw this article at another website earlier today, and I thought this wasn't really important. Each core should have its own cache; that's exactly what a dual-core chip is. It's not twice as many execution units crammed into the same space, or some other funny configuration; it's two separate chips on the same die, perhaps with some modifications for inter-processor communication, but that's about it. With AMD's core design, you have only the physical layer of the HyperTransport bus to connect the chips, and the integrated memory controller has one or two ports to talk to memory (single/dual channel) and two ports to talk to two separate chips. It will be interesting to see if AMD couples dual-core chips with DDR2-667 or DDR2-800; that would make the most sense, so as to keep the memory controller from being the bottleneck, as opposed to the system bus on the Intel side.

    --
    The Doormat

    If you're not outraged, then you're not paying attention.
    1. Re:Non-news event by endersdouble · · Score: 1
      Not really. From what I understand (IANAcomputer engineer)...

      Sure, sounds better to have, say, 2 FX-51s, each with their own 1 MB L2 cache, together on the chip. But better yet would be the same two execution units/L1 caches/etc., but ONE 2 MB L2 cache...so, same memory, each one can still get 1 MB, but if they need more, they can get more as well.

    2. Re:Non-news event by silas_moeckel · · Score: 2, Informative

      It is better. As another poster pointed out, and I'll concur, unified cache is better than separate but a lot harder to make, so the first generation dual core chips will not use it. Expect the second generation to have larger unified cache.

      --
      No sir I dont like it.
    3. Re:Non-news event by Tanktalus · · Score: 1

      It's a little more than that.

      Let's say you have three processes running. For simplicity, of course. By "running", I mean they are actually active, not dormant waiting on something. With two CPUs (on a single die or separate), you can only have two processes actually running - even though the third wants to run, it can't yet. The OS steps in, smacks #1, and puts the third process into one of the CPUs, knocking #1 out. Now, if #2 decides it needs to wait (I/O, for example), process #1 can continue on CPU #2. But its data/execution is cached on CPU #1's cache, and so it must reload.

      With a single cache shared, then there's no reload.

    4. Re:Non-news event by Anonymous Coward · · Score: 0

      Really? What happens when I start a write-back from an updated copy of 0x00448124 from cache #1 while reading a (newly-stale) copy of 0x00448124 from cache #2?

      There are a zillion cache consistency/coherency issues to deal with when you have multiple caches. I'm waiting for a comment that actually addresses those issues instead of hand-waving around it.

      Ask the folks at SGI: cache coherency is an extremely difficult problem. A fair number of research papers have been generated about the subject.

    5. Re:Non-news event by doormat · · Score: 1

      The problem with having one shared cache is fighting between processes (assuming two execution units). One can starve the other process from having any data in L2. This can be seen today on P4s with Hyperthreading....

      And yes, I am a computer engineer.

      --
      The Doormat

      If you're not outraged, then you're not paying attention.
    6. Re:Non-news event by Wesley+Felter · · Score: 2, Insightful

      Er, cache coherency works exactly the same way in a multicore chip as in an old-fashioned SMP. Opteron, Xeon, and small Itanic systems use the time-tested broadcast snooping algorithms that are taught in undergraduate courses.
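
      For anyone who hasn't sat through that undergraduate course, here is a heavily simplified Python sketch of write-invalidate snooping with three line states (Modified, Shared, Invalid). It is an assumption-laden toy: one value per address, no timing, no bus arbitration, and plain MSI rather than the richer MESI/MOESI-style protocols that, as far as I know, real chips use. The `Bus` and `Cache` classes are invented for illustration; the address is the one from the parent post.

```python
from typing import Dict, List


class Bus:
    """Shared bus: every cache sees every other cache's misses and writes."""

    def __init__(self) -> None:
        self.caches: List["Cache"] = []
        self.memory: Dict[int, int] = {}


class Cache:
    def __init__(self, bus: Bus) -> None:
        self.bus = bus
        self.lines: Dict[int, list] = {}  # addr -> [state, value]
        bus.caches.append(self)

    def state(self, addr: int) -> str:
        return self.lines[addr][0] if addr in self.lines else "I"

    def read(self, addr: int) -> int:
        if self.state(addr) in ("M", "S"):
            return self.lines[addr][1]           # hit: no bus traffic
        for other in self.bus.caches:            # miss: broadcast the read
            if other is not self:
                other.snoop_read(addr)
        value = self.bus.memory.get(addr, 0)
        self.lines[addr] = ["S", value]
        return value

    def write(self, addr: int, value: int) -> None:
        if self.state(addr) != "M":
            for other in self.bus.caches:        # broadcast an invalidate
                if other is not self:
                    other.snoop_write(addr)
        self.lines[addr] = ["M", value]

    def snoop_read(self, addr: int) -> None:
        """Another cache is reading: write back dirty data, drop to Shared."""
        if self.state(addr) == "M":
            self.bus.memory[addr] = self.lines[addr][1]
            self.lines[addr][0] = "S"

    def snoop_write(self, addr: int) -> None:
        """Another cache is writing: write back if dirty, then invalidate."""
        if self.state(addr) == "M":
            self.bus.memory[addr] = self.lines[addr][1]
        if addr in self.lines:
            self.lines[addr][0] = "I"


ADDR = 0x00448124
bus = Bus()
cache1, cache2 = Cache(bus), Cache(bus)

cache1.write(ADDR, 42)          # cache #1 holds the only (dirty) copy
seen_by_2 = cache2.read(ADDR)   # forces a write-back; both now Shared
cache2.write(ADDR, 7)           # invalidates cache #1's copy
seen_by_1 = cache1.read(ADDR)   # cache #1 misses and fetches the new value
print(seen_by_2, seen_by_1, cache1.state(ADDR), cache2.state(ADDR))
```

      In this model the stale read from the parent post cannot happen, because the write-back is triggered by the other cache's miss before it gets its data. Whether the caches sit on one die or in two sockets makes no difference to the logic, only to how fast the snoops travel.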

    7. Re:Non-news event by TheLink · · Score: 1

      I'm not a CPU engineer but it may be harder to make a unified cache design where if one of the cores is a dud they can still sell the whole chip as a single core chip.

      --
    8. Re:Non-news event by silas_moeckel · · Score: 1

      It's somewhat more complicated, but they have done that for years by separating the cache into banks; if one bank fails testing, they disable it and sell the chip as a celly. Remember L2 cache is a big part of a modern chip in actual die space. With dual cores they may well be able to scrap a whole proc and sell the remainder as a single-proc chip. That might be really interesting: if they unify the cache later on, they could sell a single-proc chip with the cache of a dual-proc as some sort of extreme gaming chip for really no cost to them.

      --
      No sir I dont like it.
  14. Of course! by Anonymous Coward · · Score: 0

    Since when was that NOT the case in the computer world if you wanted to buy something new? Too bad second hand stuff is ridiculously overpriced...

  15. Re:How is this different from a two processor syst by hawkbug · · Score: 5, Informative

    It's not much different - that's the point. 2 processors in a single socket saves a lot of money production-wise, and that should pass on to the consumer. AMD has said theirs is backward compatible, and that's huge. You already got a single cpu opteron workstation? Well now you can have a dual cpu one for the price of a single cpu upgrade. That kicks ass.

  16. Inside the dual core by spirit_fingers · · Score: 4, Funny

    Actually, the left core will be verbal, creative and be really good at processing visual information, while the right core will be logical, good at number crunching and have no style sense whatsoever.

    1. Re:Inside the dual core by name773 · · Score: 1

      this is great for os x users.

  17. Dual core - what's the point? by Ianoo · · Score: 1

    I don't understand all the hype around dual core. Maybe I'm being stupid. Two chips on one core seems like a great idea, and I'm sure it will improve performance.

    But Intel has already demonstrated there is surely a better solution - something like SMT, hyperthreading.

    Wouldn't it be saner to build a chip with double the number of execution units and double the number of instruction fetch/decode units and a larger reorder buffer that would appear, say, as four logical processors to a system? Surely you could get higher utilisation of your arithmetic logic units from such an arrangement than you could with two entirely separate processors?

    Or is the simple advantage with dual core that you don't have to distribute the same clock over the entire silicon die? I know this is becoming a big problem with complex VLSI, and I guess this might be a half-way solution until clockless designs arrive.

    Can anyone "in the know" answer this question?

    1. Re:Dual core - what's the point? by Wesley+Felter · · Score: 1

      Wouldn't it be saner to build a chip with double the number of execution units and double the number of instruction fetch/decode units and a larger reorder buffer that would appear, say, as four logical processors to a system?

      That's like the Alpha EV8. It costs way too much to design and it's questionable whether you could build it at all.

    2. Re:Dual core - what's the point? by NerveGas · · Score: 3, Interesting

      The benefits of HT, as currently implemented, are pretty insignificant compared to the benefits of multiprocessing, as the possible performance boost is very small, it certainly doesn't give you the ability to handle more interrupts, and it doesn't let you decrease the number of context-switches.

      As for building a more intelligent core to take advantage of the extra transistors, that just might make sense - but it would also take hundreds of millions (or billions) of dollars in development, and the chip wouldn't appear for a good number of years (look at the Itanium). It's a lot easier and cheaper to slap two cores on the same die and call it done. Because Intel is scurrying to try and play catch-up to AMD in the high-end market, time-to-market is critical for them.

      steve

      --
      Oh, you're not stuck, you're just unable to let go of the onion rings.
    3. Re:Dual core - what's the point? by norkakn · · Score: 1

      Well, they are skipping one of the main ones.

      One of the costliest things is a cache miss, and if one were able to share the caches between two cores it would greatly decrease the number of misses. (no need to have everything in there twice)

    4. Re:Dual core - what's the point? by laudney · · Score: 1

      First, when you double execution units and instruction fetch/decode units and reorder buffers, you are creating two cores! Logical processors share all the execution components, but each has its own copy of the general registers etc. Second, SMT and CMP (chip multiprocessor) aim to solve different problems. SMT is to overcome the memory access latency. When one logical processor is idling, the other can jump in and make use of the execution unit. CMP is to bring extra computing power onto the chip without exponentially increasing power consumption and die size and transistor numbers and design/debug complexity. Third, clock is really a big problem. At GHz, clock signal cannot travel across a 20mm die. As a result, we either add several low clockrate cores onto the die or choose asynchronous chips, which are hellishly difficult to design and verify!!

    5. Re:Dual core - what's the point? by ArbitraryConstant · · Score: 2, Informative

      Hyperthreading is not a better solution, particularly when dealing with the Intel implementation. Unless it's very carefully done, all it does is keep the cache from working effectively. Linux and FreeBSD actually got performance improvements from leaving one of the virtual processors idle when there were more processes scheduled to run. When there's two threads of the same process, they let them both run because those tend to have better locality of reference and therefore don't thrash the cache so much.

      Processor designers are in a different situation now than 10 years ago. They've got more transistors than they know what to do with, so adding cache and adding another core are cheap. Streamlining one core to run faster is much harder, as evidenced by Intel's unending troubles with anything faster than 3.2 GHz.

      --
      I rarely criticize things I don't care about.
    6. Re:Dual core - what's the point? by drinkypoo · · Score: 4, Informative

      Hyperthreading is simply a second context. It lets you run a second thread at the same time by using the unutilized capacity of existing functional units and is largely useful only when Intel's branch prediction fails and the chip would otherwise be paying the ultimate penalty for its long, long, LONG pipeline.

      In other words, HT is an ingenious method for making up for the fact that the Pentium 4 is horribly inefficient.

      It would be better to stick a whole bunch of simple cores on a single chip at a lower clock rate and have them work cooperatively, if only we used more multithreading. This is pretty much where Intel is planning to go, with their multiple-core chips based on the Pentium-M. Or, so the rumors say.

      --
      "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
    7. Re:Dual core - what's the point? by Graelin · · Score: 1

      Because Intel is scurrying to try and play catch-up to AMD in the high-end market, time-to-market is critical for them.

      I should point out that a VAST majority of AMD64/Opteron deployments are anything but "high-end", whereas most of the Itanium installations I've seen or heard of are of the >64 processor variety and most certainly qualify for the high-end badge.

      Your statement would only be slightly more accurate if you had used "mid-range market" instead. But even there, Intel's saving grace has been the lack of a [stable / production ready] 64bit Windows platform.

    8. Re:Dual core - what's the point? by Anonymous Coward · · Score: 0
      Wouldn't it be saner to build a chip with double the number of execution units and double the number of instruction fetch/decode units and a larger reorder buffer that would appear, say, as four logical processors to a system? Surely you could get higher utilisation of your arithmetic logic units from such an arrangement than you could with two entirely separate processors?

      Oh sure, that sounds like a good idea. Then when one thread stalls the pipeline, the other thread can sit around waiting, doing nothing for no reason at all, because they both stall at once when two unrelated threads are sharing the same pipeline.

      Or perhaps you have some kind of new great idea where you can reorder the pieces within the pipeline magically so that pipeline stalls don't occur anymore. If so, please share this idea with the world, because it sure would be nice to have...

    9. Re:Dual core - what's the point? by Anonymous Coward · · Score: 0

      I wonder how different the outlook of the lay technology commentators would be if Compaq hadn't pulled the plug on Alpha EV8, which would have been SMT done right instead of Intel's "hyperthreading" kludge. Would we be seeing everyone and their dog clamoring for more SMT processors with wider pipelines?

    10. Re:Dual core - what's the point? by Moraelin · · Score: 1

      " First, when you double execution units and instruction fetch/decode units and reorder buffers, you are creating two cores!"

      You know, the funny thing is that you may be the only one who at least partially understood what he was really saying. Yet you still seem to miss the big picture.

      Basically, yes, it would be like two cores, except they allocate resources from a common pool. As opposed to the current easy way out of just gluing two completely separate cores together (even with separate cache!).

      For example, instead of having 2 hypothetical cores with 2 integer units each, you could have 2 which have a common pool of 4 integer units. Maybe at one point one of the cores can only schedule one integer operation at a time, but the other can use three in parallel.

      On the other hand, technically speaking it's not really multi-core, it's still SMT. It's just SMT with twice as many resources available.

      Seems to me like a more efficient use of silicon any way you want to look at it. Very expensive to design too, but nevertheless more efficient.

      Basically you can think of it this way: which is more efficient? Let's say a construction company has 2 teams and 4 trucks. Is it more efficient to have 2 trucks exclusive to each team (current dual core), or a common pool of 4 trucks allocated to the teams as needed (SMT with twice the resources)? What happens if one day team A doesn't need any truck, while team B would sorely need all four?

      Notice how sharing resources could be more efficient than not sharing anything?
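
      To put rough numbers on the truck analogy, here is a toy model, not a simulation of any real chip. The per-cycle demand figures are random and invented, and the names `partitioned` and `pooled` are made up for illustration. Each cycle, two threads each want some number of integer units; in the partitioned case each thread is capped at its own 2 units, in the pooled case they share 4.

```python
import random

def partitioned(demands, units_per_core=2):
    """Each thread can only use the units of its own core."""
    return sum(min(a, units_per_core) + min(b, units_per_core)
               for a, b in demands)

def pooled(demands, total_units=4):
    """Both threads draw from one common pool of units."""
    return sum(min(a + b, total_units) for a, b in demands)

# Hypothetical demand: how many integer ops each thread could issue per cycle.
rng = random.Random(0)
demands = [(rng.randint(0, 4), rng.randint(0, 4)) for _ in range(10_000)]

ops_partitioned = partitioned(demands)
ops_pooled = pooled(demands)
print("ops issued, partitioned:", ops_partitioned)
print("ops issued, pooled:     ", ops_pooled)
```

      In this model the pooled design can never issue fewer operations than the partitioned one, but the model deliberately ignores the extra scheduling logic and wiring a shared pool needs, which is where the design cost goes.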

      --
      A polar bear is a cartesian bear after a coordinate transform.
    11. Re:Dual core - what's the point? by flaming-opus · · Score: 1

      Ummmm. Opteron does well in the high-end workstation market. Maybe the high-end gamer market. In terms of server technology it's still a bit of a joke. The biggest boxes are 4-ways. None of them support hardware partitioning, most don't even do chip-kill memory.

      Sun has promised to make 8-way and larger opteron systems, but don't expect them for a couple years. Opterons don't really compete with Itanium, they compete with Xeon. (very well I might add) In the real server world Itanium is trying to break into the territory of sparc, power, pa-risc, and alpha. Opteron doesn't even exist in that space.

    12. Re:Dual core - what's the point? by flaming-opus · · Score: 1

      It should be noted that processors with shorter pipelines (POWER5 for example) also benefit from multiple threads. Prescott pays a BIGGER penalty from a branch mispredict, but all CPUs pay a penalty. All CPUs pay a HUGE penalty if you need to reference main memory. I would say it this way:

      In other words, SMT is an ingenious method for making up for the fact that CPUs are horribly inefficient.

      This big pile of simple cores idea is the premise behind IBM's Blue Gene (or the Thinking Machines systems from the early 90s). It works some of the time, but is extremely difficult to program.

    13. Re:Dual core - what's the point? by NerveGas · · Score: 1

      Ummmm. Opteron does well in the high-end workstation market. Maybe the high-end gamer market. In terms of server technology it's still a bit of a joke. The biggest boxes are 4-ways.

      Sun has promised to make 8-way and larger opteron systems, but don't expect them for a couple years.

      AMD has done some leaning on the manufacturers, and you should see 8-way Opterons by the end of this year.

      None of them support hardware partitioning, most don't even do chip-kill memory.

      For every server sold that does have those, a hundred thousand are sold which don't.

      Opterons don't really compete with Itanium, they compete with Xeon

      AMD sold more Opterons in the first year of launch than Intel sold Itaniums in the several years since then. Opterons are outselling Itaniums left and right, and there are quite a few shops buying Opterons instead of Itaniums.

      In the real server world Itanium is trying to break into the territory of sparc, power, pa-risc, and alpha. Opteron doesn't even exist in that space.

      Funny, Opterons seem to be eating into the smaller Sparc boxes pretty well. Sun has certainly seen that. You're right, you won't see a hardware-partitioned 128-way Opteron any time soon. But it's certainly starting to eat into the server market. If you went back in time four years, and someone told you that you'd be able to buy an 8-way, 64-bit, embedded memory controller, high-bandwidth interconnect server as a commodity product, you'd think that they were insane.

      steve

      --
      Oh, you're not stuck, you're just unable to let go of the onion rings.
    14. Re:Dual core - what's the point? by NerveGas · · Score: 1

      here

      You'll be able to buy a 4-way Opteron system, use it for a while, then tie it into another 4-way Opteron later. And another, and another, up to 8 boxes with 4 CPU's each, or 64 CPUs if you're using the upcoming dual-core chips.

      A 32- or 64-way system, with 32 128-bit DDR400 memory controllers (total aggregate memory bandwidth: 102 GB/s) with hardware partitioning, hot-plug connect/disconnect between systems, and even up to 64 megs of RDC cache between each link.
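
      As a sanity check on the bandwidth figure, here is a back-of-the-envelope sketch. It assumes DDR400 means 400 million transfers per second and counts a GB as 10^9 bytes; the function name is made up for illustration. These are theoretical peaks, not measured numbers.

```python
def peak_bandwidth_gb_s(bus_bits, mega_transfers_per_s):
    """Theoretical peak of one memory controller, in GB/s (10^9 bytes)."""
    bytes_per_transfer = bus_bits / 8
    return bytes_per_transfer * mega_transfers_per_s * 1e6 / 1e9

controllers = 32
per_128bit = peak_bandwidth_gb_s(128, 400)  # dual-channel DDR400
per_64bit = peak_bandwidth_gb_s(64, 400)    # single-channel DDR400

print("128-bit: %.1f GB/s each, %.1f GB/s total" % (per_128bit, controllers * per_128bit))
print(" 64-bit: %.1f GB/s each, %.1f GB/s total" % (per_64bit, controllers * per_64bit))
```

      Under these assumptions a 128-bit controller peaks at 6.4 GB/s, so 32 of them come to roughly 205 GB/s; an aggregate of 102 GB/s corresponds to 3.2 GB/s per controller, i.e. one 64-bit channel each.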

      And, even better than the current offerings: If you think you might need 32 chips in a machine from one of the big names, then you buy a 32-way chassis/backplane to start with. These let you buy only what you need.

      There are still some features that are found in the true "big iron" that aren't found in this platform, and some that will never be found in the Opteron platform. But with this level of advancement, you just can't tell me that the Opterons aren't eating in to the "high-end" server market.

      steve

      --
      Oh, you're not stuck, you're just unable to let go of the onion rings.
  18. It will be interesting by antifoidulus · · Score: 1

    to see if they can market this as a consumer machine someday. As long as Windows isn't bogged down by spyware, a $500 Dell machine can browse the internet, play music, and have a document or 2 open in Word without slowing down considerably.
    It's obvious they will be helpful in games (even if games don't take advantage of the dual core, having all your OS threads running separately will help) and scientific computing, but it seems to me that so small an audience makes it harder and harder to rationalize spending big money on chip R&D. Now those markets will always have an unquenchable thirst for power, but most of the cost for chips I think would be fixed costs (R&D, fabs etc), thus you can only get cheap if you go in volume....



    Oh wait, Longhorn...damn, nevermind the above post.

  19. Do you think I'm stupid or what? by GuyFawkes · · Score: 1

    as subject

    --
    http://slashdot.org/~GuyFawkes/journal
  20. AMD seems more promising by leathered · · Score: 3, Informative

    Luckily for AMD, the Opteron/A64 was designed with dual-core in mind. As I understand it both cores will talk to each other via an internal HyperTransport link and (as with current Opterons) together with the internal memory controller will eliminate the need for an external northbridge. It is also expected that upon release they will drop directly into existing motherboards with nothing more than a BIOS upgrade.

    Intel will find things more challenging. Both cores will have to contend for the GTL bus, currently the Achilles heel of their MP solutions, and communicate via an external northbridge.

    --
    For all intensive porpoises your a bunch of rediculous loosers
    1. Re:AMD seems more promising by puppetman · · Score: 1

      Yah, and Intel is doing dual core on the Itanium. I don't know many people that shell out for such an exotic CPU at home; I do know people that bought an Opteron for home, however.

      In essence, it sounds like the AMD dual core technology will be introduced so that it is available for consumers (at the high end), where Intel will introduce theirs for corporations looking to buy servers.

    2. Re:AMD seems more promising by Aadain2001 · · Score: 1

      Um, why would they have to talk through the northbridge if they are physically in the same silicon? That's the whole point behind dual core chips: eliminate the communications issues found in current MP systems. If they are physically next to each other, why would one even think of going off chip to talk to its neighbor????????

      --
      Space for rent, inquire within
    3. Re:AMD seems more promising by drinkypoo · · Score: 1

      It's likely that Intel will move some of the north bridge functionality directly onto the chip, but that will likely necessitate a new chipset. However, motherboards are usually the smallest part of the price of the system (except for the case, floppy drive, or maybe an optical drive) and as such are usually pretty replaceable.

      Probably the people most interested in reusing their old board with these new dual-core chips will be people with dual-processor boards which were expensive to begin with; They will be able to upgrade to 4-way SMP without replacing their mainboard. No one else is going to care much.

      --
      "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
    4. Re:AMD seems more promising by agent+dero · · Score: 1

      Luckily for AMD, the Opteron/A64 was designed with dual-core in mind

      Not quite lucky, more, logically.
      AMD designed their Opterons with the long term in mind; that's why they can run 32bit apps in the status quo. They designed it with dual cores in mind, because they knew where they wanted to go with the chipset.

      Logic > Luck ;)

      --
      Error 407 - No creative sig found
  21. Pipeline Depth and Complexity in general by rarose · · Score: 1

    If you add an extra execution unit to a CPU, you have to add all sorts of logic to decide what is pairable with what else (i.e. allocation of execution units such that they don't collide in terms of input or output registers).

    At some point you reach a limit where you can't use extra execution units because you don't know the input values to an instruction: the previous instructions upon which it depends are still in the pipelines of other execution units.

    Dual core avoids that... plus if you validate one core, you can cookie-cutter another one in and have minimal new validation. Especially if the communication between cores is a HT link just like what you're using in a single core design to talk to the outside world already.
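
    A minimal sketch of the kind of pairing check described above, assuming a made-up three-operand instruction format. `Instr` and `can_pair` are hypothetical names, and the check is deliberately conservative: real chips with register renaming can remove the write-after-write and write-after-read cases.

```python
from typing import NamedTuple, Tuple

class Instr(NamedTuple):
    dest: str               # register written
    srcs: Tuple[str, ...]   # registers read

def can_pair(first: Instr, second: Instr) -> bool:
    """True if `second` may issue in the same cycle as `first`."""
    raw = first.dest in second.srcs   # second reads what first writes
    waw = first.dest == second.dest   # both write the same register
    war = second.dest in first.srcs   # second overwrites an input of first
    return not (raw or waw or war)

add = Instr("r1", ("r2", "r3"))
mul = Instr("r4", ("r1", "r5"))   # needs r1, which add has not produced yet
sub = Instr("r6", ("r7", "r8"))   # touches nothing add uses

print(can_pair(add, mul))   # False
print(can_pair(add, sub))   # True
```

    With n instructions considered per cycle there are n(n-1)/2 such pairs to compare, which is one way to see why the checking logic grows faster than the number of execution units.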

    --
    --Rob
  22. Commodity hardware grows mature. by Skulker303 · · Score: 5, Insightful

    Daul core microprocessors are not a new development. IBM with their POWER4 and POWER5, HP and the PA-RISC 8800, and TI with their OMAP processors are definitive proof that multi-core solutions are not just a stop gap in increasing the performance delta of modern silicon.

    Daul core processors are a natural evolution in the development of general purpose and even specialized computing devices. SMT was to be a boon for the EV8, but later found its way into the Pentium4. Multiple logical processors were just a first step.

    It should be interesting to see just what AMD can do with both SMT and a daul core design.

    It just had better run BSD. = )

    1. Re:Commodity hardware grows mature. by Anonymous Coward · · Score: 1, Informative

      Please. Dual.

    2. Re:Commodity hardware grows mature. by JollyFinn · · Score: 1

      IBM had dual core before POWER4. It was in G6. [Not PPC G6 but S/390 G6] And that was long before POWER4. And I wouldn't be surprised if someone did it earlier.

      --
      Emacs is good operating system, but it has one flaw: Its text editor could be better.
    3. Re:Commodity hardware grows mature. by Jhan · · Score: 2, Insightful

      Daul core microprocessors are not...
      Daul core processors are...
      It should be interesting to see just what AMD can do with both SMT and a daul core design.

      You keep using that word. I do not think it means what you think it means.

      --

      I choose to remain celibate, like my father and his father before him.

  23. Single is better! by cowboynealisfat · · Score: 0, Redundant

    In case it's not obvious to those who didn't read the article all the way through, it's a better thing when the memory is shared (single cache) rather than separate (dual cache). But that is harder to design, so for these first-generation dual-core chips from Intel and AMD, they are using separate caches for each core. (IBM's dual core Power4 [ibm.com] processor has a unified cache.) At some point down the road, they will most likely unify them to increase performance.
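
    A toy illustration of the shared-versus-separate trade-off, assuming fully associative LRU caches and invented access traces. The class and function names are made up, and the model ignores access latency and coherence traffic entirely, so treat it as a sketch of the capacity argument only.

```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()
        self.misses = 0

    def access(self, addr):
        if addr in self.lines:
            self.lines.move_to_end(addr)    # mark as most recently used
            return
        self.misses += 1
        self.lines[addr] = True
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)  # evict least recently used

def total_misses(trace, shared, lines_per_core=4):
    """Run a (core, addr) trace on one shared cache or two private ones."""
    if shared:
        cache = LRUCache(2 * lines_per_core)
        caches = [cache, cache]
    else:
        caches = [LRUCache(lines_per_core), LRUCache(lines_per_core)]
    for core, addr in trace:
        caches[core].access(addr)
    return sum(c.misses for c in set(caches))

# Both cores loop over the same 6 lines (more than one private cache holds).
same_data = [(core, addr) for _ in range(100)
             for addr in range(6) for core in (0, 1)]
# Each core loops over its own 4 lines.
own_data = [(core, 100 * core + addr) for _ in range(100)
            for addr in range(4) for core in (0, 1)]

print(total_misses(same_data, shared=False), total_misses(same_data, shared=True))
print(total_misses(own_data, shared=False), total_misses(own_data, shared=True))
```

    The first trace is chosen to favor the shared cache: the private caches thrash while the shared one holds a single copy of everything. With disjoint working sets the two designs miss equally often in this model.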

  24. coming this fall on Fox... by rarose · · Score: 5, Funny

    It's "RISC CPI for the CISC guy"

    I can't wait to see what they do to his nonorthogonal register file.

    --
    --Rob
    1. Re:coming this fall on Fox... by Frogbert · · Score: 1

      Somewhere out there there are four electronic engineers laughing their arses off and about a million other slashdotters scratching their heads.

  25. The down-side to this.... by NerveGas · · Score: 4, Informative


    The downside is that as the AMD chips are going to be backward-compatible with older boards, I imagine that the dual-core chip will still only have the single 128-bit memory controller.

    While that will still give you twice as many available CPU iterations, that means that the two cores will be fighting for memory bandwidth. In the case of Intel's chips, that's business-as-usual: But for the Opterons, where each processor brings its own memory controller, it just doesn't feel right. : (

    steve

    --
    Oh, you're not stuck, you're just unable to let go of the onion rings.
    1. Re:The down-side to this.... by drinkypoo · · Score: 1

      From what I understand they will have a dual-core chip in which one processor is connected to memory via a dual channel memory controller, and the other processor is connected only to the first processor, via hypertransport.

      --
      "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
    2. Re:The down-side to this.... by hattig · · Score: 1

      I imagine that the dual-core chip will still only have the single 128-bit memory controller.

      That is very true. Still, there are some dual-Opteron motherboards with only one processor wired to memory and they perform pretty well still so it might not be that much of a problem.

      Maybe AMD will have DDR2-800 support in the processor when it is launched next year. There are suggestions that the current S940 / S939 would support DDR2 if it were wired up, but it is disabled/not done in current AMD64 processors. AMD will have to move to DDR2 at some point, when the price has dropped to a reasonable level. Hell, even dual-channel DDR2-666 will be more than enough bandwidth for a dual-core Opteron, and Intel's i925 supports this already (well, not officially but many websites have enabled it and shown it to work).

      DDR2 does have higher latencies though, and Opteron/A64 does like low latencies ... hopefully at 666 or 800 speeds this hit will be reduced, and Intel will still have the FSB latency issue as well.

    3. Re:The down-side to this.... by Wesley+Felter · · Score: 1

      No, both cores are connected to a switch which is connected to the memory controller.

    4. Re:The down-side to this.... by drinkypoo · · Score: 1

      That makes no sense whatsoever. Who told you that? I've specifically read that it will be done in the way that I describe, though I can't remember where so I suppose it's not really useful information. Regardless, Hammer-core processors have an internal memory controller, so it makes no sense whatsoever for the cores to be connected to some kind of crossbar controller. It only makes sense either to run a single-channel memory controller out from each processor, in order for them to drive alternating DIMMs, or to run a dual-channel out from one core, and connect the cores via HT, which is a perfectly allowable configuration for these processors.

      --
      "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
    5. Re:The down-side to this.... by glsunder · · Score: 1

      I read somewhere, I think it was Ace's Hardware, that AMD claims that the bandwidth sharing will be made up for by the faster cache snooping.

    6. Re:The down-side to this.... by Wesley+Felter · · Score: 1

      Almost every Opteron block diagram shows the crossbar switch. For example, AMD's 8th-Generation Processor Architecture document from 2001 shows how the SRQ and crossbar are used in a dual-core Opteron (figure 5).

    7. Re:The down-side to this.... by Lonewolf666 · · Score: 1

      The total result might be similar to two processors having a 64bit memory controller each.
      If you look at the P-Ratings of socket 754 versus socket 939, 64bit memory controller vs. 128bit seems to make roughly a 10% difference in performance.
      Thus, I would expect a dual-core Opteron to be slightly slower than a two processor design where each CPU has its own memory.

      --
      C - the footgun of programming languages
  26. Sure, OS/400 by Shivetya · · Score: 4, Insightful

    Been that way for many years. Is rock stable and secure.

    Granted it is on a mini, but we have enjoyed 64bit computing for nearly 10 years. Even have some POWER5s in production.

    There are great OSes other than the ones used on PC hardware... too many "geeks" forget that.

    --
    * Winners compare their achievements to their goals, losers compare theirs to that of others.
    1. Re:Sure, OS/400 by asdfghjklqwertyuiop · · Score: 1

      I think by "no not linux" he actually meant "an OS I can play all my video games on, ie windows".

    2. Re:Sure, OS/400 by ameoba · · Score: 1

      This may be the first time I've ever heard anything positive about OS/400 or the AS/400 hardware.

      Well, other than "It just works. And keeps working".

      --
      my sig's at the bottom of the page.
    3. Re:Sure, OS/400 by Jozer99 · · Score: 1

      Yeah, what's the deal? I have a 10 or 11 year old DEC Alpha running at 333 MHz in my office. 64 bit hardware (many others besides the Alpha 21164 chips) and software (many besides Tru64 UNIX) have been around. Take a look.

  27. So can it crash twice now ? by mcraig · · Score: 5, Funny

    Kernel Panic Core Dumped... Still Panicking Dumping Second Core...

    1. Re:So can it crash twice now ? by Kranium · · Score: 2, Interesting

      You laugh.. but that's exactly what I've seen happen on Mac OS.. One processor panics, but you're still dragging a window around because one is still running.. (That doesn't last long, though..)

  28. Ultimately not a benefit at all by milktoastman · · Score: 1

    A dual cache will NEVER fix the usual double cell problem that often comes up, right?

  29. It's nice but.. by leathered · · Score: 4, Insightful

    I like Itanium. It's a pretty neat architecture which crushes most before it in FP intensive tasks. It is clear why it has done well in HPC. But HPC is nothing more than a niche.

    Now here are the problems:

    32 bit (x86) performance sucks. All those apps you've spent years developing will need re-writing (a simple recompile is often out of the question).

    HP (in collusion with Intel) killed perfectly good archs. in Alpha and PA-RISC in an effort to get people to migrate to IA-64. A few may have made the move but this has mostly served to push people towards the vastly cheaper x86. HP, and to a lesser extent Intel, should provide what their customers want, not what they think is best for them.

    It still uses a shared bus architecture. There are diminishing returns as you add more processors.

    Itanium requires massive caches to get the best from it. Cache = Silicon = Cost. It is clear that a large scale seeding exercise is still underway with Itanium systems being provided at or below cost. Looks like it will be a long time before there will be any return on the billions invested in Itanium.

    --
    For all intensive porpoises your a bunch of rediculous loosers
    1. Re:It's nice but.. by Anonymous Coward · · Score: 1, Interesting

      That's very interesting: you just said cache = silicon = cost.

      But this is oversimplifying things, and that is, IMO, one of the key insights into why the itanium architecture will prove superior and win in the end.

      "Cost" means many different things. There's a cost to _manufacture_ the silicon, but there are also costs associated with designing the silicon, and costs associated with "running" the silicon (I mean power consumption, heat dissipation etc).

      The great thing about cache is that the latter two costs are almost zero - the design cost is basically nothing (you cut and paste, quite literally) and the power consumption is very low because only a tiny fraction of the transistors in cache actually switch value in any given clock cycle. Yes, there's static leakage power to worry about as well, but with things like adaptive voltage biasing Intel (and others) are well on their way to getting a hold of leakage as well.

      The Itanium architecture is designed to shine in the era of billion, and indeed 10-billion transistor CPUs, which is coming very soon (the next two to ten years, basically).

      There's no way Intel (or anyone else) can even afford to hire 50 times as many design engineers as they do currently, so they have to think about more clever ways of spending their transistor budget.

      There are two ways, really: more cache, and more execution resources. You can cut and paste cache, and you can cut and paste things like ALUs and FPUs. Of course, you would need a wide-issue instruction architecture to take advantage of all of these. ...welcome to Itanium.

    2. Re:It's nice but.. by TheLink · · Score: 1

      "I like Itanium. It's a pretty neat architecture which crushes most before it in FP intensive tasks. It is clear why it has done well in HPC. But HPC is nothing more than a niche."

      Getting a processor with "great FP performance IF you have a smart compiler" is actually relatively easy.

      Just have more FPUs. Look at the video card GPUs which do something similar.

      It's whether it's worth the silicon for your purpose. The target market for x86 doesn't need FP performance _that_ much, so you don't get as many FP units - the silicon goes to something else.

      --
    3. Re:It's nice but.. by flaming-opus · · Score: 1

      Not yet they haven't. They just released the EV7z on the Alpha front, and will soon release the PA-8900. HP is planning to support VMS and Tru64 on Alpha, HP-UX on PA, and tandemOS on MIPS until 2008 at least.

      Itaniums sometimes use a shared bus architecture, but the fancier HP and SGI boxes use NUMA-style crossbars instead. Remember that crossbars scale bandwidth better, but there's a latency penalty. Note that all the big iron boxes are crossbars connecting a bunch of nodes. Those nodes all use 2 or 4 processor shared-bus designs. (sun, ibm, unisys, fujitsu, hp, etc)

      HP made a calculated decision. It would be VERY expensive for them to continue development of their 5 distinct server lines. Some customers will jump ship to IBM or Sun, but that's something they can't really help. The server business is becoming a mature commodity business. How many companies manufacture airplanes? How many build cars? Most mature markets consolidate down to a half dozen big players. Some things suck along the way, but it should not be a shock to anyone.

  30. Ewww... by rarose · · Score: 1

    Can we get a courtesy (cache) flush?

    --
    --Rob
  31. IBM working on Dual Core 970MP too! by Anonymous Coward · · Score: 0

    And of course you can have a full 64 bit OS using Mac OS X "Tiger".

    http://www.thinksecret.com/news/antares.html

  32. Sync? by TwistedSpring · · Score: 1

    What about cache sync? Educate me here but I would have thought that a double-sized shared cache would be faster than two separate caches that have to be synced all the time. Am I an idiot?

    1. Re:Sync? by BrewerDude · · Score: 1
      The caches don't have to be perfectly synced. They just have to snoop each other's transactions so that sharing can be arbitrated as necessary.

      So, if you have an application in which there is very little or no sharing across the processors, then there is little or no "syncing" involved. In that case, the dual-cache design probably wins over the shared cache in terms of design complexity and perhaps latency to access the cache, depending on how it's done.

      Also, separate caches that maintain coherency with each other will scale better to more processor cores than a single shared cache will.
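
      To make the "little or no syncing" point concrete, here is a toy model of snooping with a simplified MSI-style protocol (states Modified, Shared, or absent for Invalid). All names and traces are invented for illustration, and it counts only the snoops that force another cache to act, not every bus broadcast.

```python
class Core:
    def __init__(self):
        self.state = {}   # addr -> "M" or "S"; a missing addr means Invalid

def coherence_events(trace, n_cores=2):
    """Count downgrades and invalidations for a (core, op, addr) trace."""
    cores = [Core() for _ in range(n_cores)]
    events = 0
    for cid, op, addr in trace:
        me = cores[cid]
        others = [c for i, c in enumerate(cores) if i != cid]
        if op == "r":
            if addr not in me.state:
                for o in others:
                    if o.state.get(addr) == "M":
                        o.state[addr] = "S"   # owner supplies data, downgrades
                        events += 1
                me.state[addr] = "S"
        else:  # "w"
            if me.state.get(addr) != "M":
                for o in others:
                    if addr in o.state:
                        del o.state[addr]     # invalidate the other copy
                        events += 1
                me.state[addr] = "M"
    return events

# Each core writes only its own four lines: no sharing at all.
private = [(core, "w", 100 * core + a) for _ in range(50)
           for a in range(4) for core in (0, 1)]
# Both cores take turns writing the same line.
ping_pong = [(i % 2, "w", 0) for i in range(100)]

print(coherence_events(private))     # 0
print(coherence_events(ping_pong))   # 99
```

      With no shared lines the caches never have to act on each other's traffic; when both cores keep writing the same line, nearly every write costs an invalidation.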

    2. Re:Sync? by pclminion · · Score: 1

      The idea of multiprocessing is to have two CPUs working on two parts of a problem at once. If that processing requires a lot of cache synchronization, things are badly broken anyway: you'll have so much locking overhead maintaining consistency at the application level that any performance impact due to cache synchronization would be unnoticeable.

  33. what's the diff: dual core and hyperthreading? by Locutus · · Score: 2, Interesting

    A friend purchased a 3GHz (yes 3) Intel Pentium 4 with HyperThreading a few months back. I asked why he didn't purchase an AMD CPU and he said he needed x86 compatibility... So much for informed hardware engineers. Anyway, I recently asked him about the system since I just built an AMD 2600+ based system and wanted to know if he had some code he wanted to compare/test. Well, he told me that his 3GHz CPU really only runs most applications at 1.5GHz unless they are multi-threaded or hyperthread aware.

    Is this true? Does Intel put a 3GHz label on 1.5GHz dual/core CPU's or whatever this hyperthreading is? Sounds dual/core-ish to me...

    It's funny how that 1.5GHz number shows up again in an Intel product. I remember when they could not build anything faster than 7xxMHz and then all of a sudden, they had a "new technology" that got them 1.5GHz (2x 750MHz) and it was found out later that only PART of the CPU was running at 2x. This all happened when AMD beat Intel past the 1GHz barrier. Are they again playing "tricks" to get a big GHz label on their parts?

    So any of you people up on this dual-core and hyperthreading thing and feel like explaining to the rest of us what's going on? TIA.

    LoB

    --
    "Anyone who stands out in the middle of a road looks like roadkill to me." --Linus
    1. Re:what's the diff: dual core and hyperthreading? by 0123456 · · Score: 1

      "Is this true?" Uh, no.

    2. Re:what's the diff: dual core and hyperthreading? by ArbitraryConstant · · Score: 1

      "Is this true? Does Intel put a 3GHz label on 1.5GHz dual/core CPU's"

      No. It's one core running at 3 GHz, working with two different streams of instructions so it looks like two processors. They must contend for resources, so each is slower by itself than the single stream you have with hyperthreading disabled, but together they get more work done overall when working on some tasks. Sometimes hyperthreading makes things worse.

      Dual core is a completely different idea. Basically, they print two processors onto the same hunk of silicon and it's cheaper to manufacture than two separate processors because all the other costs like packaging it and testing it stay the same.

      --
      I rarely criticize things I don't care about.
    3. Re:what's the diff: dual core and hyperthreading? by owlstead · · Score: 1

      No, they (Intel processors) do run at full speed. However, with hyperthreading both virtual processors use the same infrastructure. Since they can both run processor threads at the same time, that infrastructure is used more efficiently though, so you get (slightly) better performance. Say up to 10/20 percent or so.

      Obviously, with dual cores you get a 100% upgrade at the same clockspeed. It depends on the intercommunication between the processor threads if your system will indeed perform twice as fast. If you could run folding at home on each CPU, then they would get twice the results, since that application does not need a fast path from one processor to another.
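
      One rough way to estimate how far short of "twice as fast" a dual core lands is Amdahl's law, treating whatever cannot be split across cores (including time spent waiting on the other core) as the serial part. The function name and the example fractions below are illustrative assumptions, not measurements of any real workload.

```python
def amdahl_speedup(parallel_fraction, n_cores):
    """Amdahl's law: overall speedup when only part of the work scales."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_cores)

for p in (1.0, 0.9, 0.5, 0.0):
    print("parallel fraction %.1f -> %.2fx on 2 cores" % (p, amdahl_speedup(p, 2)))
```

      Two independent jobs that never talk to each other sit at the parallel-fraction-1.0 end and get the full 2x; a program that spends half its time in code that cannot be split gets about 1.33x.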

      That is the problem with separate caches. Multiple processor threads of the same application probably won't perform that well on these designs. The communication between both processors is hampered, and they will probably have to fill both L1 caches with the same data to run the same process. They will probably use a higher level cache to make up for that though.

      ps. just my basic knowledge of processor design guys, feel free to shoot as many holes in this story as you can.

    4. Re:what's the diff: dual core and hyperthreading? by Paladine97 · · Score: 1

      Is this true? Does Intel put a 3GHz label on 1.5GHz dual/core CPU's or whatever this hyperthreading is? Sounds dual/core-ish to me...

      No. As you said yourself, your friend is a moron.

    5. Re:what's the diff: dual core and hyperthreading? by Locutus · · Score: 1

      thanks. I see there are other comments on this now so it's a bit clearer now. thanks.

      LoB

      --
      "Anyone who stands out in the middle of a road looks like roadkill to me." --Linus
    6. Re:what's the diff: dual core and hyperthreading? by Anonymous Coward · · Score: 0

      Your friend, the so called hardware engineer, is the biggest idiot on the planet. Please google hyperthreading to figure out why on your own.

    7. Re:what's the diff: dual core and hyperthreading? by Anonymous Coward · · Score: 0
      Excellent troll!

      I suspect you work for Apple Computer and you're one of their paid astroturfers who go to slashdot and post misinformation.

      You'll just have to do a little better than this to get anyone to believe you!

    8. Re:what's the diff: dual core and hyperthreading? by Anonymous Coward · · Score: 0

      The rated clockrate denotes the maximum frequency on the processor's CLK pin. The motherboard controls this frequency. Your friend may have been talking about how the arithmetic pipelines do run at double the external clock rate in P4s. This is why they can recover from branch mispredictions so quickly.

    9. Re:what's the diff: dual core and hyperthreading? by TheLink · · Score: 1

      By the way, I've got a special thermobridging system for your friend that will help his CPU run at 3GHz, only USD599.

      If he's interested, let me know ASAP.

      --
    10. Re:what's the diff: dual core and hyperthreading? by RockyMountain · · Score: 1

      Basically, they print two processors onto the same hunk of silicon and it's cheaper to manufacture than two separate processors because all the other costs like packaging it and testing it stay the same.

      Packaging and testing, sure. But overall cost of 1-dual vs 2-single isn't as clear. Big die are expensive -- they require more costly fab techniques, and result in low yield. Beyond a certain size, the loss in yield is just huge.

      You can partially recover the lost yield by salvaging some of the failing dual-core die to sell as single-core parts. There are limits to this, though. They make lousy single-core parts for many reasons, including very high leakage power and a larger die that you have to package. To be viable, you need a high dual-core yield.

      Bottom line: for equivalent complexity and cache size, I seriously doubt that it's any cheaper to produce one dual-core chip compared to two single-core ones, knowing how sensitive IC economics are to yield.
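
      A rough sketch of that yield argument, assuming the simple Poisson yield model (yield = exp(-area x defect density)). The defect density, silicon cost and package cost below are made-up illustration numbers, not Intel or AMD data, and the function names are just for this example:

```python
import math

def poisson_yield(area_cm2, defects_per_cm2):
    """Fraction of defect-free dies under the simple Poisson yield model."""
    return math.exp(-area_cm2 * defects_per_cm2)

def cost_per_good_die(area_cm2, defects_per_cm2, silicon_cost_per_cm2):
    """Silicon cost of one working die: raw die cost divided by yield."""
    raw_cost = area_cm2 * silicon_cost_per_cm2
    return raw_cost / poisson_yield(area_cm2, defects_per_cm2)

def compare(core_area_cm2, defects_per_cm2, silicon_cost_per_cm2, package_cost):
    """Cost of two cores as (one dual-core part, two single-core parts)."""
    dual = cost_per_good_die(2 * core_area_cm2, defects_per_cm2,
                             silicon_cost_per_cm2) + package_cost
    singles = 2 * (cost_per_good_die(core_area_cm2, defects_per_cm2,
                                     silicon_cost_per_cm2) + package_cost)
    return dual, singles

# Hypothetical inputs: 1 cm^2 per core, 0.5 defects/cm^2,
# $10 of silicon per cm^2, $5 to package and test each part.
dual, singles = compare(1.0, 0.5, 10.0, 5.0)
print(f"one dual-core: ${dual:.2f}   two single-core: ${singles:.2f}")
```

      In this model the silicon for the dual-core die costs exp(core area x defect density) times as much as the silicon for two single-core dies, so the saved package only wins when that factor is close to 1. Salvaging partly failing die and cache redundancy are not modeled here.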

    11. Re:what's the diff: dual core and hyperthreading? by ArbitraryConstant · · Score: 1

      "Packaging and testing, sure. But overall cost of 1-dual vs 2-single isn't as clear. Big die are expensive -- they require more costly fab techniques, and result in low yield. Beyond a certain size, the loss in yield is just huge."

      Correct.

      Amazing, then, that Intel and AMD are talking about going dual core as they start producing chips with 90 nm and smaller technology. It's almost as if they've reached some kind of break even point, where the probability of a defect on a dual core die falls to the point where the gains outweigh the losses.

      "Bottom line: for equivalent complexity and cache size, I seriously doubt that it's any cheaper to produce one dual-core chip compared to two single-core ones, knowing how sensitive IC economics are to yield."

      You're wrong.

      --
      I rarely criticize things I don't care about.
    12. Re:what's the diff: dual core and hyperthreading? by RockyMountain · · Score: 2, Insightful

      It's almost as if they've reached some kind of break even point, where the probability of a defect on a dual core die falls to the point where the gains outweigh the losses.

      The gains definitely outweigh the losses, or they wouldn't do it. But the gains don't only come from CPU cost-per-core. There are lots of other factors, such as density, power efficiency, potential for core-to-core lockstep, etc.

      I have no first-hand knowledge of AMD, but for Itanium, smaller process geometries do not increase yield through smaller die size. As they've shrunk to smaller geometries, they have not shrunk die size at all. All the extra real estate goes into larger caches, and the die size, and thus the (raw) yield, remains about the same. They have improved yield, but it's not through shrinking the die.

      They have dramatically improved yield in other ways. As your execution units shrink and cache dominates an ever-increasing percentage of the die area, it becomes easier to use redundancy to make the chip tolerant of defects.

      Intel calls it Pellston Technology (I hate marketing speak). And it is this, more than anything, that makes such massive die as Montecito even possible. In the old days, one defect trashed the die. With this sort of technology, most defects are worked around through redundancy. And, if you have too many defects to allow that, you may still salvage the die by selling it at a lower price with reduced-capacity caches. Most chips shipped to customers have several completely corrected defects.

      You're wrong.

      Probably. Wouldn't be the first time.

      If you take into account the overall _system_ cost, dual-core is definitely far cheaper than dual-socket. System cost also includes cost of board area, power delivery, cooling, etc. OEMs will happily pay well over double for a 2-core vs 1-core because of the savings they will make elsewhere in the system. Dual-core also gives system vendors much more flexibility by allowing the same board design to support twice as wide a range of CPU counts.

      I was comparing CPU+package cost only. Having been both a CPU designer and a system designer at different times in my career, I know how to look at it both ways.

    13. Re:what's the diff: dual core and hyperthreading? by ArbitraryConstant · · Score: 1

      "I have no first-hand knowledge of AMD, but for Itanium, smaller process geometries do not increase yield through smaller die size. As they've shrunk to smaller geometries, they have not shrunk die size at all. All the extra real estate goes into larger caches, and the die size, and thus the (raw) yield, remains about the same. They have improved yield, but it's not through shrinking the die."

      Right, but there comes a point where "more cache more cache!" just doesn't improve your performance anymore. The trivial case is where cache size equals memory size, and your hit probability is 1.

      Note that cache sizes have fluctuated around 256-512 kb since the P2 days. My P2 and P4 both have 512 kb. I'd be shocked if the reason was something other than that being a sweet spot. They're moving to 1 mb now, but that just means they've got more transistors and they don't have worthwhile uses for them in the core. It's diminishing returns, but there's still returns.

      But if another core can increase instruction throughput nearly twofold while completely bypassing hard problems like increasing the parallelism in one core, and without increasing die size too much because of the new smaller process, then it makes sense. Itanium is vastly better at extracting parallelism from its instructions because that was the primary design goal, so another core takes more for it to be worth it, and additional cache makes sense.

      If x86 stays in the sweet spot at 512 kb but doubles the instruction throughput with two cores, that's a big win.

      Of course, I'm no more thrilled with a 200 watt CPU than anyone else, but that's what you get with a two CPU system anyway.

      "OEMs will happily pay well over double for a 2-core vs 1-core because of the savings they will make elsewhere in the system. Dual-core also gives system vendors much more flexibility by allowing the same board design to support twice as wide a range of CPU counts."

      It would make me happy building my own system too. :)

      --
      I rarely criticize things I don't care about.
    14. Re:what's the diff: dual core and hyperthreading? by RockyMountain · · Score: 2, Informative

      Note that cache sizes have fluctuated around 256-512 kb since the P2 days. My P2 and P4 both have 512 kb. I'd be shocked if the reason was something other than that being a sweet spot.

      Sorry, I live in a 64-bit world, to the point that I'm quite ignorant of X86 state of the art. I've been blindly (and wrongly) assuming a 64-bit context for this whole conversation.

      Your posting reminded me that caches of only 512K still exist! Montecito has 24M between 2 cores. Also, re-reading your posts in the context of 32-bit systems, they now make much more sense to me. X86 die aren't the same huge monsters that I'm used to. No wonder you and I have different views about yield cost tradeoffs -- we live on different parts of the curve.

      Unfortunately, most of what I know about Itanium I really can't talk about, other than in very general terms -- assuming I wish to remain employed, that is. :)

      The "sweet spot" for cache size is really determined by a race between core performance, and memory latency/bandwidth. Doubling cores doubles the data production/consumption rate. Doubling the frequency also does. The former is less demanding on memory latency and mostly requires more bandwidth. The latter is equally demanding on both. If you double the data production/consumption, and keep the same memory/bus bandwidth, ideally, you'd like to halve the cache miss rate -- but that's pretty unlikely in practice. That's why caches keep growing (at least in the 64-bit world). There is a point of diminishing returns, because there's an upper limit to both temporal and spatial locality, but we're not quite there yet for Itanium.
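
      To put rough numbers on the "halve the miss rate" point: traffic to memory is roughly (cache accesses per second) x (miss rate) x (line size). A minimal sketch; the 64-byte line, the 6.4 GB/s bus and the access rate are all assumed for illustration and do not describe any particular chip:

```python
def memory_traffic(accesses_per_s, miss_rate, line_bytes=64):
    """Bytes per second that must cross the memory bus."""
    return accesses_per_s * miss_rate * line_bytes

def max_miss_rate(bus_bytes_per_s, accesses_per_s, line_bytes=64):
    """Highest miss rate the bus can sustain at a given access rate."""
    return bus_bytes_per_s / (accesses_per_s * line_bytes)

BUS = 6.4e9        # assumed memory bus bandwidth, bytes/s
ONE_CORE = 1.0e9   # assumed cache accesses per second from one core

# Same bus, more cores: the tolerable miss rate shrinks in proportion.
for cores in (1, 2, 4):
    limit = max_miss_rate(BUS, cores * ONE_CORE)
    print(f"{cores} core(s): miss rate must stay under {limit:.1%}")
```

      Doubling the access rate with the bus held fixed halves the miss rate you can afford, which is the pressure that keeps pushing cache sizes up.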

      When more CPUs start to have integrated memory controllers and point-to-point links instead of multi-drop busses, I predict that cache sizes per core will actually decline for a while, since the memory performance side of the balance will lower the "sweet spot". After that, caches will probably creep slowly upwards again, because no memory or interconnect technology ever scales fast enough to keep up with CPU core performance scaling.

      So the $64 question is... When cache sizes per core start to decline or level off, will we see smaller die, or will we see even more cores per die? The way Intel seems to always position Itanium for high-end heavy metal, I expect huge die with more cores, although for X86 they went the other way. Or so I infer from your posting.

      Of course, I'm no more thrilled with a 200 watt CPU than anyone else, but that's what you get with a two CPU system anyway.

      Not sure which CPU you're referring to here. Is it something in the X86 line? If you're taking a single core power and multiplying by two, then you may be very pleasantly surprised. Montecito, despite having two cores, will have significantly lower total power than its single-core predecessors. (That much I can safely reveal, because it seems to be common knowledge already.)

      Having designed systems, I can tell you that difficulties arising from high power-per-socket are very non linear: 200W isn't merely twice as hard to deal with versus 100W. It's easily an order of magnitude more difficult to cool with the same MTBF reliability. Luckily, Intel have realised this, at least in their 64-bit line. Once again, I am too ignorant of the 32-bit world to know the state of the art there.

    15. Re:what's the diff: dual core and hyperthreading? by ArbitraryConstant · · Score: 1

      "Sorry, I live in a 64-bit world, to the point that I'm quite ignorant of X86 state of the art. I've been blindly (and wrongly) assuming a 64-bit context for this whole conversation."

      Indeed. :)

      These chips will probably all have the x86-64 extensions, so they are 64-bit chips. Chips with these extensions have been on the market for over a year, first from AMD and soon from Intel.

      In the very high end market, very different tradeoffs are very clearly in effect. I'm not sure about die sizes, but there's that POWER 5 from IBM with 4 dual-core dies, each with IBM's version of hyperthreading and 144 megabytes (!) of cache. That might even be 130 nm, I forget.

      "Unfortunately, most of what I know about Itanium I really can't talk about, other than in very general terms -- assuming I wish to remain employed, that is. :)"

      Well, "Itanium is better at instruction level parallelism than x86." is not a statement I need you to confirm. Also... it's better than POWER.

      "The "sweet spot" for cache size is really determined by a race between core performance, and memory latency/bandwidth. Doubling cores doubles the data production/consumption rate. Doubling the frequency also does. The former is less demanding on memory latency and mostly requires more bandwidth. The latter is equally demanding on both. If you double the data production/consumption, and keep the same memory/bus bandwidth, ideally, you'd like to halve the cache miss rate -- but that's pretty unlikely in practice. That's why caches keep growing (at least in the 64-bit world). There is a point of diminishing returns, because there's an upper limit to both temporal and spatial locality, but we're not quite there yet for Itanium."

      This is true... but from watching Intel for the last few years, they have had nothing but trouble getting any gains past the 3.2 ghz chips. They were stuck there for a long time, and they've only got it up to 3.6 ghz now. AMD has been doing well, but when their clock speeds get much faster, they might well begin hitting the same problems. At that point, adding a core is probably the easiest way to increase data production/consumption at all, even if memory bandwidth/latency becomes more of a bottleneck.

      "When more CPUs start to have integrated memory controllers and point-to-point links instead of multi-drop busses, I predict that cache sizes per core will actually decline for a while, since the memory performance side of the balance will lower the "sweet spot". After that, caches will probably creep slowly upwards again, because no memory or interconnect technology ever scales fast enough to keep up with CPU core performance scaling."

      Your statements make more sense now, if you've not been keeping up with x86. The AMD Athlon 64s and Opterons all have integrated memory controllers, and in multi-CPU systems, they use CPU-to-CPU links. They scale up to 8 CPUs with up to several TB of memory...

      "So the $64 question is... When cache sizes per core start to decline or level off, will we see smaller die, or will we see even more cores per die? The way Intel seems to always position Itanium for high-end heavy metal, I expect huge die with more cores, although for X86 they went the other way. Or so I infer from your posting."

      Probably both, some for low end value computers, and the dual cores for high end machines. Or middle end machines by your standards.

      "Not sure which CPU you're referring to here. Is it something in the X86 line? If you're taking a single core power and multiplying by two, then you may be very pleasantly surprised. Montecito, despite having two cores, will have significantly lower total power than its single-core predecessors. (That much I can safely reveal, because it seems to be common knowledge already.)

      Having designed systems, I can tell you that difficulties arising from high power-per-socket are very non linear: 200W isn't merely twice as hard to deal wi

      --
      I rarely criticize things I don't care about.
    16. Re:what's the diff: dual core and hyperthreading? by RockyMountain · · Score: 1

      In either case, I hope you're right about efficiency gains in a dual core chip. I want a computer I can turn on in the summer.

      Just to clarify... I was talking about Montecito. So, (1) You won't get one this summer, (2) you can't afford one on your desktop anyway. Just because Montecito will be low power does not imply that the chip in your PC will be. Not yet anyway.

      And, I didn't mean to imply that the efficiency gains have anything to do with the dual-core architecture. Not so. It took heroic effort and some amazing innovation to make Montecito such a low power chip. Eventually, other CPUs will _have_ to follow suit, because we are at or beyond the reasonable limit for per-socket supply delivery and cooling.

  34. There already is one. by Medievalist · · Score: 2, Informative


    VMS went 64-bit at least a decade ago.

    Great OS for English-speaking folk, despite Linus's hatred for it.

  35. Dual Core==uglier heatsinks? by lateralus_1024 · · Score: 1

    I can't wait for Zalman to make a gigantic copper spread for this generation of CPUs. This could be the end of the acrylic window cases, since the only thing visible will be copper fins and pipes.

    --
    If you think /. comments are bad, check out Digg.
    1. Re:Dual Core==uglier heatsinks? by TwistedSpring · · Score: 1

      Eventually Zalman will release the HeatCase/FlowerCase: A gigantic flowercooler heatsink that also functions as a regular computer case. Just insert your motherboard and cards and pump in a gallon of heat transmissive grease. We really need some better way of cooling CPUs, and if anyone mentions peltier you will get what's coming to you.

    2. Re:Dual Core==uglier heatsinks? by mt+v2.7 · · Score: 1

      It's expected to be the same size as current chips.

  36. Re:How is this different from a two processor syst by Anonymous Coward · · Score: 0

    But will probably require a heat sink that is twice as large - or perhaps just use the entire side of the case.

  37. FYI -- Gamers by Anonymous Coward · · Score: 0

    You know, if AMD would actually give money to game developers to multi-thread Unreal Tournament 2005, Doom 4, and Jedi Knight 4, they would see their sales climb through the roof. Gamers would be buying dual-core systems (and Opteron/MP/modded-XP), and loading up with Geforce6800's and X800s.

    A couple million to the Grand Theft Auto: San Andreas team, and maybe a little blurb in the readme.txt about how well the game works on multi-processor AMD systems, and people would take notice.

    Just a little FYI...

    1. Re:FYI -- Gamers by Anonymous Coward · · Score: 0

      No they wouldn't. Hardcore gamers are like 1% of the market at best. It's not even worth worrying about them.

    2. Re:FYI -- Gamers by Anonymous Coward · · Score: 0

      Yea, I have a dual HT Xeon, so 4 "virtual" CPUs, and I hate it when you play a game, especially an RTS like Age of Mythology, and it only uses 1 CPU. :[ What a perfect scenario for multi-threading (same with any game really), with multiple little AI's running around. Yet it's pretty uncommon; most every game rams it all into 1 thread.

  38. The Matrix started this way . . . by MexicanMenace · · Score: 1

    1 slot = 2 CPUs
    2 slots = 4 CPUs
    4 slots = 16 CPUs
    . . .

    Someone flips on a Beowulf cluster started this way and it's Game Over man:

    Neo@Beowulf>./Whoah

    *cluster flies off*

    1. Re:The Matrix started this way . . . by uncl_bob · · Score: 1

      ...very true, except 4 slots = 8 CPUs ;)

    2. Re:The Matrix started this way . . . by Echnin · · Score: 1

      Not with quadruple core CPUs...

      --
      Lalala
  39. Re: unified (single) is not always better by mepperpint · · Score: 3, Insightful

    It's not entirely true that single is better. It depends on what the system is used for. If both cores are accessing the same memory (likely the case in a multi-threaded webserver for instance), then they can benefit from sharing a cache and effectively doubling the cache size. However, if both cores are accessing different memory (almost any situation where different applications are running on the different cores), then sharing a cache could have devastating effects on performance. As each process running on each of the cores would be likely to be evicting the other process's cached memory, there would be a plethora of cache misses. In the worst case, this could effectively make the system as slow as if there were no cache at all. In the average case there would likely be a significant performance hit. A better strategy than unified or separate caches would be to have a read/write cache for each core and allow each core to read the other core's cache. This would allow the benefits of the shared cache in the case where both cores were accessing the same memory without having the major performance hit when each process is accessing different memory. Unfortunately the hardware for this would be even more complicated than for the unified or separate cache techniques.
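
    A toy simulation of the two cases, as a sketch only: it assumes fully associative LRU caches, a private cache of 64 lines per core versus one shared cache of 128 lines, and two hand-made access traces. Real caches are set-associative and real workloads are messier, and the read-the-other-core's-cache scheme is not modeled.

```python
from collections import OrderedDict

class LRUCache:
    """Fully associative cache with least-recently-used eviction."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()

    def access(self, line):
        """Touch a line; return True on a hit, False on a miss."""
        if line in self.lines:
            self.lines.move_to_end(line)
            return True
        if len(self.lines) >= self.capacity:
            self.lines.popitem(last=False)   # evict the least recently used line
        self.lines[line] = True
        return False

def private_caches(lines_per_core):
    return {0: LRUCache(lines_per_core), 1: LRUCache(lines_per_core)}

def shared_cache(lines_per_core):
    cache = LRUCache(2 * lines_per_core)     # same total capacity, one pool
    return {0: cache, 1: cache}

def hit_rates(trace, caches):
    """Per-core hit rate for a trace of (core, line) accesses."""
    hits, total = {0: 0, 1: 0}, {0: 0, 1: 0}
    for core, line in trace:
        total[core] += 1
        hits[core] += caches[core].access(line)
    return {core: hits[core] / total[core] for core in total}

def same_data_trace(steps, lines):
    """Both cores loop over the same working set (e.g. threads of one server)."""
    return [(core, ("common", step % lines))
            for step in range(steps) for core in (0, 1)]

def different_data_trace(steps, hot_lines):
    """Core 0 reuses a small set; core 1 streams through data it never reuses."""
    trace, next_stream_line = [], 0
    for step in range(steps):
        trace.append((0, ("hot", step % hot_lines)))
        for _ in range(2):                   # core 1 issues two accesses per step
            trace.append((1, ("stream", next_stream_line)))
            next_stream_line += 1
    return trace

for name, trace in (("same data", same_data_trace(2000, 100)),
                    ("different data", different_data_trace(2000, 60))):
    print(name, "private:", hit_rates(trace, private_caches(64)),
          "shared:", hit_rates(trace, shared_cache(64)))
```

    With these particular traces, sharing helps when the common working set fits in 128 lines but not in 64, and it hurts core 0 when core 1's streaming data keeps pushing core 0's lines out of the shared pool.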

  40. Does MS delay in releasing a 64 bit OS hurt them? by kabloom · · Score: 1, Interesting

    Is it going to hurt Microsoft that they aren't releasing a real 64-bit operating system for another year and a half?

    I tend to see the possibility that people who buy 64-bit computers will try to take advantage of their capabilities by choosing 64-bit capable operating systems to run on them. Even Itanium users would have little or no interest in running a beta version of Windows when they could have a real, tested, released operating system like Linux for cheaper. This may potentially even help Linux with hardware support, and encourage 64bit vendors to use more capable hardware.

    On the other hand the 80386 processor (the first to have 32-bit capabilities) was released in 1985. It wasn't until 1992, 7 years later, that Microsoft came along with an OS (Windows NT) that took advantage of the 80386's 32-bit architecture. It was 3 more years after that before Windows 95 brought those features to the consumer market, which Microsoft promptly dominated.

    Will Microsoft's delay in releasing a 64bit operating system hurt them? Will it make a difference?

  41. Re:How is this different from a two processor syst by hawkbug · · Score: 1

    nope, read the spec sheets on the projected AMD cpus - 90-100 watts maximum, and that means their high end stuff. The slower chips will be around 80 I bet. You're forgetting that these chips will be on the 90nm process at first, and then as speeds ramp, you'll see them on the 65nm process by 2006. That simply means no more heat than a single P4 puts out right now. AMD chips are currently very cool compared to Intel's. Of course, the clock speed has a lot to do with that. SOI is the other part.

  42. Taking the ram aproach by astrotek · · Score: 1

    2 - Dual Double Rate Dynamic Processors

  43. The G5 uses hypertransport... by Ayanami+Rei · · Score: 4, Interesting

    hence the block of RAM per CPU.

    --
    THIS THING CAN TURN ON A DIME, MACROSSZERO STYLE ALSO FUCK BETA, ~NYORON
    1. Re:The G5 uses hypertransport... by drinkypoo · · Score: 1

      Learn something new every day, I guess. Anyway it's ridiculous to cite that as a strength of the G5 that a descendant of x86 doesn't have, since not only does the Hammer processor have it, but the G5 and Hammer use the same bus technology.

      --
      "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
    2. Re:The G5 uses hypertransport... by shawnce · · Score: 4, Informative

      Actually they don't use the same bus technology.

      The G5 (PPC970/970FX) has two 32-bit-wide buses, one going in each direction from the CPU, and they have a data rate half that of the CPU's clock rate. At a clock rate of 2.5GHz the bus is capable of a max theoretical throughput of 5GB/s each direction or 10GB/s in total (that is per CPU). Real world throughput is around 8 GB/s per CPU at 2.5GHz because of address/command overhead. Apple/IBM terms this the elastic bus and it is not HT based.
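
      The arithmetic behind those peak figures, as a quick sketch (peak throughput = bus width in bytes x transfers per second; the function name is just for illustration):

```python
def bus_gb_per_s(width_bits, transfers_per_s):
    """Peak throughput in GB/s for a bus of the given width and transfer rate."""
    return (width_bits / 8) * transfers_per_s / 1e9

cpu_clock_hz = 2.5e9

# One 32-bit bus per direction, transferring at half the CPU clock.
per_direction = bus_gb_per_s(32, cpu_clock_hz / 2)
both_directions = 2 * per_direction

print(f"per direction: {per_direction} GB/s, total: {both_directions} GB/s")
```

      That gives the 5GB/s per direction and 10GB/s total quoted above; the roughly 8 GB/s real-world figure is lower because address and command traffic shares the same wires.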

      For more information see this block diagram referenced from this hardware tech note.

      Anyway, the post you are replying to is incorrect about each CPU having its own RAM. That is not true. Each CPU has its own independent bus to the memory controller (U3/U3H), and that controller has a dual-channel connection to memory capable of 6.4GB/s (DIMMs are required to be added in pairs to allow for a 128-bit-wide path to memory). The U3 chip is basically crossbar-like internally, allowing for a few point-to-point connections to be taking place between its various interfaces (CPU to CPU, AGP to memory, etc.).

      HT is used as a secondary interconnect to relatively lower bandwidth devices in the IO chain.

    3. Re:The G5 uses hypertransport... by shawnce · · Score: 1

      This is incorrect, see my other post.

  44. No I'm not an idiot by TwistedSpring · · Score: 1

    Yeah I was right. The only reason I might be an idiot is because I didn't read comments posted above mine.

  45. Day late, dollar short... by gillbates · · Score: 3, Insightful

    While dual cores on a chip might be nice, it won't produce any serious performance increases.

    The underlying problem with Intel and AMD's processors is that they are at the mercy of the architecture:

    1. These chips must share a relatively slow memory bus with other devices.
    2. Currently, the fastest FSB to date is 1033MHz - almost 1/3 of the max clock speed of the processor. Given that Intel's integer units operate at twice the clock speed, the fastest parts of the chip operate at 6 times faster than memory.
    3. The monolithic, synchronous, central-processing-unit design of the architecture prohibits optimizations such as using memory controllers for block moves and having dedicated IO processors. Contrast this with Mainframes in which the CPU passes off IO instructions to ancillary processors and continues to work. In PC-land, when the IDE controller seizes the bus for a transfer from disk into memory, the CPU has to execute out of its cache for ~256 instruction cycles, or risk stalling.

    The ironic thing is that even though AMD and Intel are out-clocking mainframe processors by factors of 2 and 3, mainframes still get more work done simply because they aren't choked by a slow and overcrowded system bus.

    --
    The society for a thought-free internet welcomes you.
    1. Re:Day late, dollar short... by owlstead · · Score: 3, Interesting

      True, and someday every IO process will probably be handled by a dedicated processor. A distributed operating system will run processes on each, making it easy to reprogram the tasks. Fast interconnects will make a NUMA architecture possible.

      Currently however that future is far off. It's simply much cheaper to centralize processing, so the bus will remain an issue for some time to come. For most situations this will be fine, for specialized situations where a single (fast) real time process is needed, or when IO is more important than CPU power...it sucks.

      (listening to my integrated audio which takes about 7% of my processor, and I don't care a bit)

    2. Re:Day late, dollar short... by Anonymous Coward · · Score: 0

      maybe Intel uses an overcrowded system bus for communicating, but AMD does not. also their bus, HyperTransport, was introduced at 1.6ghz and now is up to 2ghz, and Intel's 1066mhz bus is only announced, not available.

      your other points don't even make sense. slow communication with the rest of the system is a limitation of the PCI bus, which now will finally be alleviated with PCI express (not to mention PCI-x or 64bit cards that have been around for awhile). why would the ALU care that it's running several times faster than the bus? it doesn't communicate directly with anything outside of the cpu. finally a bus isn't all about speed, it's about width too. What's faster, a 1ghz 64bit bus or a 2ghz 32bit bus?

    3. Re:Day late, dollar short... by Anonymous Coward · · Score: 0

      Regarding #3, unless the system has dual ported memory there is no way around this problem. The processor and IO controller cannot both access system RAM at the same time regardless of the bandwidth.

      PCI Express will solve most of the bandwidth issues since it is not a bus but a full crossbar switch fabric.

    4. Re:Day late, dollar short... by kscguru · · Score: 4, Informative
      These chips must share a relatively slow memory bus with other devices.

      No... on AMD chips the memory bus is dedicated. Intel chips have a very different system architecture (which does saturate at ~2 CPUs), but AMD gives each chip its own memory controller and memory - scales perfectly. (By the way, this isn't new ... big iron (e.g. Sparc) has been doing this for years).

      Currently, the fastest FSB to date is 1033MHz - almost 1/3 of the max clock speed of the processor. Given that Intel's integer units operate at twice the clock speed, the fastest parts of the chip operate at 6 times faster than memory.

      That's why modern processors use pipelining (in x86, since 486's) and caches (since, uh, 8086s ?). FSB only comes into play in 1-2% of the memory accesses. But those memory accesses are pipelined, interleaved, with multiple outstanding requests issued by the out-of-order pipeline ... processor designers have been working around a slow bus for years, and the FSB is only the bottleneck in extreme, pathological cases.
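
      A back-of-the-envelope version of that argument, using the textbook average-access-time formula (hit time + miss rate x miss penalty). The 3-cycle hit and 300-cycle penalty are assumed round numbers, not measurements of any real chip, and the formula ignores the overlapping of outstanding misses mentioned above, which only makes things better:

```python
def average_access_cycles(hit_cycles, miss_rate, miss_penalty_cycles):
    """Every access pays the hit time; misses also pay the trip to memory."""
    return hit_cycles + miss_rate * miss_penalty_cycles

HIT = 3         # assumed cache hit latency, in CPU cycles
PENALTY = 300   # assumed extra cycles for a trip over the FSB to DRAM

for miss_rate in (0.01, 0.02, 0.10):
    cycles = average_access_cycles(HIT, miss_rate, PENALTY)
    print(f"miss rate {miss_rate:.0%}: about {cycles:.1f} cycles per access")
```

      As long as the miss rate stays in the 1-2% range, the average access costs a handful of cycles rather than hundreds, which is why the raw FSB clock is rarely the limiting factor.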

      The monolithic, synchronous, central-processing-unit design of the architecture prohibits optimizations such as using memory controllers for block moves and having dedicated IO processors

      Ever heard of DMA? A DMA controller does that memory transfer ... there are 2 DMA controllers with 8 channels on your current x86 PC. Heck, high-end PCI cards even have their own onboard DMA engines (it's called bus-mastering). I/O offload? You've obviously never written a device driver... modern drivers issue a few "start" instructions, then sleep; eventually the device completes the I/O and issues an interrupt to inform the CPU it's done. The last computer I had that stalled on disk I/O was running MS-DOS - nine years ago.

      In all fairness, I thought exactly the same things four years ago. Then I learned about modern computer architecture. And in today's world (and, in fact, all PCs for the past ten years), your points are completely - and utterly - irrelevant.

      --

      A witty [sig] proves nothing. --Voltaire

    5. Re:Day late, dollar short... by stevelinton · · Score: 1

      The killer issue on cheap (ie PC) architectures is that the DMA controllers must compete with the CPU for memory access bandwidth. So, if a disk controller is DMAing a chunk of data into main memory and the CPU needs some instructions or data, they fight.

      Mainframe architectures have multiple banks of memory on independent busses and some very expensive cross-bar switches connecting them, so that the DMA controller and the CPU will only fight if they are unlucky enough to need data in the same memory bank. Some mainframes also use expensive dual-port memory in large quantities, so that even this is not a problem.

      Steve

    6. Re:Day late, dollar short... by gillbates · · Score: 1

      You've obviously never written a device driver... modern drivers issue a few "start" instructions, then sleep; eventually the device completes the I/O and issues an interrupt to inform the CPU it's done. The last computer I had that stalled on disk I/O was running MS-DOS - nine years ago.

      So you're obviously not using x86 architecture anymore...

      It doesn't matter whether you're running Linux, Win2k, XP, or whatever on x86 hardware. The problem is that the disk controller (whether through the IDE chip or via DMA chip) and CPU cannot access RAM simultaneously. What happens is that the disk driver issues the instruction, sleeps, and then the disk locks the CPU out of the memory bus when it transfers data from its internal buffer into RAM.

      I've noticed that regardless of OS, an x86 system becomes less responsive when copying a large directory tree. Typically, this is what happens during a file copy operation:

      1. Process P1 issues a disk read request.
      2. The IO driver issues a request for a given group of sectors from a file. Then it sleeps.
      3. The CPU continues with another task, say P2, - but -
      4. When the drive completes the IO operation, it locks the bus as it transfers data to RAM. The CPU is effectively stalled at this point - it can only execute instructions and data out of its local cache. If even one instruction references memory not in cache, the whole instruction stream is suspended until the bus becomes available again.
      5. When the bus is relinquished, an interrupt is issued. The driver is woken from sleep, and the OS restarts process P1 - which requested the read. This process then proceeds to request a write operation to another group of sectors.
      6. The disk driver is invoked; it issues a write instruction, and sleeps.
      7. The OS swaps out the process P1 (waiting on IO), and restarts P2.
      8. The drive controller locks the bus as it reads the memory buffer. Process P2 is effectively stalled.
      9. The drive controller relinquishes the bus. Process P2 continues, and then -
      10. The drive issues an interrupt - IO complete, which restarts the driver. The driver reports IO complete to the OS, which restarts P1. Notice that IO isn't necessarily complete - but rather, the buffer has been copied into the drive's internal cache and is still being physically written to disk.
      11. P1 is restarted, and requests another read, which effectively sends us back to step 1.

      This is admittedly better than stalling the CPU during any IO operation, but it still allows an IO intensive process to lock out the CPU.

      Because the stall is caused by the DMA or IDE controller, assigning a lower priority to P1, or even the disk driver, will have only a negligible effect. Because P1 is constantly stalled waiting on disk IO, its real priority class, in terms of timeslice, is far below that of process P2. For example:

      1. Suppose P1 has a priority of 1/100 that of P2. Supposedly, P2 would get 100 times the CPU timeslice that P1 would have.
      2. Suppose that P1 issues a disk read instruction for 32 sectors (16kB). Suppose it takes 4 instruction cycles to issue this request.
      3. The disk driver issues the read instruction.
      4. Process P2 is started by the OS.
      5. The drive controller locks the bus for 4096 instruction cycles as it reads the buffer into RAM.
      6. An interrupt is issued which restarts the disk driver. The disk driver reports IO complete to the kernel.
      7. The kernel sees that P2 is executing, while P1 is waiting on an IO which has just completed. P2 has been executing for greater than 4000 instruction cycles, where P1 has executed only 4 instruction cycles. Thus, P2 has used up its timeslice, and P1 is owed another 400 instruction cycles before P2 is restarted. Notice that P2's timeslice is completed based not on the number of instructions it actually executed, but rather on the amount of time it controlled the CPU. Thus, the IO process initiated by P1 is effec
      --
      The society for a thought-free internet welcomes you.
    7. Re:Day late, dollar short... by kscguru · · Score: 1
      Okay - I didn't realize that the CPU and the DMA engine can't access memory simultaneously. HOWEVER... that still doesn't make I/O ruin system performance.

      First ... any random disk read or write access is going to have millisecond-range latency - period. That's as fast as disks operate. (And unless you're exceptionally lucky or exceptionally clever, caches won't hide much of that latency). So when your P1 issues those commands to read some sectors from the disk, it will block for, in the best possible cases, 5ms. That's enough time for five million instructions to execute in process two. Memory busses run at ten-nanosecond latencies - there's enough time for several hundred thousand memory bus transactions. Or, tens of thousands of DMA transfers (transferring kilobytes instead of cache lines takes slightly longer). The DMA's control of the bus is rather long, yes - 4000 CPU cycles is a pretty reasonable number (which translates to no more than a microsecond or so). But compared to the time the disk access takes, the I/O-induced memory transfers' impact on the bus is completely negligible.

      Disk I/Os are actually a very bad example - let me give you a better one. Gigabit ethernet - yes, a modern x86 processor can saturate one of those. (Actually, it can saturate about two or three of those running a stock OS that just does routing). Good ethernet cards - which gigabit cards are - use DMA for transfers. The memory bandwidth of the same system is only 3.2 GB/s (for DDR400), call that 20-some gigabit, of which fully 2 gigabit must be DMA-controlled packet transfers. Plus, router software is looking at a lot of data - and generating a lot of cache misses. But the DMA access doesn't seem to hurt the system very much, now does it?
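
      The bandwidth comparison above can be checked with a few lines of arithmetic. This is only a rough sketch: it takes the post's figures (3.2 GB/s peak for DDR400, 2 gigabit of DMA packet traffic) at face value, and the variable names are made up for illustration.

```python
# Rough check of how much of the memory bandwidth the DMA traffic needs.
# Both figures are taken from the post and treated as assumptions here.
MEMORY_BANDWIDTH_GBYTES = 3.2   # DDR400 peak, in gigabytes per second
DMA_TRAFFIC_GBITS = 2.0         # DMA-controlled packet transfers, in gigabits per second

memory_bandwidth_gbits = MEMORY_BANDWIDTH_GBYTES * 8   # bytes -> bits
dma_share = DMA_TRAFFIC_GBITS / memory_bandwidth_gbits

print(f"memory bandwidth: {memory_bandwidth_gbits:.1f} Gbit/s")
print(f"DMA share of peak bandwidth: {dma_share:.1%}")
```

      Under those assumptions the DMA transfers need well under a tenth of the peak memory bandwidth, which is consistent with the "doesn't seem to hurt the system very much" observation.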

      My point? A single process can stall on disk I/O. In the presence of a bad disk scheduler, even multiple processes can stall on disk I/O - that's thrashing. But the problems are OS-dependent, and are not related to the architecture. Dedicated I/O processors only help I/O performance in extreme cases (like protocol offload engines) because I/O performance is primarily dominated by the hardware - the disk latencies, or network latencies - and not by the system architecture.

      I/O coprocessors only make sense in two cases. First, in very high-performance workloads which are optimized so deeply that any CPU time spent working on I/O is a direct hindrance to performance. That doesn't apply to 99% of the market. Second, in cases where the work done represents such a significant fraction of the workload that having a hardware assist is valuable - for example, hardware MPEG decoders or hardware SSL accelerators or onboard hardware TCP/IP engines. Again, doesn't apply to 99% of the market.

      Your process scheduling models have two deep flaws. In the first model: P1 is not restarted as soon as data is available. The OS will only preempt P2 if P2 is a distinctly lower priority than P1, in which case it's perfectly reasonable that P2 suffers - it has the lower priority. In the second model: the accounting is too fine-grained. Time shares are not allocated at the granularity of individual cycles, or even thousands of cycles. Instead, they are allocated at millions of cycles (1ms per share is extremely fine-grained on a modern OS - at least, outside the real-time world). The 4000 cycles P2 lost during the I/O on P1's behalf are negligible beside the 4 million cycles P2 is allocated - of which probably 5% (200,000 cycles) are lost to cache misses, TLB misses, or other stalls anyway.
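
      To put the second point in numbers, here is a minimal sketch using only the figures quoted above (4000 stalled cycles, a 4-million-cycle share, 5% lost to other stalls). The constant names are illustrative, and the clock speed implied by "4 million cycles per 1ms" is an assumption of the post, not a measurement.

```python
# Compare the DMA-induced stall with the size of one scheduler time share.
# All three figures come from the post and are treated as assumptions.
CYCLES_PER_TIMESLICE = 4_000_000   # one 1 ms share
DMA_STALL_CYCLES = 4_000           # cycles lost while the DMA engine holds the bus
OTHER_STALL_FRACTION = 0.05        # share lost to cache/TLB misses and similar

dma_fraction = DMA_STALL_CYCLES / CYCLES_PER_TIMESLICE
other_stall_cycles = round(CYCLES_PER_TIMESLICE * OTHER_STALL_FRACTION)

print(f"DMA stall: {dma_fraction:.2%} of the time share")
print(f"other stalls: {other_stall_cycles} cycles, "
      f"{other_stall_cycles // DMA_STALL_CYCLES}x the DMA stall")
```

      On these numbers the DMA stall is a tenth of a percent of the share, far smaller than the cycles assumed lost to ordinary cache and TLB misses.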

      --

      A witty [sig] proves nothing. --Voltaire

  46. Yield question by Michael+Woodhams · · Score: 2, Interesting

    Are the dual cores on the same piece of silicon? This would require both cores to be defect free. If only one core is defect free, is it possible to disable the dud and sell it as a single core CPU? This would make it a much more attractive proposition for the manufacturers.

    E.g. if a single core has a yield (probability of being defect free) of 80%, then the dual core chips will have a yield of 0.8^2 = 64%. (Actually slightly lower, because whatever interconnect they have also has to be free of defects.) 64% will have two good cores, 4% will have two bad cores, the remaining 32% will have one good core. The manufacturer would obviously like to make use of that 32% if they can.
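
    The same arithmetic as a small sketch, assuming defects in the two cores are independent and ignoring the interconnect. The function name is made up for illustration, and 80% is just the example figure, not real yield data.

```python
def dual_core_outcomes(core_yield):
    """Return the probabilities of (both cores good, exactly one good, both bad).

    Assumes the two cores fail independently; interconnect defects are ignored.
    """
    both_good = core_yield ** 2
    both_bad = (1 - core_yield) ** 2
    one_good = 2 * core_yield * (1 - core_yield)
    return both_good, one_good, both_bad


both_good, one_good, both_bad = dual_core_outcomes(0.80)
print(f"two good cores: {both_good:.0%}")
print(f"one good core:  {one_good:.0%}")
print(f"no good cores:  {both_bad:.0%}")
```

    Trying other single-core yields shows how quickly the "one good core" bin grows as yield drops, which is the bin the manufacturer would want to salvage.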

    --
    Quattuor res in hoc mundo sanctae sunt: libri, liberi, libertas et liberalitas.
    1. Re:Yield question by mercuryresearch · · Score: 3, Informative

      The manufacturers have the choice of using multichip module packaging (common in notebook graphics controllers, for example) or a single die; however, it is my current understanding that we're talking about a single die.

      They very likely WILL disable the dud and sell them as single core CPUs. This is how the "value" brands (Celeron, ex-Duron, and now Sempron) are typically created -- when there's a defect in the processor cache (which is a very large area of the die, and thus more likely to have a defect), the faulty bank(s) are turned off via fusing, creating a CPU with a smaller cache.

      This is all pretty standard yield management.

      Also, your calculations are very close to being correct. While the manufacturers closely guard their yield information, you're in the ballpark, and it's interesting to note that, according to my estimates, Intel's Celeron volumes approximately mirror your computed single-core yield percentage... meaning it will likely be business as usual in our dual core future.

      BTW, if you're interested in computing yield values there's an excellent model to be had in one of the chapters in Hennessy and Patterson's _Computer Architecture: A Quantitative Approach_.

    2. Re:Yield question by RockyMountain · · Score: 2, Funny

      If only one core is defect free, is it possible to disable the dud and sell it as a single core CPU?

      Yes, it is possible, in most cases. (Although there are a few types of defects that would prohibit this, such as power shorts).

      For example, hypothetically, Intel could sell a single core version of Montecito called the Half Monte and a dual core version called the Full Monte.

  47. OMG, wish I had mod points by Anonymous Coward · · Score: 0

    For goodness sake, stop it!! Slashdot posts are not meant to take wit and humor above the hot grits baseline! :-)

  48. Bus architectures are the key by Shapemaker · · Score: 2, Insightful

    From what I know of the current architectures, AMD's solution to main memory access woes (point-to-point bus) seems more sane as soon as more than a couple of processors are installed in the system. Shared bus (as in Intel's solution) seems to require huge caches to operate efficiently, and as we all know, Pentium 4 really does not like pipeline stalls or branch mispredictions.

    Let's take a hypothetical example: quad processor systems utilising dual core processors from Intel and AMD.

    AMD: each processor (core) talks directly to its local memory block, and via HT links to adjacent processors' memories. Processors do not have to contest for access to the bus and thus memory access is always low-latency, even when accessing remote memory. If built today, HT links would operate at 1 GHz.

    Intel: processors share the same bus with each other and the memory controller. Any time a processor needs to access memory, it has to wait until the bus is free before it can ask the memory controller for access to main memory. Pipeline stalls happen here if the bus is not free when needed. This is compensated for with huge L3 caches. As far as I know, current quad processor systems from Intel have bus speeds of around 533 MHz.

    So in a nutshell, Intel competes with AMD on a quite level field when the system has 1-2 processors, but as soon as processor count goes up, bus bandwidth becomes an issue with Intel. It shall be interesting to see how Intel attempts to counter that.

    What I am getting at with this? Well, those huge 12 MB L3 caches in Intel's future processors sure aren't cheap. They take up lots of silicon and WILL decrease core yields since they've got lots and lots of points of failure. So manufacturing processes really have to be ramped up to allow that at reasonable cost.

    --
    "Intellectual Property" should be an affront to anyone capable of independent thought.
    1. Re:Bus architectures are the key by Anonymous Coward · · Score: 0

      Yield problems on cache areas are very rare.

  49. Wow! How on earth... by callipygian-showsyst · · Score: 1

    ...will Apple's G5 ever catch up with the tricks Intel and AMD have up their sleeves? Just when Apple got to be nearly as fast as Intel and a little behind AMD, this will leave them out in the cold.

    1. Re:Wow! How on earth... by Anonymous Coward · · Score: 0

      PowerPC is based on IBM's POWER CPU architecture, which has had dual core for many years. I don't think this is an issue.

    2. Re:Wow! How on earth... by LemonYellow · · Score: 1

      Most likely, using one or the other of these. I don't think they'll be worrying about being left behind just yet.

  50. 64-bit by mr_burns · · Score: 2, Interesting

    It's not a question of if there will be 64-bit OS's to go with these things. Eventually, it's sure to happen in multiple flavors.

    The real question is what ELSE will be on the motherboards and in the chip by the time these things hit the market? Specifically, what DRM hardware will come with these things? What will the BIOS look like?

    That's why I think that the current generation of 64-bit desktops are probably one of the best values for a machine you might be using 4 years from now. It's risky to wait 6 months or a year with the current views of the US Congress and FCC. This generation of 64-bit machines might be one of the last to be multi-purpose Turing/Von Neumann devices.

    Don't wait for dual-cores if you have the cash and want to be the one in control of your 64-bit machine. Eventually the OS's will catch up.

    --
    "Let him go, Ralph. He knows what he's doing." --Otto Mann (simpsons)
    1. Re:64-bit by algae · · Score: 1
      This generation of 64-bit machines might be one of the last to be multi-purpose Turing/Von Neumann devices.

      I know that you're referring to the general-purpose computer von Neumann machine, but most people are more familiar with the universal constructor (a universally programmable machine that's also able to create perfect copies of itself, and transfer its program to them).

      While I wish that my A64 box could do that, I really don't see it happening anytime soon :).

      --
      Causation can cause correlation
  51. The memory controller is on the chip by cybrthng · · Score: 1

    Last time I checked, the Opteron line had the memory controller on the chip, hence the lightning fast FSB support at almost native clock rates..

    1. Re:The memory controller is on the chip by Aadain2001 · · Score: 1

      But that's for talking to OFF CHIP memory, not on chip memory such as cache. Guess you don't understand that concept.

      --
      Space for rent, inquire within
  52. Re: unified (single) is not always better by Anonymous Coward · · Score: 0

    In the worst case, this could effectively make the system as slow as if there were no cache at all.

    Or worse... this can be realized on a single CPU system even. Just have a number of variables that all map to the same cache set, where the number of variables is many times the X-way associativity of the cache. Each time an access happens, one of the other lines will have to be flushed. If they are only read, it isn't a big deal, but if there's a write each time, then it can be quite painful.

    The easy way to do this is to have a number of large arrays that are powers of two in size and each array is as large as the L2 cache. The number of arrays you use should be a multiple of the X-way of the cache (for an 8-way set associative cache, choose 16 arrays for example). Now, do calculations based on entries in each array.

    for (int i = 0; i < ARRAY_SIZE; i++)
        Array1[i] = Array1[i] + Array2[i] + ...;

    Absolute cache destruction.
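
    Why this layout hurts can be sketched by computing which cache set each array's element lands in. Everything concrete here is an assumption for illustration: a 512 KiB, 8-way cache with 64-byte lines, 16 arrays of exactly the cache size laid out back to back from address 0, and a simple modulo set index (real caches may index differently).

```python
# Hypothetical cache geometry, chosen only for illustration.
CACHE_SIZE = 512 * 1024    # bytes
WAYS = 8                   # associativity
LINE_SIZE = 64             # bytes per cache line
NUM_ARRAYS = 16            # twice the associativity, as in the post
NUM_SETS = CACHE_SIZE // (WAYS * LINE_SIZE)


def set_index(address):
    """Cache set that an address maps to under simple modulo indexing."""
    return (address // LINE_SIZE) % NUM_SETS


# Arrays laid out back to back, each exactly CACHE_SIZE bytes long.
array_bases = [k * CACHE_SIZE for k in range(NUM_ARRAYS)]


def sets_touched(i, element_size=4):
    """Distinct cache sets used when element i of every array is accessed."""
    return {set_index(base + i * element_size) for base in array_bases}


for i in (0, 1, 1000, 100000):
    print(f"element {i}: {NUM_ARRAYS} lines compete for "
          f"{len(sets_touched(i))} set(s) of {WAYS} ways")
```

    Under these assumptions element i of every array falls into the same set, so each loop iteration needs 16 lines from a set that can hold only 8, and lines keep evicting one another.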

  53. HT Processor Interconnect by Anonymous Coward · · Score: 1, Interesting

    Would it be possible at such small levels to make the processor interconnect very very fast and very low latency? I'm sure HT (Hyper Transport) is designed as a very robust error checking protocol, surely it must be possible on this scale to get 3 GHz or more out of what is meant to be a chip-to-chip interconnect. Also, what are the problems with allowing either processor to access memory as and when it is needed? I.e., wiring the memory controller of both processors to the pins and making a realtime decision about which core has access to memory? This would reduce latency a lot for the second core.

    1. Re:HT Processor Interconnect by Anonymous Coward · · Score: 0

      Or even providing a GDDR3 memory controller on one processor and a DDR1 controller on the other?

  54. I will translate. by Anonymous Coward · · Score: 0

    Let's walk through the sentence slowly and try to figure out what the Idiot was trying to say.
    "I bought my girlfriend has her own dual cache for her dual-core chips last christmas!"
    First of all, I think he meant to say "I bought my girlfriend her own dual cache for her dual-core chips last christmas!"
    Ok. Now, first we have to identify what "her dual-core chips" are. In my years of Idiot translating experience, I am confident that he means "her boobs". That leaves us with "dual cache". Since a cache is used to temporarily store something, and we are talking about boobs, I think "dual cache" is "bra". So, we are left with "I bought my girlfriend her own bra for her boobs last christmas!"
    However, we have double pluralities here. "chips" implies more than one chip, and "dual" implies two. Therefore, I think we have to modify our translation. Since "dual boobs" doesn't make sense, I think that "chip" really means "nipple". So, accounting for this in our new translation, we are left with "I bought my girlfriend her own bra for her two-nippled boobs last christmas!"
    Or, in more proper English, "Last Christmas I bought a bra for my girlfriend, who has two nipples on each breast."

  55. Re:Does MS delay in releasing a 64 bit OS hurt the by wed128 · · Score: 1
    I tend to see the possibility that people who buy 64-bit computers will try to take advantage of their capabilities by choosing 64-bit capable operating systems to run on them.
    I kinda disagree. I'll bet a lot of people who are buying 64 bit systems are running vanilla windows XP, aren't really doing any heavy processing, except what's offloaded to their GeForce FX 42Bazillion and are thinking... "If it worked for nintendo..." or "wow 64 bits is bigger than 32, i'll be 1337 fur shur!!"
  56. Re: unified (single) is not always better by mepperpint · · Score: 0

    You are entirely correct that you can craft code to negate the usefulness of the cache on any system. For the average application, however, the cache is extremely effective and this special case does not arise. If it were to arise accidentally, it would not be difficult to rewrite the code in a cache-friendly manner.

    The interesting difference between the single core and multiple cores sharing a unified cache is that two memory-intensive, cache-friendly programs could trample each other's cache and result in "Absolute cache destruction". The result could be that the two programs running on separate cores and sharing a cache would run slower than the programs taking turns on a single core: every instruction on the single core might be a cache hit (data is in the cache), but when moved to the dual-core with unified cache it could become a cache miss (data is not in the cache). This would be a huge performance hit because memory access is orders of magnitude slower than accessing the cache (resulting in worse performance on the dual-core system).

  57. Dual core AMD for Socket 939? by Anonymous Coward · · Score: 0

    What do you think the chances are that these dual core chips will be made available for s939 motherboards? :/ I'm definitely upgrading at the end of this year (939, AMD64, DDR2, yadda yadda) no matter what, but am dying to get my hands on this without having to upgrade to an entirely new platform (939 will only be ~1 year old when this arrives)!

    1. Re:Dual core AMD for Socket 939? by Wesley+Felter · · Score: 1

      100%; AMD already announced it.

  58. Will they not jack us around with $$$ SMP CPUs? by swb · · Score: 1

    Why can't they just make run of the mill CPUs at least dual-processor capable, and not force us to buy their upmarket SMP CPUs?

    Was there ever a real technical reason behind this, or was it purely marketing? PII/PIII could go dual, but now it's Xeon land and way more expensive motherboards to boot. I don't remember dual P2/P3 motherboards being a whole lot more expensive than their single CPU counterparts.

    1. Re:Will they not jack us around with $$$ SMP CPUs? by kscguru · · Score: 1
      Because SMP requires a LOT of synchronization logic. Cache coherency most importantly, but also system support (usually through the memory controller) for the synchronization primitives used to implement locking. Locking on a uniprocessor is trivial (disable interrupts); locking on SMP is difficult (test-and-set, load-locked / store-conditional, or something more modern). Add in the logic to verify that the synchronization parts are correct...

      Building a fast CPU core is easy. Building a fast CPU core that can talk to other CPU cores with a tolerable latency (hint: CPU-to-CPU communication needs to be faster than CPU-to-memory) is hard.

      --

      A witty [sig] proves nothing. --Voltaire

    2. Re:Will they not jack us around with $$$ SMP CPUs? by Anonymous Coward · · Score: 0

      Athlon64 shares the Hammer core which has built-in SMP support. It would be very easy for AMD to start selling an SMP or dual-core Athlon for the high-end consumer market. No technical problem.

      And last I checked, Athlons were similarly priced with their equivalent Opterons, so the reason isn't to protect Opteron sales.

      Mass-market dual core would be difficult with AMD's limited production facilities, but I suspect the real reason is that AMD does not believe that there is a significant market for consumer SMP at this time.

    3. Re:Will they not jack us around with $$$ SMP CPUs? by Paladin128 · · Score: 1

      But the fact of the matter is, at least in AMD's case, the high-end Opteron preceded the Athlon64. There's a 1-pin difference between the newer A64's and Opterons! I honestly don't believe that they have to cut enough transistors out of the silicon to justify 2X the price.

      The fact of the matter is, SMP=server/workstation market, so they can charge more. Most users won't gain much (if any) real-world benefit from an SMP system.

      --
      Lex orandi, lex credendi.
  59. My misread... by Transcendent · · Score: 1

    Dual Caches for Dual-core Chips

    Ahha... I misread that as "Dual Crashes..." thinking Windows found a way of really boning up a system when running on dual processors/cores.

  60. You're missing one little tidbit... by Svartalf · · Score: 1

    It's not just 64 bits that we're talking about here - it's a larger register pool with the AMD64 architecture. This WILL affect many things end-users do.

    (Not to mention that an Athlon64 in 32-bit mode seems to stack up rather nicely against the P4 clocked half again faster... Go figure... :-)

    --
    I am not merely a "consumer" or a "taxpayer". I am a Citizen of the State of Texas
    1. Re:You're missing one little tidbit... by DarkEdgeX · · Score: 1

      Yeah, but to access those new registers you need to use the new REX prefix, which increases code size (reducing the amount that can be kept in cache). And I haven't seen any benchmarks, but I imagine anything that uses the REX prefix is probably slower than just using the original registers (EAX, EBX, ECX, EDX, ESI, EDI, ESP, EBP).

      I'd love to see some benchmarks/timings for AMD64 instructions (especially a comparison between using the original 8 registers vs. the 8 new registers).

      --
      All I know about Bush is I had a good job when Clinton was president.
  61. NUMA and process affinity? by Anonymous Coward · · Score: 0

    I thought the future was all about NUMA and process affinity (keeping processes "affine" to a specific CPU and memory pool).

    There are some consumer motherboards that are NUMA right now, with Opterons... each CPU has its own memory.

    Setting the process affinity keeps the scheduler from ping-ponging the processes across the CPUs.
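
    A minimal sketch of pinning a process to one CPU, assuming a Linux system where Python's os.sched_setaffinity is available (it is not on other platforms, so the code checks first). The helper name and the choice of "lowest allowed CPU" are arbitrary illustrations.

```python
import os


def pin_to_one_cpu():
    """Pin the calling process to a single CPU and return the new affinity set.

    Returns None where os.sched_setaffinity is unavailable (it is Linux-only).
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    allowed = os.sched_getaffinity(0)   # pid 0 means "the calling process"
    target = min(allowed)               # arbitrary choice: lowest allowed CPU
    os.sched_setaffinity(0, {target})
    return os.sched_getaffinity(0)


# Remember the original mask so the demonstration can undo itself.
original = os.sched_getaffinity(0) if hasattr(os, "sched_getaffinity") else None
pinned = pin_to_one_cpu()
print("pinned to:", pinned)
if original is not None:
    os.sched_setaffinity(0, original)   # restore the original affinity mask
```

    This only constrains where the scheduler may run the process. It does not by itself control which memory pool the process allocates from, which is the other half of the NUMA picture.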

  62. Processor by noidentity · · Score: 2, Funny

    ...will both have two processor cores, the actual unit inside a processor that performs the calculations...

    Oh, so that's what a processor does! Can you remind me again what "RAM" is?

  63. Okay, yes, yes. by Ayanami+Rei · · Score: 1

    This isn't the first time I made that assertion only to be corrected in this same vein.

    (hits self)

    --
    THIS THING CAN TURN ON A DIME, MACROSSZERO STYLE ALSO FUCK BETA, ~NYORON
  64. Re:Note: Single is Better unless you're clever by Anonymous Coward · · Score: 0

    Suggest you look more into AMD's implementation of separate caches. Assuming the two cores with independent caches work more or less as two separate K8 CPUs do now, the two parts actually can snoop each other's caches before reaching for main memory-- from what I've read, the net effect of having two 1MB caches on separate cores should approach the performance you would see from two 2MB cache CPUs without this capability.

    See http://chip-architect.com/news/2003_09_21_Detailed_Architecture_of_AMDs_64bit_Core.html#3.18 for more details than I'm qualified to rehash (I have a CS and computer architecture degree, but it's from 15 years ago and I'm not that active in the biz these days... I can follow the discussion, more or less).

    AMD has patents that make this possible using Hypertransport across CPUs, but I'm not sure how that relates to how they'd do it within a dual-core CPU, and I'm not sure that they'd rule out all the ways Intel could achieve a similar result.

  65. Re:Does MS delay in releasing a 64 bit OS hurt the by dickrichardv8 · · Score: 1

    I can only speak for myself. I bought an HP AMD64 3400+ computer at CompUSA with my hard-earned $949 and fired it up as soon as I could get it home. Was I disappointed! It didn't seem to be any faster than my 750 MHz Pentium, and I was swamped with popup messages urging me to buy this and that to make the menu buttons actually work, or else warning me that my trial software would expire in xy days unless I bought the full version. Fdisk -all got rid of the salesmen. I installed beta Windows XP 64-bit and could not get virus software or drivers for my scanner, sound card, etc. I put Fedora Core 2 x86_64 on the second partition. Absolutely everything worked, even my 8in1 card reader. Now my box screams; I can compile a kernel faster than I can pop and eat a bag of microwave popcorn. Never mind the fact that the 64-bit arch gives me enough highway to run the Gimp while the kernel is compiling on desktop1. Me wait on Microsoft? Nahhh.. I'm on 64bit now and ain't lookin' back, and I have not bothered to dual boot XP just to see it slow down my 64bit box. Maybe Longhorn.....someday....showme like they say in Missouri.

  66. Re: unified (single) is not always better by farnz · · Score: 1
    But modern caches are set-associative, and LRU; hence, in the worst case where two cores are accessing different data with the same tags, they split each set evenly between them. Thus, a dual core with shared cache is never any slower than a dual core with the same amount of cache split in two.

    Practically, a dual core with shared cache will probably have less cache than a dual core with two caches, and thus will be slower, but there is no reason why the cache controller has to be designed in such a way that two memory intensive programs can trample each other's cache elements.
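
    A toy simulation of the symmetric case, offered only as a sketch: one 8-way LRU set shared by two cores that access it in strict alternation, compared with two private 4-way sets. The working-set size of 4 lines per core, the lock-step interleaving, and all names are assumptions; the model says nothing about cores that access memory at very different rates.

```python
from collections import OrderedDict


def count_hits(accesses, ways):
    """Simulate a single LRU cache set and return the number of hits."""
    lru = OrderedDict()
    hits = 0
    for line in accesses:
        if line in lru:
            hits += 1
            lru.move_to_end(line)         # mark as most recently used
        else:
            if len(lru) >= ways:
                lru.popitem(last=False)   # evict the least recently used line
            lru[line] = True
    return hits


def core_stream(core, lines, rounds):
    """One core cycling through its own lines, all mapping to the same set."""
    return [(core, i) for _ in range(rounds) for i in range(lines)]


ROUNDS = 100
stream_a = core_stream("A", 4, ROUNDS)
stream_b = core_stream("B", 4, ROUNDS)
interleaved = [access for pair in zip(stream_a, stream_b) for access in pair]

shared_hits = count_hits(interleaved, ways=8)
private_hits = count_hits(stream_a, ways=4) + count_hits(stream_b, ways=4)
print("shared 8-way hits:     ", shared_hits)
print("private 2 x 4-way hits:", private_hits)
```

    In this lock-step model the shared set behaves like an even split between the cores. Changing the per-core line counts or the interleaving is an easy way to probe where that stops holding.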

  67. Dual caches by Quiberon · · Score: 1

    You mean, like this guy ? BlueGene

  68. mmmm... not quite by Moraelin · · Score: 1

    I have programmed (both assembly and higher level) back in the 16 bit days (and in the 8 bit days for that matter), and the problem was entirely different.

    The big problem wasn't just having to address more than 64k of data, it was having to address _one_ _chunk_ of more than 64k. (E.g., a 640x480 pixel bitmap was already over the limit, even in 4 bit colour.)

    Having 50,000 chunks of 10-30k each wouldn't really even start to be a bother. You'd just load the segment register at the beginning.

    What made one large chunk be special was that you had to do segment arithmetic in the middle of addressing its contents.

    E.g., if you wanted to apply a Gaussian blur (or any other matrix filter) to a bitmap, having to compute the segment and offset for each pixel was a huge performance hit. If you applied something as trivial as a 3x3 filter to a 640x480 bitmap, you'd end up doing segment arithmetic 9x640x480 = 2,764,800 times in the process, instead of just adding/subtracting a constant value to an int.

    Of course, you could and did optimize it better than that. (E.g., it's trivial to reduce that to computing the segment/offset only once per row. I.e., only 480 times.) But that was something you had to do. That was the kludge and extra work of those times.
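
    The two approaches can be sketched as follows, modelling real-mode addressing as linear = segment * 16 + offset. The bitmap base address 0x20000 and one byte per pixel are assumptions for illustration, and the helper names are made up.

```python
WIDTH, HEIGHT, TAPS = 640, 480, 9   # 3x3 filter -> 9 taps per pixel
BASE = 0x20000                      # hypothetical linear address of the bitmap


def to_segment_offset(linear):
    """Normalise a 20-bit linear address into a real-mode segment:offset pair."""
    return linear >> 4, linear & 0xF


def to_linear(segment, offset):
    """Real-mode address translation: segment * 16 + offset."""
    return (segment << 4) + offset


# Naive approach: one segment computation per filter tap per pixel.
naive_conversions = TAPS * WIDTH * HEIGHT

# Optimised approach: one normalised pointer per row. A 640-byte row fits in
# a single segment, so pixels within a row need only offset arithmetic.
row_pointers = [to_segment_offset(BASE + y * WIDTH) for y in range(HEIGHT)]

print("naive segment computations:  ", naive_conversions)
print("per-row segment computations:", len(row_pointers))
```

    Because each normalised offset is at most 15, adding up to 639 bytes within a row never overflows the 16-bit offset, which is what makes the per-row version safe.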

    Frankly, I don't think we have the same problem nowadays. To have the same problem with the 4 GB limit of 32 bit addressing, you'd need one 4 GB chunk of allocated memory, which you can't possibly break into smaller chunks.

    E.g., if you process a DVD movie (as per your example), does it really need to be a single chunk? Well, no. It's divided into frames, which are a lot smaller than that. You also don't have to hold the whole movie in RAM, and indeed you don't, since few people have 4 GB RAM on their computers even at work.

    Even on the server side, I'd wager a guess that 99% of server side stuff doesn't allocate over 4 GB to a single process. (None of our application servers do, and we're talking a rather big corporation.) And even if someone did, I doubt they'd get 4 GB as a single malloc().

    So again, it's nowhere near being as big a problem as the old 16 bit problem.

    Don't get me wrong, I do have an Athlon 64 and they're nice chips anyway. The extra registers and high IPC are reasons enough to have one anyway.

    But the whole "we need 64 bit now!!!" is IMHO just marketing hype and bullshit. The majority of computers need 64 bit registers like a fish needs a bicycle.

    --
    A polar bear is a cartesian bear after a coordinate transform.
  69. announced that they announced? by Cynikal · · Score: 1

    and if we still don't pay attention, will they announce again this announcement about their last announcement?

  70. One Other Thing by Bilbo · · Score: 1

    Another thing to remember is that using 64bit in user-land can actually slow down some applications. You're pushing around twice as much data for every single operation. I have a 64bit UltraSPARC running Linux (Aurora). The OS is compiled in 64bit, but most of the applications are still running in 32bit mode. Many applications break when you compile them in 64bit mode. Others work, but slower. Don't think I've seen any actually run faster. Unfortunately, I don't have anywhere near 2Gig RAM on that box, so I can't really take advantage of the addressing space extensions.

    --
    Your Servant, B. Baggins
  71. Re: unified (single) is not always better by julesh · · Score: 1

    Do you have any references on this? It sounds counter-intuitive to me. I would have thought that as long as both cores were accessing constantly accessing memory, the cache would be effectively split between them in a roughly 50/50 split. Actually, if one process was using more memory than the other, the split might end up being proportional, I think -- and this ought to improve performance, as it effectively means that the cache size (which in total is presumably twice as large as the unshared caches would be) is split in an adaptive fashion between the two processes.

    Now, I'm not an expert on cache behaviour or memory access patterns, so I'll accept I could easily be wrong -- but I think the way I see it is the more likely scenario.
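
    The adaptive-split argument can be tried out on a toy model. This is a minimal sketch, not a model of any real chip: it assumes fully associative LRU caches, two cores that each loop over a fixed working set, strictly interleaved accesses, and no cost for sharing (no port contention, no extra latency). All sizes and names are made up for illustration.

```python
from collections import OrderedDict


class LRUCache:
    """Fully associative cache with least-recently-used replacement."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()
        self.hits = 0

    def access(self, address):
        if address in self.lines:
            self.hits += 1
            self.lines.move_to_end(address)  # mark as most recently used
        else:
            self.lines[address] = True
            if len(self.lines) > self.capacity:
                self.lines.popitem(last=False)  # evict least recently used


def simulate(private_lines, set_a, set_b, length=10_000):
    """Return (private hit rate, shared hit rate) for two looping cores."""
    private = {"A": LRUCache(private_lines), "B": LRUCache(private_lines)}
    shared = LRUCache(2 * private_lines)  # same total capacity, pooled
    for i in range(length):
        for core, working_set in (("A", set_a), ("B", set_b)):
            address = (core, i % working_set)  # cores never share data here
            private[core].access(address)
            shared.access(address)
    total = 2 * length
    private_hits = private["A"].hits + private["B"].hits
    return private_hits / total, shared.hits / total


private_rate, shared_rate = simulate(private_lines=64, set_a=96, set_b=16)
print(f"private caches: {private_rate:.2%} hits")
print(f"shared cache:   {shared_rate:.2%} hits")
```

    With one core needing 96 lines and the other only 16, the pooled cache holds both working sets while the 64-line private cache thrashes. That supports the intuition only under these assumptions; a streaming core that never reuses data could instead push the other core's lines out of a shared cache.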

  72. nitpicking here. by flaming-opus · · Score: 1

    Actually the FSB is the bottleneck ALMOST ALL THE TIME. It may only be 1-2% of the instructions, but a RAM load takes hundreds or thousands of CPU cycles. That's the very reason for speculative loads on the Itanium, to start the load as far in advance as you can. Modern processor architectures are built around trying to minimize the necessity of RAM loads. This is, of course, a problem of latency and not of bandwidth, though you need that too.
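
    A quick back-of-the-envelope check of that claim, as a minimal sketch: it assumes an idealized core that retires one instruction per cycle when it never waits on RAM, and that every RAM load stalls the core for the full latency (no overlap, no prefetching). The function name and the sample numbers are illustrative only.

```python
def stall_fraction(miss_fraction, miss_penalty_cycles, base_cpi=1.0):
    """Return (effective cycles per instruction, share of time spent stalled)."""
    stall_cycles = miss_fraction * miss_penalty_cycles  # average stall per instruction
    effective_cpi = base_cpi + stall_cycles
    return effective_cpi, stall_cycles / effective_cpi


for fraction in (0.01, 0.02):
    for penalty in (200, 1000):
        cpi, stalled = stall_fraction(fraction, penalty)
        print(f"{fraction:.0%} of instructions, {penalty} cycles each: "
              f"CPI {cpi:.1f}, stalled {stalled:.0%} of the time")
```

    Even at 1% of instructions and 200 cycles per load, the core in this model spends about two thirds of its time waiting on memory.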

    That said, mainframes don't have any real solution to the latency problem either. (except vector CPUs from NEC or Cray, but that only works for a very limited set of programs)

    In x86 land the 486 was also the first to have a cache. (8kb)

    1. Re:nitpicking here. by kscguru · · Score: 1
      In x86 land the 486 was also the first to have a cache. (8kb)

      I thought I had a 386SX that had cache - not onboard, but soldered onto the motherboard. Or maybe I'm thinking of another machine. Alas, I don't have the system or the specs anymore, so I can't be sure.

      re:FSB, I shouldn't have claimed pathological cases. The fact is, the x86 architecture is very well suited to scale up to multiple CPUs even in spite of the FSB's limitations - and the grandparent seemed to imply otherwise. Now, a slow FSB has some interesting pathologies, but the general case performance, with good caching, is actually pretty good!

      I'm actually quite impressed at how much performance system architects have gotten out of modern processors even given the slow FSBs - with appropriate caching, prefetching, and upcoming multi-core designs (I'm actually looking to the 8-core Niagara as a better example), the FSB's speed is becoming less and less of a bottleneck.

      --

      A witty [sig] proves nothing. --Voltaire

  73. OBVIOUSLY! by rew · · Score: 1

    Obviously you want dual caches. Even if a single shared cache would be a benefit with a slightly higher cache hit rate, you have two 3GHz plus cores bashing at it at > 3 billion times a second. In that case you don't want to deal with the cores tying up each other for ALL cache-accesses that they do.

    It is certainly worth it to push the point where memory hierarchies meet as far as possible from the core. That's why you get big caches in the Xeons meant for multiprocessing.

    Now, if a dual (triple?) ported cache were "free", then of course you'd go for the dual ported version: double the cache size, and your hit ratio goes up a little. But they are not free. You pay in speed, area, transistor count etc. In the end, I'm pretty sure Intel and AMD will have evaluated the tradeoffs and made the right decision.
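
    The tradeoff can be sketched with the textbook average-access-time formula (hit time plus miss rate times miss penalty). Every number below is a made-up assumption for illustration, not a measurement of any Intel or AMD part, and the function names are hypothetical.

```python
def average_access_time(hit_cycles, miss_rate, miss_penalty_cycles):
    """Average cycles per cache access: hit time plus expected miss cost."""
    return hit_cycles + miss_rate * miss_penalty_cycles


def break_even_extra_hit_cycles(private_miss_rate, shared_miss_rate,
                                miss_penalty_cycles):
    """Extra hit latency a shared cache can afford before it stops paying off."""
    return (private_miss_rate - shared_miss_rate) * miss_penalty_cycles


# Assumed: sharing cuts the miss rate from 5% to 4%, but arbitration between
# the two cores adds 4 cycles to every hit.
private_time = average_access_time(hit_cycles=10, miss_rate=0.05,
                                   miss_penalty_cycles=200)
shared_time = average_access_time(hit_cycles=14, miss_rate=0.04,
                                  miss_penalty_cycles=200)
budget = break_even_extra_hit_cycles(0.05, 0.04, 200)

print(f"private: {private_time:.1f} cycles, shared: {shared_time:.1f} cycles")
print(f"shared cache breaks even at {budget:.1f} extra cycles per hit")
```

    With these numbers the slightly better hit rate buys back only 2 cycles per access, so a shared cache that costs 4 extra cycles on every hit loses.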

    With a bit of luck, they have a "fastpath" between the caches for stuff that needs to bounce around between the two caches/cores.

    Oh. I haven't read the article. Sorry. :-(

  74. Girlfriend is Better! by Anarcho-Goth · · Score: 0, Offtopic

    I . . . Who took the money?
    Who took the money away?
    I . . . It's always showtime
    Here at the edge of the stage
    I, I, I, wake up and wonder
    What was the place, what was the name?
    We wanna wait, but here we go again...

    I . . . takes over slowly
    But doesn't last very long
    I . . . no need to worry
    Evr'ything's under control
    O - U - T But no hard feelings
    What do you know? Take you away
    We're being taken for a ride again
    I got a girlfriend that's better than that
    She has the smoke in her eyes
    She's moving up, going right through my house
    She's gonna give me surprise
    Better than this, know that It's right
    I think you can if you like
    I got a girlfriend with bows in her hair
    And nothing is better than that

    Down, down in the basement
    We hear the sound of machines
    I, I, I'm driving in circles
    Come to my senses sometimes
    Why, why, why, why start it over?
    Nothing was lost, everything's free
    I don't care how impossible it seems

    Somebody calls you but you cannot hear
    Get closer to be far away
    Only one look Maybe that's all that it takes
    that's all that we need
    All that it takes, all that it takes
    All that it takes, all that it takes
    I got a girlfriend that's better than that
    And she goes wherever she likes. (there she goes...)

    I got a girlfriend that's better than that
    Now everyone's getting involved
    She's moving up going right through my heart
    We might not ever get caught
    Going right through (try to stay cool) going through, staying cool
    I got a girlfriend that's better than that
    And nothing is better than you

    I got a girlfriend that's better than this
    And you don't remember at all
    As we get older and stop making sense
    You won't find her waiting long
    Stop making sense, stop making sense...stop making sense, making sense
    I got a girlfriend that's better than that
    And nothing is better than this
    ( is it? )

    --
    I hate Liberals and Conservatives.
    If you are a Liberal or a Conservative, then HAVE A NICE DAY!
    Courage.