Slashdot Mirror


ARM Unveils One-chip SMP Multiprocessor Core

An anonymous reader writes "ARM Ltd. will unveil a unique multi-processor core technology, capable of up to 4-way cache coherent symmetric multi-processing (SMP) running Linux, this week at the Embedded Processor Forum in San Jose, Calif. The "synthesizable multiprocessor" core -- a first for ARM -- is the result of a partnership with NEC Electronics announced last October, and is based on ARM's ARMv6 architecture. ARM says its new "MPCore" multiprocessor core can be configured to contain between one and four processors delivering up to 2600 Dhrystone MIPS of aggregate performance, based on clock rates between 335 and 550 MHz."

34 of 145 comments

  1. ARM servers by MrIrwin · · Score: 4, Interesting
    I had thought of ARM processors being the future for client devices and embedded systems.

    Looks like here we are pointing at server technology.

    How long before we have a 64/32/16-bit variable word size Thumb-like architecture?

    --

    And if you thought that was boring you obviously haven't read my Journal ;-)

    1. Re:ARM servers by swordboy · · Score: 3, Insightful

      I think the one thing that we're all waiting for is the introduction of on-chip system memory. Currently, the cache of a high-performance processor consumes more than half of the chip area because the penalty for a cache miss is so large. For decades now, memory frequency scaling has lagged that of the microprocessor. Although there have been some great strides recently, latency is still rearing its ugly head. External DRAM is too electrically distant to remain at the heart of any high-performance system.

      Once we get processor and memory combined, we'll see performance increasing by several orders of magnitude. Processor architecture will matter even less, since emulation of *any* architecture will become trivial in terms of available processing speed. Your Thumb-like prediction will most certainly pan out to some magnitude.

      --

      Life is the leading cause of death in America.
    2. Re:ARM servers by MathFox · · Score: 3, Insightful
      why don't we see more reasonable personal computers (or blade servers) based upon this architecture.
      I was an Acorn Archimedes user for more than 10 years (the workstation that the ARM was originally designed for) and they were great systems. Affordable, decent speed and good operating system.

      Alas, they were not "PC-compatible" and at a certain time the Intel/AMD clones with Linux became much more attractive.

      Something along the profile of the Psion Netbook or old (or new depending upon your perspective) Apple Newton (also ARM) would be very cool and useful.
      Are you talking Sharp Zaurus? I'm eyeing one (If I could order them in the Netherlands...)
      --
      extern warranty;
      main()
      {
      (void)warranty;
      }
    3. Re:ARM servers by Christopher+Thomas · · Score: 4, Informative

      For decades now, memory frequency scaling has lagged that of the microprocessor. Although there have been some great strides recently, latency is still rearing its ugly head. External DRAM is too electrically distant to remain at the heart of any high-performance system.

      Once we get processor and memory combined, we'll see performance increasing by several orders of magnitude.


      This idea has been around for what is almost certainly longer than either of us have been alive. It turns out that there are problems.

      The main problem is that no matter how much memory a system has, we find ways to use it. In the time I've been using computers, memory size has gone up four orders of _magnitude_, and I'm sure the greybeards listening will top that. The processor sitting in your machine right now has more on-die memory (the cache) than, say, an early XT had, but the tasks you're running have a memory footprint too large to fit. This is the price for being able to _do_ more than you could do on that old XT.

      Another problem is with the structure of memory itself. You've heard of "fast, cheap, good - pick two"? Memory is "large, fast, densely-packed - pick _one_". The reason why integrated logic/DRAM processes tend to do one or the other badly is that DRAM and logic have to optimize transistor characteristics for exactly opposite things (high "on" current for logic, low leakage current for DRAM). Among other things, this means that DRAM is either slow or very power-hungry. SRAM is bulky no matter what you do - it's the cost of playing, when you have six transistors instead of one. Any kind of large RAM array is slow no matter what you do - you have to propagate signals across a huge structure instead of a smaller one.

      The solution to date has been a hierarchical cache system, where small, fast, on-die memory is accessed whenever possible, and when that overflows, larger, moderately fast, on-die memory, and when that fails, DRAM. This works amazingly well, giving you almost all of the benefits of fully on-die memory for problems that fit in cache. Problems that don't fit in cache won't fit in on-die memory, so going with an on-die implementation doesn't help for them.
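
      To put rough numbers on why the hierarchy works so well, here is a minimal sketch of the usual average-access-time estimate. The latencies and hit rates are made-up illustrative values, not measurements of any real chip, and the function name is just for this example.

```python
def average_access_time(levels, memory_latency):
    """Average memory access time, in cycles, for a cache hierarchy.

    levels: list of (hit_latency_cycles, hit_rate), fastest level first.
    memory_latency: cycles paid by an access that misses every cache level.
    """
    total = 0.0
    reach_probability = 1.0  # chance that an access gets as far as this level
    for hit_latency, hit_rate in levels:
        total += reach_probability * hit_latency
        reach_probability *= (1.0 - hit_rate)
    return total + reach_probability * memory_latency


# Assumed numbers: a problem that fits in cache vs. one that does not.
fits_in_cache = average_access_time([(2, 0.95), (10, 0.80)], memory_latency=200)
too_big = average_access_time([(2, 0.50), (10, 0.30)], memory_latency=200)
no_cache = average_access_time([], memory_latency=200)

print(f"fits in cache: {fits_in_cache:.1f} cycles")
print(f"too big:       {too_big:.1f} cycles")
print(f"no cache:      {no_cache:.1f} cycles")
```

      With these assumed figures the cache-friendly case averages only a few cycles per access, while the oversized working set spends most of its time waiting on DRAM, which is the point made above.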

      Progress in improving memory response times is made in two ways. The first is to use a better cache indexing algorithm that is less susceptible to pathological situations. In the simpler indexing schemes, you can end up with situations where a short repeating access pattern can hammer on the same small set of cache blocks, causing cache misses even when there's plenty of space elsewhere. Higher associativity and tricks like victim caches reduce this problem. Techniques like a "preferred" block in a set reduce the time penalty for high associativity, and techniques like content-addressable memory reduce the power penalty. This is still a field of active research - build a better cache, and you get closer to a system that _acts_ as if it has all memory on-die.
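
      The pathological case is easy to reproduce in a toy model. The sketch below assumes a simple modulo-indexed cache with LRU replacement and 64-byte blocks (all illustrative choices, not any particular processor), and compares a direct-mapped layout against a 2-way set-associative one of the same total size.

```python
from collections import OrderedDict


def count_misses(addresses, num_sets, ways, block_size=64):
    """Count misses in a set-associative cache with LRU replacement."""
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for addr in addresses:
        block = addr // block_size
        index = block % num_sets      # which set the block maps to
        tag = block // num_sets       # identifies the block within that set
        lines = sets[index]
        if tag in lines:
            lines.move_to_end(tag)    # mark as most recently used
        else:
            misses += 1
            if len(lines) >= ways:
                lines.popitem(last=False)  # evict the least recently used line
            lines[tag] = True
    return misses


CACHE_BLOCKS = 256  # assumed total capacity: 256 blocks of 64 bytes

# Two addresses exactly one cache size apart, touched alternately.
pattern = [0, CACHE_BLOCKS * 64] * 1000

direct_mapped = count_misses(pattern, num_sets=CACHE_BLOCKS, ways=1)
two_way = count_misses(pattern, num_sets=CACHE_BLOCKS // 2, ways=2)

print(direct_mapped)  # 2000: every access evicts the other address
print(two_way)        # 2: only the two cold misses
```

      Both caches hold the same number of blocks; only the indexing differs, which is why the direct-mapped one thrashes while 254 of its 256 blocks sit unused.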

      The second way of improving memory subsystem performance is to use memory speculation. This involves either figuring out (or even guessing) what memory locations are going to be needed and preemptively fetching their contents, or taking a guess at the value that will be returned by a memory fetch before the real result comes in. In both cases, you're masking most of the latency of the memory access, while paying a price for failed speculations (either in higher memory _bandwidth_ required, or in power for speculated threads that have to be squashed). Build a better address and data speculation engine, and you'll again approach performance of an impossible all-on-die-and-fast system.
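
      As a rough illustration of the trade-off, here is a toy next-block prefetcher. It assumes an unbounded cache and block-granular addresses, which is a big simplification; the only point is to show demand misses going down on a sequential scan and wasted fetches (extra bandwidth) going up on a pattern the guess gets wrong.

```python
def run(block_addresses, prefetch_next):
    """Return (demand_misses, prefetches_issued) for a toy unbounded cache."""
    cached = set()
    misses = 0
    prefetches = 0
    for block in block_addresses:
        if block not in cached:
            misses += 1               # demand miss: the CPU has to wait
            cached.add(block)
        if prefetch_next and (block + 1) not in cached:
            cached.add(block + 1)     # speculative fetch of the next block
            prefetches += 1
    return misses, prefetches


sequential = list(range(100))
strided = list(range(0, 200, 2))      # every other block: the guess is always wrong

print(run(sequential, prefetch_next=False))  # (100, 0)
print(run(sequential, prefetch_next=True))   # (1, 100)
print(run(strided, prefetch_next=True))      # (100, 100)
```

      On the strided pattern the prefetcher doubles the memory traffic without removing a single miss, which is the "price for failed speculations" in bandwidth terms.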

      In summary, it turns out that putting all of the memory of a general-purpose system on-die isn't practical now and won't be as long as requirements for memory keep increasing. However, caches already give you performance approaching this for problems that are small enough to _fit_ in on-die memory, and cache technology is constantly being improved. This is where effort should be (and is) going.

    4. Re:ARM servers by drinkypoo · · Score: 3, Informative
      First try a google for Cobalt server ARM and then try another one for Cobalt server MIPS and see how you do. Cobalt Qube and Raq up to 2 were MIPS architecture machines, not ARM.

      ARM has been used in many PDAs as you say, and in Acorn/Archimedes computers. It's also in the Game Boy Advance (ARM7 I believe) and will likely be the foundation of the Dual Screen (ARM9 and ARM7 both will be in the box, if leaked specs can be believed.) ARM also begat StrongARM, and Intel purchased (some level of) rights to the StrongARM II architecture, which they call XScale.

      --
      "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
  2. Imagine a.... by System.out.println() · · Score: 3, Funny

    ..... .....

    What do you want, a cookie?

    Seriously though, this would be great to run Linux on... Like a new Zaurus perhaps :)

  3. Interesting by INeededALogin · · Score: 5, Interesting

    The MPCore multiprocessor enables system designers to view the core as a single "uniprocessor", simplifying development and reducing time-to-market, according to ARM.

    The opposite of HyperThreading? 4 CPU's to one instead of 1 CPU to 2?

    The only thing that I can guess they mean by simplifying is that a developer would not have to design a multi-threaded application to take advantage of the other threads.

    1. Re:Interesting by Tune · · Score: 4, Informative

      It appears to be similar to other dual core technologies except developers need to worry less about threads accessing the same data. This is accomplished by cache snooping, which is a dated, but very fast way to avoid (L0) cache inconsistencies. That should take care of a major hurdle wrt. keeping SMP threads busy, especially if the clock speeds are relatively low.
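
      For anyone who hasn't met cache snooping, here is a toy write-invalidate model. It is only a sketch under simplifying assumptions (write-through caches, one shared bus, made-up class names) and says nothing about how MPCore's hardware actually implements coherency.

```python
class SnoopingBus:
    """Shared bus: every write is visible to all attached caches."""

    def __init__(self):
        self.memory = {}
        self.caches = []

    def attach(self, cache):
        self.caches.append(cache)

    def broadcast_write(self, writer, addr):
        # Every other cache "snoops" the write and drops its stale copy.
        for cache in self.caches:
            if cache is not writer:
                cache.lines.pop(addr, None)


class Cache:
    """Per-CPU private cache (write-through, for simplicity)."""

    def __init__(self, bus):
        self.lines = {}
        self.bus = bus
        bus.attach(self)

    def read(self, addr):
        if addr not in self.lines:                 # miss: fetch from memory
            self.lines[addr] = self.bus.memory.get(addr, 0)
        return self.lines[addr]

    def write(self, addr, value):
        self.bus.broadcast_write(self, addr)       # invalidate other copies
        self.lines[addr] = value
        self.bus.memory[addr] = value              # write through to memory


bus = SnoopingBus()
cpu0, cpu1 = Cache(bus), Cache(bus)

cpu0.write(0x10, 1)
print(cpu1.read(0x10))  # 1, and cpu1 now holds its own cached copy
cpu0.write(0x10, 2)     # cpu1's copy is invalidated by the snoop
print(cpu1.read(0x10))  # 2, not the stale 1
```

      Without the invalidation step in `broadcast_write`, the second read on `cpu1` would return the stale value, which is exactly the inconsistency the programmer would otherwise have to worry about.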

      Notice that SMP has been a dream to the ARM team from its early Acorn/Archimedes days on. It seems they finally got it working...

  4. Synthesizable = can put it in an FPGA by Anonymous Coward · · Score: 5, Interesting

    In case you were wondering what that is all about...

    Synthesis of a core is analogous to compiling your software, except in an FPGA it is processing a hardware definition language like VHDL or Verilog to create the 'code' used to load the FPGA.

    This is a big plus for people wanting to put a wicked fast processing unit in the core along with whatever custom IO goodies they can come up with.

    Too bad it's not open source, as there are other wicked fast processor cores available. For example Xilinx can license you to put a PowerPC in its FPGA cores.

    1. Re:Synthesizable = can put it in an FPGA by NoMercy · · Score: 3, Informative

      I'm not sure how to tell you this, but you're virtually totally wrong on every point.

      Synthesizable to silicon, for ASICs mostly, though people like Philips turn them into microcontrollers and Intel makes a few microprocessors. The idea mostly is that you can put an LCD controller, SIM card reader, DSP, etc. all on one lump of silicon with an ARM processor and put it in your mobile phone.

      And you don't licence a PowerPC core to put in an FPGA; you get a PowerPC chip actually inside the FPGA (Virtex-II Pro). Any IP cores you see in the core-gen are simply the hooks into these devices that are already there, similar to the GCM's.

      And the big plus of this... well I don't really know but depending on how much number crunching it can do, and how much heat it generates when it does it, it could see all manner of applications.

    2. Re:Synthesizable = can put it in an FPGA by eclectro · · Score: 4, Informative

      Too bad it's not open source, as there are other wicked fast processor cores available. For example Xilinx can license you to put a PowerPC in its FPGA cores.

      There is this.

      You can find the code easily. There are a couple of other clones, but I have not heard much about them. Another one is BlackARM developed in Sweden a couple of years ago.

      I think these projects would be ok as long as they are instruction compatible, but not an internal clone. In which case ARM would pull out their lawyer dogs.

      But there are a couple of other open source cores available, which IMHO would be smarter to use because you could do more with them without the fear of legal reprisal from ARM.

      If you are designing an embedded system, you might could get by using such a core. The thing ARM has going for it is that commercial support and toolkits are available, which can be handy if you have a complex application that needs a lot of debugging. And there is a lot of third party support that you are not going to find with your homegrown core.

      That being said, you could save a fair amount of money using an open core. But if you need to get something important out the door quickly (like a toy for christmas) you go with the commercial solution. Unless you have the necessary in-house resources to troubleshoot problems.

      Just my .02

      --
      Take the cheese to sickbay, the doctor should see it as soon as possible - B'Elanna Torres, "Learning Curve"
  5. Re:Hype by ajutla · · Score: 3, Funny

    But...but multiple processors are so cool! Who cares about performance when you can tell people, yeah, I have a SMP PDA right here, isn't that sexy? Heck, I imagine that this new multiprocessor core will be an excellent way to pick up chicks. I'm looking forward to its release.

  6. Wave of the future. by Willeh · · Score: 5, Interesting

    IMO this new "multiple CPUs per chip" approach is the way forward, and the huge power savings are an added bonus. One question springs to mind though: how much performance can you gain by using this technique? I mean, sooner or later you will hit the limits of, say, the memory bus or the graphics bus or whatever (speaking in layman's terms, obviously), especially in environments where power consumption is an issue, and huge memory banks take a lot of power to keep refreshed. Still, I welcome the development; SMP-type deals can make a computing experience easier to cope with during intensive use like compiling and other CPU-intensive tasks.

    --
    Will wank off Linus Torvalds for fame.
  7. Re:Hype by cnf · · Score: 4, Insightful

    Have you never heard of Multi threading?
    On a WorkStation, I would agree with you, but on any server with thread optimised applications, more threads = more power...

    Once again, People think WorkStation, for things not designed for the WorkStation market

  8. Re:Hype by Anonymous Coward · · Score: 3, Interesting

    Unless you are talking about power consumption. Then the speed of the core increases it a lot so it makes sense to have slower processors (unless you wanna carry a huge battery pack on your back).

  9. Re:Hype by pe1rxq · · Score: 4, Insightful

    A lower core clock can save you a lot... both financially and in energy. Raising the clock rate on a chip will increase its energy usage far faster than linearly, since the supply voltage usually has to go up with it.
    If the problems you want to solve are parallel enough why not?
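
    How steeply power rises with clock depends on how much the supply voltage has to rise along with it. A back-of-envelope sketch, assuming the textbook dynamic-power relation P ≈ C·V²·f and (a big simplification) voltage scaling in proportion to frequency; the units are normalised and the function names are only for illustration:

```python
def dynamic_power(capacitance, voltage, frequency):
    """Textbook dynamic switching power: P = C * V^2 * f."""
    return capacitance * voltage ** 2 * frequency


def power_at(freq_scale, cores=1):
    """Relative power of `cores` identical cores clocked at `freq_scale`.

    Assumption: supply voltage scales in proportion to clock frequency,
    and capacitance is normalised to 1. Leakage power is ignored.
    """
    return cores * dynamic_power(1.0, freq_scale, freq_scale)


print(power_at(2.0))           # 8.0: one core at double the clock
print(power_at(1.0, cores=2))  # 2.0: two cores at the base clock
```

    Under these assumptions, doubling the clock costs 8x the power while doubling the core count costs 2x, so for a workload that parallelises well the second option gets the same nominal throughput for a quarter of the power.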

    Jeroen

    --
    Secure messaging: http://quickmsg.vreeken.net/
  10. Mod him +5 insightful by AtariAmarok · · Score: 3, Funny

    When was the last time you saw one of us admit that they had no idea what they were saying?

    --
    Don't blame Durga. I voted for Centauri.
  11. Nice to have a 4 core CPU by MrRuslan · · Score: 3, Interesting

    But what are some uses for this? If I'm not mistaken this is a 32-bit architecture, so it has its limits when it comes to scaling, and it's not powerful enough for one of those supercomps, so what is the target market?

  12. ARM servers by simpl3x · · Score: 5, Interesting

    Cobalt servers were originally based on ARM processors, and were for the most part really nifty. Most palmtop and cell devices also use the processors, so my question is, why don't we see more reasonable personal computers (or blade servers) based upon this architecture? People don't use the processing capacity available to them, and tuning of storage and networking often gives a better return per dollar. Something along the profile of the Psion Netbook or old (or new depending upon your perspective) Apple Newton (also ARM) would be very cool and useful. Give it some cellular/WiFi tech...

  13. Exactly what I was looking for! by TheLoneCabbage · · Score: 4, Funny

    Exactly what I was looking for! Finally a computer capable of letting me balance my checkbook, use a word processor, watch a video, and browse the web!

    Is anyone else getting the impression that our entire industry is driven by penis envy?

    "It's bigger, it's faster, stronger! More Power!" About the only flaw in my theory is the continuing trend of decreasing computer sizes. But I can attribute that to the fact that it lets people put them in their pockets.

    BTW: If you actually use your CPU(s), this doesn't apply to you. Your penis is bigger.

  14. I've been running SMP desktops for years... by pointbeing · · Score: 5, Informative
    The _ONLY_ reason to do this is as a last resort when you can no longer clock your existing core any higher.

    Incorrect.

    As the subject line says, I've been running SMP desktop PCs for years. My current home PC is a dual 1GHz P-III, my wife's is a dual 850 and my Linux web/file/mail/whatever server is a dual 700 with a 12% overclock.

    You can only figure on about a 40% performance increase with a dual processor desktop PC, but being able to play Quake and burn a DVD at the same time has its advantages ;-)
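
    That 40% figure can be read through Amdahl's law. The sketch below takes the 1.4x speedup on two processors as given and works backwards to the parallel fraction it implies; applying the same fraction to four processors is purely an extrapolation under that model, not a measurement.

```python
def amdahl_speedup(parallel_fraction, processors):
    """Amdahl's law: speedup when only part of the work runs in parallel."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / processors)


def parallel_fraction_for(speedup, processors):
    """Invert Amdahl's law: parallel fraction implied by an observed speedup."""
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / processors)


p = parallel_fraction_for(1.4, processors=2)
print(f"implied parallel fraction: {p:.2f}")
print(f"same workload on 4 CPUs:   {amdahl_speedup(p, 4):.2f}x")
```

    A 1.4x gain on two CPUs corresponds to a bit over half the work running in parallel, and under the same assumption four CPUs would give well under 2x.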

    As others have mentioned, multitasking is greatly enhanced - and two midrange processors are generally cheaper than one high-end processor.

    Also, even though some applications aren't multithreaded, all modern desktop OS are - so you get a performance boost even running single-task applications. If you're into running Windows, Internet Explorer is multithreaded, as are all Microsoft Office applications. There's a real-world productivity boost using SMP machines.

    --
    we see things not as they are, but as we are.
    -- anais nin
    1. Re:I've been running SMP desktops for years... by RevAaron · · Score: 3, Interesting

      You say "Incorrect," but the examples you provide more or less support his claim. Yes, oftentimes two lower speed CPUs are cheaper than one CPU that is twice as fast, but there isn't much of a reason to go SMP unless you cannot just get a higher clocked CPU.

      Mind you, the guy isn't saying SMP is stupid- it makes sense in a lot of situations. But, it is something you pull out when a single, higher-speed CPU is not a possibility, whether that is the case due to lack of funds or whether a faster CPU just does not exist.

      Here at work, I have a dual 500 MHz G4 which still holds its own, even with a relatively small amount of RAM, 256 MB. When this box was purchased, there was no option for a single-CPU 1 GHz box, and this is certainly the next best thing...

      --

      Working toward a usable PDA environment in the spirit of Newton OS: Dynapad
  15. Re:Hype by eclectro · · Score: 4, Interesting


    You bring up an interesting point. The reason this might be valuable is because ARM processors are known for their low current and energy saving features.

    Almost always when you max out the clock speed on a chip the current drain rises quickly.

    From the article it can be surmised that this chip runs at a cool 2 watts running full out, and .31 Watt standby (somebody clarify this). If this holds true, it probably beats anything else at the same clock speed.

    As an aside, there are cell phones that use a dual ARM core, one doing control duty and another doing DSP work.

    --
    Take the cheese to sickbay, the doctor should see it as soon as possible - B'Elanna Torres, "Learning Curve"
  16. MMP ARM server by Gadzinka · · Score: 3, Insightful

    Just the other day I was thinking about "Massively Multiprocessor" ARM computer. It came to me after reading about cluster of VIA low-power computers.

    So, ARM CPUs are even lower power, they are designed quite correctly from the ground up[1] and the only thing that's missing is an FPU. But a computer with 100 ARM CPUs would run faster than any ix86 today and probably would consume less power than the latest P4/K7/K8.

    Give me a 64-proc (*4 cores per proc, so 256 proc) Linux machine anytime ;)

    Robert

    [1] Anyone who knows the internals of today's ix86 processors from any vendor knows what a mess it is to apply today's technology to an ancient ISA like ix86.

    --
    Bastard Operator From 193.219.28.162
    1. Re:MMP ARM server by mbge7psh · · Score: 4, Informative
      Your dreams are answered - it does have floating point.

      It also features configurable level 1 caches, 64-bit AMBA AXI interfaces, vector floating-point coprocessors and programmable interrupt distribution.
  17. That's nice but, by dbretton · · Score: 4, Interesting

    Let's talk some real numbers.

    How will it fare against, say a Xeon with HT or 2 Opterons?
    How will it stack up in price?

  18. Re:Hype by TonyJohn · · Score: 5, Informative

    As Intel is now discovering (and promoting) it has long been known that clock frequency is not a sufficient measure of performance. It matters how much processing you can do in each clock tick as well as how often your clock ticks. Naturally, the faster the clock ticks, the less processing you can do per clock tick.

    1/2 GHz quoted for this core may not sound a lot, but there are some good reasons for it:

    - ARM cores use a shorter pipeline than Intel cores (in general). This requires less logic to get a good throughput of operations. Less logic means less area (less cost) and less power consumption. These are important in embedded applications (you don't want your phone to be putting out 50W and costing $200).

    - These cores are synthesisable. This means that ARM will deliver a "model" of the device, and customers can translate this to a silicon layout on their own process, and they can integrate peripherals, memory etc. on the same silicon. Getting a higher clock speed requires custom logic, which is hard to translate between processes. Essentially the processor then has to be sold separately as a piece of silicon, and this means a slow off-chip interface to the rest of the system.

    For a multi-threaded or multi-process application such as this core is targeted at, using MP cores makes more sense than using a single high-speed core and switching between processes all the time. For one thing you save all the context switching overhead.

    --
    Owl tried to think of something wise to say, but couldn't.
  19. Re:Hype by BigBadBri · · Score: 4, Interesting
    No - you've missed the point of this exercise entirely.

    The purpose of having a multiprocessor on a single core is to make consumer devices (read: audiovisual stuff) more versatile, by allowing them to dedicate, say, one core to processing the signal you're watching, one to processing the signal you wish to record, one to handle the disk I/O, and one to watch over everything and make sure your favourite show is recorded without glitches.

    This isn't aimed at the desktop, or at shrinking supercomputers to the size of your thumb, or any other fantasies you may while away your idle cycles with.

    It's aimed fairly and squarely at the embedded and consumer device markets, where it will produce benefits, and will likely make ARM a tidy sum in license fees.

    --
    oh brave new world, that has such people in it!
  20. Check out PMC-Sierra's dual-core RM9000x2 by ebunga · · Score: 3, Interesting

    PMC-Sierra's MIPS-based RM9000x2GL's are really neat. It's been out for some months now. I'd love to see a machine with several dozen of these.

  21. ARM6 != ARMv6 by hattig · · Score: 3, Informative

    One is a ~1990 era version of the ARMv3 architecture (IIRC).
    The other is ARM's latest version of the ARM architecture.

    26-bit addressing limitations were removed ~14 years ago. I don't even think any of the more recent versions of the ARM architecture support it.

  22. WinCE, Symbian, PalmOS and Linux by Anonymous Coward · · Score: 4, Interesting

    This is one of the reasons why Linux will eventually win in the handheld/cell phone space. Unlike WinCE, Symbian and PalmOS, Linux already supports SMP. Linux is light years ahead of WinCE, Symbian and PalmOS on all key core technology features such as SMP. I know for a fact that Linux is being used to validate these features on future ARM processors. So, companies that based their products on Linux won't have to worry about the OS running on the new processors. The proprietary OSes will be playing catchup forever. I will not be surprised if Microsoft has to redesign WinCE from scratch yet again to accommodate SMP.

  23. Why? by Anonymous Coward · · Score: 4, Informative
    Low power. Die size. Cost.

    You don't use an Opteron in the same situation as an ARM core. It's a synthesisable mini processor used for controlling real-time systems. It can be embedded in chips with custom VLSI logic to provide a platform for an operating system. It's not meant for competing with Opterons or any other such stupid ideas.


    Why 4 cores?


    Not all customers need 4 cores; some only need 1 (washing machines) or maybe 2. The system is therefore scalable to die size/power/cost requirements. Note it's configurable; it does not have to have 4 cores. If I were a customer of ARM I could choose how much die space to devote to the core and how much power I really needed.

    4 cores, instead of one bigger, more complex one, are easier to engineer and get right. Look at modern graphics architectures; it's the same principle (though one can argue about cache coherency).

    Multiple cores would make dynamic power management much easier to handle, I imagine. An entire core could shut down when its process(es) are not busy. A properly designed embedded system could benefit enormously from this power saving, and the hardware design is made relatively easy compared with trying to cut voltage on one large core.

    Embedded systems using ARM cores often need to meet real-time needs. One advantage of a multicore system would be to place a critical software component on a single core and, with correct use of memory, guarantee a fixed throughput rate of data. Of course I can use thread priorities, but this makes things harder IMO. Maybe that's what they refer to by easier programming.


    To me, this looks like a clean idea, which although not revolutionary in terms of an idea, does provide significant advantages for embedded device designers by being synthesisable.


    Wroceng
    (no association with ARM at all but I forgot my password temporarily)

  24. Clock for clock, how is it? by Anonymous Coward · · Score: 3, Interesting

    One thing I've always wanted is a comparison of the general efficiencies of different processors. That is, if you made different types of processors the same clock speed, gave them equivalent caches, and ran a benchmark entirely out of cache, how would they all compare?

    X86s are supposedly awfully inefficient architectures, so would they come out on bottom? Where would various ARM, xScale, 68k, and PPC processors end up?

    Although x86 CPUs have scaled up to some amazing clock frequencies, it seems like their growth has slowed. Intel seems to have implicitly acknowledged this since they're dropping the P4 line for an updated P3 architecture. AMD did the same thing with the Athlon64s, which have slower clock speeds but are faster in the end.

    If it turned out that an ARM at, say, 600 MHz turned out to be as fast as a P3 at 1 GHz, then I would say the ARM could leave the embedded market and could become competition in the desktop market. If such systems were significantly cheaper, cooler, smaller, and less power hungry than similar x86 systems, I think they could seriously compete.
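
    For what it's worth, the story summary gives enough to estimate one per-clock figure. The sketch below assumes the quoted 2600 Dhrystone MIPS is for all four cores at the top 550 MHz clock (the summary doesn't spell that out), and also works out how much more work per clock a 600 MHz part would need to match a 1 GHz one. Dhrystone is a narrow benchmark, so treat this as arithmetic rather than a real comparison.

```python
def dmips_per_mhz(aggregate_dmips, cores, clock_mhz):
    """Per-core Dhrystone MIPS per MHz from an aggregate figure."""
    return aggregate_dmips / (cores * clock_mhz)


def required_per_clock_ratio(slow_clock_mhz, fast_clock_mhz):
    """How much more work per clock the slower chip needs to break even."""
    return fast_clock_mhz / slow_clock_mhz


# Assumption: 2600 DMIPS corresponds to 4 cores at 550 MHz.
print(f"MPCore: {dmips_per_mhz(2600, 4, 550):.2f} DMIPS/MHz per core")
print(f"600 MHz vs 1 GHz break-even: {required_per_clock_ratio(600, 1000):.2f}x per clock")
```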

  25. Re:Arm != Intel ? by TonyJohn · · Score: 3, Informative

    Eeek. No.

    Intel bought part of DEC (Digital), which had, in its product portfolio, the StrongARM processor. StrongARM is a DEC implementation of the ARM Instruction Set Architecture (version 4 if you care).

    ARM is still a separate, publicly listed company. XScale is an Intel implementation of the ARM ISA (version 5TE I think). Intel pays ARM to use their architecture.

    ARM also designs implementations of the ARM ISA and licences these designs to chip designers to include in System-on-Chip designs.

    --
    Owl tried to think of something wise to say, but couldn't.