ARM Unveils One-chip SMP Multiprocessor Core
An anonymous reader writes "ARM Ltd. will unveil a unique multi-processor core technology, capable of up to 4-way cache coherent symmetric multi-processing (SMP) running Linux, this week at the Embedded Processor Forum in San Jose, Calif. The "synthesizable multiprocessor" core -- a first for ARM -- is the result of a partnership with NEC Electronics announced last October, and is based on ARM's ARMv6 architecture.
ARM says its new "MPCore" multiprocessor core can be configured to contain between one and four processors delivering up to 2600 Dhrystone MIPS of aggregate performance, based on clock rates between 335 and 550 MHz."
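A back-of-envelope check of the quoted figures. It assumes, which the announcement does not state, that the 2600 DMIPS aggregate is reached with all four cores at the 550 MHz top clock and that throughput adds up linearly across cores:

```python
# Assumption: the quoted aggregate comes from 4 cores at the 550 MHz top clock,
# with performance adding up linearly across cores.
AGGREGATE_DMIPS = 2600
CORES = 4
MAX_CLOCK_MHZ = 550


def dmips_per_mhz(aggregate_dmips: float, cores: int, clock_mhz: float) -> float:
    """Dhrystone MIPS delivered per core per MHz of clock."""
    return aggregate_dmips / (cores * clock_mhz)


per_core = AGGREGATE_DMIPS / CORES
print(f"per core:   {per_core:.0f} DMIPS")
print(f"efficiency: {dmips_per_mhz(AGGREGATE_DMIPS, CORES, MAX_CLOCK_MHZ):.2f} DMIPS/MHz")
```

Under those assumptions each core contributes about 650 DMIPS, or roughly 1.18 DMIPS per MHz.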
Looks like this is heading toward server technology.
How long before we have a 64/32/16-bit variable word size, Thumb-like architecture?
And if you thought that was boring, you obviously haven't read my Journal ;-)
..... .....
:)
What do you want, a cookie?
Seriously though, this would be great to run Linux on... Like a new Zaurus perhaps
I've got more mod points and GMail invi
The MPCore multiprocessor enables system designers to view the core as a single "uniprocessor", simplifying development and reducing time-to-market, according to ARM.
The opposite of HyperThreading? 4 CPUs appearing as one instead of 1 CPU appearing as 2?
The only thing that I can guess they mean by simplifying is that a developer would not have to design a multi-threaded application to take advantage of the other threads.
(and some say I don't), but this article looks like alphabet soup with all the acronyms and all. Very interesting topic, but not for the noob.
- Your stupidity got you into this mess, why can't it get you out? -Will Rogers
In case you were wondering what that is all about...
Synthesis of a core is analogous to compiling your software, except that the input is a hardware description language like VHDL or Verilog; for an FPGA, the output is the 'code' used to load the FPGA.
This is a big plus for people wanting to put a wicked fast processing unit in the core along with whatever custom IO goodies they can come up with.
Too bad it's not open source, as there are other wicked fast processor cores available. For example, Xilinx can license you to put a PowerPC in its FPGA cores.
But...but multiple processors are so cool! Who cares about performance when you can tell people, yeah, I have a SMP PDA right here, isn't that sexy? Heck, I imagine that this new multiprocessor core will be an excellent way to pick up chicks. I'm looking forward to its release.
IMO this new "multiple CPUs per chip" approach is the way forward, and the huge power savings are an added bonus. One question springs to mind, though: how much performance can you gain with this technique? Sooner or later you will hit the limits of, say, the memory bus or the graphics bus (speaking in layman's terms, obviously), especially where power consumption is an issue, since large memory banks take a lot of power to keep refreshed. Still, I welcome the development; SMP-type setups make a computer easier to live with during intensive work like compiling and other CPU-intensive tasks.
Will wank off Linus Torvalds for fame.
...how to make my Grendel Cluster!
I have been a user for about 10 years. This ends Feb 2014. The site's been ruined. I'm off. Dice, FU
Have you never heard of Multi threading?
On a workstation I would agree with you, but on any server with thread-optimised applications, more threads = more power...
Once again, people are thinking workstation about things not designed for the workstation market.
Go buy an Intel Celeron CPU, then, if MHz is the only thing that matters...
ARM CPUs are used mainly in devices with limited electrical power available anyway. If this gets them more processing power per watt, then all the better.
world was created 5 seconds before this post as it is.
Unless you are talking about power consumption. Raising the core's clock speed increases that a lot, so it makes sense to have slower processors (unless you want to carry a huge battery pack on your back).
A lower core clock can save you a lot... both financially and in energy. Raising the clock rate on a chip will increase its energy usage exponentially.
If the problems you want to solve are parallel enough why not?
Jeroen
Secure messaging: http://quickmsg.vreeken.net/
When was the last time you saw one of us admit that they had no idea what they were saying?
Don't blame Durga. I voted for Centauri.
But what are some uses for this? If I'm not mistaken this is a 32-bit architecture, so it has its limits when it comes to scaling, and it's not powerful enough for one of those supercomputers, so what is the target market?
Cobalt servers were originally based on ARM processors, and were for the most part really nifty. Most palmtop and cell devices also use these processors, so my question is: why don't we see more reasonable personal computers (or blade servers) based on this architecture? People don't use the processing capacity available to them, and tuning storage and networking often gives a better return per dollar. Something along the lines of the Psion Netbook or the old (or new, depending on your perspective) Apple Newton (also ARM) would be very cool and useful. Give it some cellular/WiFi tech...
Exactly what I was looking for! Finally a computer capable of letting me balance my checkbook, use a word processor, watch a video, and browse the web!
Is anyone else getting the impression that our entire industry is driven by penis envy?
"It's bigger, it's faster, stronger! More power!" About the only flaw in my theory is the continuing trend of decreasing computer sizes. But I can attribute that to the fact that it lets people put them in their pockets.
BTW: If you actually use your CPU(s), this doesn't apply to you. Your penis is bigger.
I would rather be ashes than dust!
Incorrect.
As the subject line says, I've been running SMP desktop PCs for years. My current home PC is a dual 1GHz P-III, my wife's is a dual 850 and my Linux web/file/mail/whatever server is a dual 700 with a 12% overclock.
You can only figure on about a 40% performance increase with a dual-processor desktop PC, but being able to play Quake and burn a DVD at the same time has its advantages ;-)
As others have mentioned, multitasking is greatly enhanced - and two midrange processors are generally cheaper than one high-end processor.
Also, even though some applications aren't multithreaded, all modern desktop OSes are, so you get a performance boost even running single-task applications. If you're into running Windows, Internet Explorer is multithreaded, as are all Microsoft Office applications. There's a real-world productivity boost using SMP machines.
We see things not as they are, but as we are.
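The 40% figure above is the poster's rule of thumb. One standard way to reason about it is Amdahl's law (a model swapped in here, not something the poster cites): if a fraction p of the work can run in parallel on n processors, the speedup is 1 / ((1 - p) + p / n). A minimal sketch that inverts the law to see what p a 40% gain on two CPUs implies:

```python
def amdahl_speedup(parallel_fraction: float, processors: int) -> float:
    """Amdahl's law: overall speedup when only part of the work parallelises."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / processors)


def parallel_fraction_for(speedup: float, processors: int) -> float:
    """Invert Amdahl's law: which parallel fraction yields a given speedup?"""
    # 1/S = (1 - p) + p/n  =>  p = (1 - 1/S) / (1 - 1/n)
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / processors)


p = parallel_fraction_for(1.40, 2)
print(f"a 40% gain on 2 CPUs implies ~{p:.0%} of the work runs in parallel")
for n in (2, 4, 8):
    print(f"{n} CPUs -> {amdahl_speedup(p, n):.2f}x")
```

With that fraction (about 0.57), four CPUs give about 1.75x and eight give 2x, which is why returns flatten out as processors are added.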
-- anais nin
Only fools that don't really do ANYTHING with their computer say this...
I use SMP every day. I NEED to, as I use a computer for real uses, not just screwing around like you...
Video editing, CG rendering, circuit simulation, autorouting, the list goes on...
Some of us have a REAL use for computers; many are not simply appliance operators like yourself.
You bring up an interesting point. The reason this might be valuable is because ARM processors are known for their low current and energy saving features.
Almost always when you max out the clock speed on a chip the current drain rises quickly.
From the article it can be surmised that this chip runs at a cool 2 watts running full out, and
As an aside, there are cell phones that use a dual ARM core, one doing control duty and another doing DSP work.
Take the cheese to sickbay, the doctor should see it as soon as possible - B'Elanna Torres, "Learning Curve"
Just the other day I was thinking about a "massively multiprocessor" ARM computer. It came to me after reading about a cluster of VIA low-power computers.
;)
So, ARM chips are even lower power, they are designed quite correctly from the ground up[1], and the only thing missing is an FPU. But a computer with 100 ARM CPUs would run faster than any ix86 today and would probably consume less power than the latest P4/K7/K8.
Give me a 64-processor (x4 cores per processor, so 256 cores) Linux machine anytime.
Robert
[1] Anyone who knows the internals of today's ix86 processors from any vendor knows what a mess it takes to use today's technology with an ancient ISA like ix86.
Bastard Operator From 193.219.28.162
> Heck, I imagine that this new multiprocessor core will be an excellent way to pick up chicks
So, that's why the iPod is so successful! I thought it was the looks, but it's what's inside that counts
Let's talk some real numbers.
How will it fare against, say a Xeon with HT or 2 Opterons?
How will it stack up in price?
As Intel is now discovering (and promoting) it has long been known that clock frequency is not a sufficient measure of performance. It matters how much processing you can do in each clock tick as well as how often your clock ticks. Naturally, the faster the clock ticks, the less processing you can do per clock tick.
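Put as arithmetic: throughput is the average work per clock tick (instructions per clock, IPC) times the clock rate. The IPC values below are made-up illustrative numbers, not measurements of any real Intel or ARM core:

```python
import math


def instructions_per_second(ipc: float, clock_hz: float) -> float:
    """Throughput = instructions completed per tick x ticks per second."""
    return ipc * clock_hz


# Hypothetical designs (illustrative numbers only):
deep_pipeline = instructions_per_second(ipc=0.6, clock_hz=2.0e9)   # fast clock, less per tick
short_pipeline = instructions_per_second(ipc=1.2, clock_hz=1.0e9)  # half the clock, twice the IPC

print(f"deep pipeline:  {deep_pipeline:.2e} instructions/s")
print(f"short pipeline: {short_pipeline:.2e} instructions/s")
print("same throughput:", math.isclose(deep_pipeline, short_pipeline))
```

Halving the clock costs nothing in throughput if the work done per tick doubles.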
The 1/2 GHz quoted for this core may not sound like a lot, but there are some good reasons for it:
- ARM cores use a shorter pipeline than Intel cores (in general). This requires less logic to get a good throughput of operations. Less logic means less area (less cost) and less power consumption. These are important in embedded applications (you don't want your phone to be putting out 50W and costing $200).
- These cores are synthesisable. This means that ARM will deliver a "model" of the device, and customers can translate it to a silicon layout on their own process and integrate peripherals, memory, etc. on the same silicon. Getting a higher clock speed requires custom logic, which is hard to translate between processes. Essentially, such a processor has to be sold separately as a piece of silicon, and this means a slow off-chip interface to the rest of the system.
For a multi-threaded or multi-process application, which is what this core is targeted at, using MP cores makes more sense than using a single high-speed core and switching between processes all the time. For one thing, you save all the context-switching overhead.
Owl tried to think of something wise to say, but couldn't.
The purpose of having a multiprocessor on a single core is to make consumer devices (read: audiovisual stuff) more versatile, by allowing them to dedicate, say, one core to processing the signal you're watching, one to processing the signal you wish to record, one to handle the disk I/O, and one to watch over everything and make sure your favourite show is recorded without glitches.
This isn't aimed at the desktop, or at shrinking supercomputers to the size of your thumb, or any other fantasies you may while away your idle cycles with.
It's aimed fairly and squarely at the embedded and consumer device markets, where it will produce benefits, and will likely make ARM a tidy sum in license fees.
oh brave new world, that has such people in it!
Chip multiprocessors!! Another headache for programmers. Check out www.cradle.com
If I recall correctly, chips prior to ARM6 had register 15 (ARM's PC) designed with the upper six bits reserved for status. Having a program address space of only 2^26 = 64 MB was a major obstacle, even for (successors of) Acorn's RiscPC, a desktop model. Even with that resolved in the ARM6 series, it still can't address beyond the 4 GB boundary. In the 4-way SMP server market this is likely to become a major pain.
So either they found a nice way to add yet more MIPS per megahertz (or per watt) to serve higher-end embedded systems, or they're targeting (very) low-end servers.
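For reference, the 26-bit layout being described packs everything into R15: the N, Z, C, V flags and the I and F interrupt-disable bits sit in bits 31 to 26, the word-aligned program counter in bits 25 to 2, and the processor mode in bits 1 and 0. A small decoder (the function name is just for illustration) shows why only 2^26 bytes are reachable:

```python
# 26-bit ARM (pre-ARM6 style) R15 layout: status flags and PC share one register.
PC_MASK = 0x03FF_FFFC    # bits 25..2: word-aligned program counter
MODE_MASK = 0x0000_0003  # bits 1..0: processor mode
FLAG_BITS = {"N": 31, "Z": 30, "C": 29, "V": 28, "I": 27, "F": 26}


def decode_r15_26bit(r15: int) -> dict:
    """Split a 26-bit-mode R15 value into PC, status flags and mode."""
    return {
        "pc": r15 & PC_MASK,
        "flags": {name: (r15 >> bit) & 1 for name, bit in FLAG_BITS.items()},
        "mode": r15 & MODE_MASK,
    }


ADDRESS_SPACE = 1 << 26  # the PC field can only address 2^26 bytes
print(ADDRESS_SPACE // (1024 * 1024), "MiB addressable")
print(decode_r15_26bit(0x8800_8003))  # N and I set, PC = 0x8000, mode 3
```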
PMC-Sierra's MIPS-based RM9000x2GL's are really neat. It's been out for some months now. I'd love to see a machine with several dozen of these.
ahhh... finally someone gets it. This isn't aimed at the desktop market; this is aimed at where 98% of processors go, the embedded market. It's not uncommon now for many SoCs (System on a Chip) to use 2 ARM cores (ARM 7, 9, 11, etc.) and a DSP. This is ARM's aim at future embedded markets where they can see a need for up to 4 cores (switches, routers, consumer goods, and as many have said PDAs and cell phones).
One is a ~1990 era version of the ARMv3 architecture (IIRC).
The other is ARM's latest version of the ARM architecture.
26-bit addressing limitations were removed ~14 years ago. I don't even think any of the more recent versions of the ARM architecture support it.
This is one of the reasons why Linux will eventually win in the handheld/cell phone space. Unlike WinCE, Symbian and PalmOS, Linux already supports SMP. Linux is light years ahead of WinCE, Symbian and PalmOS on all key core technology features such as SMP. I know for a fact that Linux is being used to validate these features on future ARM processors. So, companies that based their products on Linux won't have to worry about the OS running on the new processors. The proprietary OSes will be playing catch-up forever. I will not be surprised if Microsoft has to redesign WinCE from scratch yet again to accommodate SMP.
You don't use an Opteron in the same situation as an ARM core. It's a synthesisable mini processor used for controlling real-time systems. It can be embedded in chips with custom VLSI logic to provide a platform for an operating system. It's not meant for competing with Opterons or any other such stupid ideas.
Why 4 cores?
Not all customers need 4 cores; some only need 1 (washing machines) or maybe 2. The system therefore scales to die size/power/cost requirements. Note that it's configurable; it does not have to have 4 cores. If I were an ARM customer I could choose how much die space to devote to the core and how much power I really needed.
Four cores, instead of one bigger, more complex one, are easier to engineer and get right. Look at modern graphics architectures: it's the same principle (though one can argue about cache coherency).
Multiple cores would make dynamic power management much easier to handle, I imagine. An entire core could shut down when its process(es) are not busy. A properly designed embedded system could benefit enormously from this power saving, and the hardware design is relatively easy compared with trying to cut voltage on one large core.
Embedded systems using ARM cores often need to meet real-time requirements. One advantage of a multicore system would be to place a critical software component on a single core and, with correct use of memory, guarantee a fixed data throughput rate. Of course I can use thread priorities, but that makes things harder IMO. Maybe that's what they mean by easier programming.
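On a Linux host, the simplest way to keep a critical process on one core is CPU affinity. A minimal sketch using Python's `os.sched_getaffinity`/`os.sched_setaffinity` (available only on some platforms, notably Linux, so the function returns None elsewhere); the function name is hypothetical, and affinity alone does not give real-time guarantees, it only stops the scheduler from migrating the process:

```python
import os


def pin_to_one_core():
    """Restrict the calling process to a single CPU it is already allowed to use.

    Hypothetical helper. Returns the chosen CPU number, or None where the
    affinity calls are not available on this platform.
    """
    if not (hasattr(os, "sched_getaffinity") and hasattr(os, "sched_setaffinity")):
        return None
    allowed = os.sched_getaffinity(0)  # pid 0 means "the calling process"
    core = min(allowed)                # lowest-numbered CPU we may run on
    os.sched_setaffinity(0, {core})
    return core


if __name__ == "__main__":
    saved = os.sched_getaffinity(0) if hasattr(os, "sched_getaffinity") else None
    print("pinned to CPU:", pin_to_one_core())
    if saved is not None:
        os.sched_setaffinity(0, saved)  # restore the original CPU mask
```

An embedded RTOS on a multicore chip would use its own mechanism for this; the sketch only illustrates the idea of dedicating a core to one task.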
To me, this looks like a clean idea, which although not revolutionary in terms of an idea, does provide significant advantages for embedded device designers by being synthesisable.
Wroceng
(no association with ARM at all but I forgot my password temporarily)
A lower core clock can save you a lot... both financially and in energy. Raising the clock rate on a chip will increase its energy usage exponentially.
[Rant]Why, oh _why_, do people keep horribly abusing the word "exponentially"?[/Rant]
Power goes up in direct proportion to the clock rate. This is a "linear" relation. If it was really "exponential", we'd be stuck running 10 MHz processors because anything else would melt.
For the really pedantic, the way to compute dynamic power dissipation is to figure out how much capacitance you have on nodes that are being switched, what fraction of the time they're being switched, the amount of energy required to switch a given amount of capacitance (depends on signal voltage), and the frequency, and multiply these together:
P = (1/2) * Vdd^2 * C_node * N_nodes * transitions_per_clock * f
The only thing that's _not_ linear is the power-vs-voltage relation, and that's _quadratic_. Anything "exponential" sucks a whole lot worse.
[ObDisclaimer about clock feedthrough, but that's linear with frequency and capacitance too.]
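Plugging numbers into that formula shows the linear-in-f and quadratic-in-Vdd behaviour directly. The supply voltage, node capacitance, node count and activity factor below are illustrative assumptions, not figures for any real chip:

```python
def dynamic_power(vdd: float, c_node: float, n_nodes: int,
                  transitions_per_clock: float, freq_hz: float) -> float:
    """P = (1/2) * Vdd^2 * C_node * N_nodes * transitions_per_clock * f, in watts."""
    return 0.5 * vdd ** 2 * c_node * n_nodes * transitions_per_clock * freq_hz


# Illustrative chip: 1.2 V, 10 fF per node, 1 million nodes,
# 10% of nodes switching per clock, 500 MHz.
base = dynamic_power(1.2, 10e-15, 1_000_000, 0.1, 500e6)
double_f = dynamic_power(1.2, 10e-15, 1_000_000, 0.1, 1000e6)
double_v = dynamic_power(2.4, 10e-15, 1_000_000, 0.1, 500e6)

print(f"base:       {base:.2f} W")
print(f"2x clock:   {double_f / base:.1f}x power (linear)")
print(f"2x voltage: {double_v / base:.1f}x power (quadratic)")
```

In practice, reaching a higher clock often also requires a higher supply voltage, so the two effects compound.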
Parent may or may not be insightful, but is, sadly, not true. Power usage is linear with clock speed, assuming voltage is constant.
I've had this sig for three days.
Reading the article is not required; just skimming it reveals a diagram with 4 CPUs, each with its own cache, connected by arrows to a large blob called "Snoop Control Unit".
I would imagine a wristwatch that can do voice processing and movie rendering.
This would seem to go hand in hand with the current thinking on on-the-fly OCR/language translation. I watched a show last night about a camera and PDA gizmo that could translate a road sign for you. I think that one did it via a server-based imaging system. But if you do all that internally, the possibilities are endless, and hopefully not trivial, like SMP Pong or really fancy ringtones.
Low electrical power + high CPU power == quick results and small size that does not require a Radio Flyer full of batteries.
ARM SUCKS!
I already did some stuff using ARM7TDMI and I can say that it SUCKS BIG TIME.
Why? NO INTEGER DIVISION. You have blazing fast code 90% of the time, and the other 10% it's crunching the single division in your program.
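ARMv4T cores such as the ARM7TDMI have no divide instruction, so compilers fall back to a software routine. A minimal sketch of the classic restoring shift-and-subtract algorithm, written in Python for readability; real runtime libraries use hand-tuned assembly, and this is not their code:

```python
def udiv32(dividend: int, divisor: int) -> tuple[int, int]:
    """Restoring shift-and-subtract division of 32-bit unsigned integers.

    Returns (quotient, remainder). Produces one quotient bit per iteration,
    so up to 32 shift/compare/subtract steps replace a single hardware DIV.
    """
    if divisor == 0:
        raise ZeroDivisionError("division by zero")
    if not (0 <= dividend < 2**32 and 0 < divisor < 2**32):
        raise ValueError("operands must be 32-bit unsigned integers")
    quotient = 0
    remainder = 0
    for bit in range(31, -1, -1):
        # Bring the next dividend bit down into the running remainder.
        remainder = (remainder << 1) | ((dividend >> bit) & 1)
        if remainder >= divisor:
            # The divisor "fits": subtract it and set this quotient bit.
            remainder -= divisor
            quotient |= 1 << bit
    return quotient, remainder


print(udiv32(1000, 7))
```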
how long until
Good lord, I knew it was an ARM ploy all along! /game geek
I've been an ARM fan for many, many years, so it's great to see this development. I've always thought this kind of thing should happen with ARM chips, and that the ARM should be well suited for this kind of application.
ARM cores have a great advantage of having an incredibly low transistor count. As a result the simpler ARM chips tend to have incredibly good production yields. I don't know if that's true for the more complex ARM variants like XScale. This multi-core processor should also be an order of magnitude less complicated than a Pentium, so it too should get good yields and thus for volume production be very cheap.
However it's also always struck me that the low transistor count of ARM chips could be of use in very high performance computing applications. It is difficult to build high transistor count chips in exotic materials, but an ARM-based chip needn't have those problems. This is of course why most chips are still made on silicon.
Also the low transistor count means that even in high speed situations you shouldn't have the clock-skew problems that plague larger processors. (Clock-skew is the problem whereby it takes longer than a single clock tick for a signal to reach from one side of the processor to the other.) A good proportion of the transistors in Pentium IVs and PowerPC G5s are there to deal with that very issue.
Are you sure it's the 1st time ARM has produced a synthesizable core? (despite what the article says)
A little over a month ago I sat through a presentation by one of the guys near the top of ARM's research division...
It was a general overview of ARM's business model (it's an IP company) and products followed by some other material. During the presentation some cores were marked as synthesizable, others were marked as the opposite (I forget the specific term that was used).
To the best of my knowledge all the cores reviewed in the presentation were already released and in production.
2600 MIPS is just a bit less than PIII 1GHz. :)
Would be nice to have this power in a PDA
My home PC also costs almost two orders of magnitude less than a PDP-1 did, even ignoring inflation.
John Sauter, greybeard (J_Sauter@Empire.Net)
I feel that experience with ARM based embedded system will be a good item on an EE student's CV. I wonder what's the most cost effective platform that I should get if I want to play with it?
Forgive me, but I thought that:
1) Intel had bought ARM
2) The Intel PXA was actually a renamed ARM chip
... I guess I dreamed the whole thing up?
--- "I didn't think anyone would understand it" -Prof. Bob Muller
One thing I've always wanted is a comparison of the general efficiencies of different processors. That is, if you made different types of processors the same clock speed, gave them equivalent caches, and ran a benchmark entirely out of cache, how would they all compare?
x86s are supposedly awfully inefficient architectures, so would they come out on the bottom? Where would various ARM, XScale, 68k, and PPC processors end up?
Although x86 CPUs have scaled up to some amazing clock frequencies, it seems like their growth has slowed. Intel seems to have implicitly acknowledged this, since they're dropping the P4 line for an updated P3 architecture. AMD did the same thing with the Athlon64s, which have slower clock speeds but are faster in the end.
If an ARM at, say, 600 MHz turned out to be as fast as a P3 at 1 GHz, then I would say the ARM could leave the embedded market and become competition in the desktop market. If such systems were significantly cheaper, cooler, smaller, and less power hungry than similar x86 systems, I think they could seriously compete.
"More simultaneously executing threads = more power"?
A single CPU can execute multiple threads; it's just that the CPU switches among them and executes only one at any given instant, which does not lead to higher performance.
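A toy model of that point: with round-robin time slicing, one CPU interleaves the threads, but the total time to finish them all is the same as running them back to back. The model is an illustrative assumption: every thread is purely CPU-bound, the time slice is one unit, and context switches are free (in reality switching adds overhead, and threads that wait on I/O are where a single CPU does gain from multithreading):

```python
from collections import deque


def makespan(bursts: list[int], cpus: int) -> int:
    """Ticks until all CPU-bound threads finish under round-robin scheduling.

    bursts[i] is how many one-unit time slices thread i needs. Each tick, up
    to `cpus` threads from the front of the ready queue run for one slice;
    unfinished threads go to the back of the queue. Switching is free here.
    """
    ready = deque(b for b in bursts if b > 0)
    ticks = 0
    while ready:
        running = [ready.popleft() for _ in range(min(cpus, len(ready)))]
        ticks += 1
        for remaining in running:
            if remaining > 1:
                ready.append(remaining - 1)
    return ticks


work = [10, 10, 10, 10]  # four threads, 10 units of CPU time each
for n in (1, 2, 4):
    print(f"{n} CPU(s): {makespan(work, n)} ticks")
```

One CPU needs 40 ticks, two need 20 and four need 10: time slicing shares the CPU but adds no capacity.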
As soon as you stated that, I thought, RTFA... But there wasn't one! So, I just said duh!
Yep! MIPS... But Acorn, now those are pretty nifty too.
Note that the 2 watts is just the processor core. To make a meaningful comparison with, say, a Pentium, you would have to include the power consumption of the cache and a memory management unit as well. I would still bet that the ARM solution would come in substantially below an AMD or Intel solution even after these are accounted for.
You can also have fun with series expansions and other tricks for turning complex time consuming operations into faster, good-enough variations. It all depends on what matters to you - speed or accuracy.
Cheers,
Toby Haynes
Anything I post is strictly my own thoughts and doesn't necessarily have anything to do with the opinions of IBM.
Doesn't 100Mb flash using 180M transistors work out to 1.8 transistors/byte? I'm still just a student, but according to my intro to ECE class, even storing one bit takes more than 1.8 transistors...
Most modern flash memory uses multi-level storage, allowing several bits per cell (I'd known about 4 levels (2 bits), another poster mentioned 8 levels (3 bits)). Storage still only requires one transistor.
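The arithmetic for multi-level cells: a cell that distinguishes L charge levels stores log2(L) bits. Assuming one storage transistor per cell and ignoring sense amplifiers, decoders and other peripheral circuitry (which a real chip's transistor count does include), a quick sketch:

```python
import math


def bits_per_cell(levels: int) -> int:
    """Bits stored in one cell that can hold `levels` distinct charge states."""
    if levels < 2 or levels & (levels - 1):
        raise ValueError("levels must be a power of two >= 2")
    return int(math.log2(levels))


def cells_per_byte(levels: int) -> float:
    """Storage cells (one transistor each, by assumption) needed per byte."""
    return 8 / bits_per_cell(levels)


for levels in (2, 4, 8):
    print(f"{levels} levels: {bits_per_cell(levels)} bit(s)/cell, "
          f"{cells_per_byte(levels):.2f} cells/byte")
```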
The way it works is that you have a FET with a floating gate. In "write" mode, you apply a high voltage to the non-floating gate to drive charge either in or out through the thin oxide layer separating the gate and the body. The charge on the floating gate (which is between the sense gate and the body) effectively changes the threshold voltage of the transistor. When the transistor is turned on by the sense gate, you get an amount of current that varies depending on the amount of charge on the floating gate.
Other types of flash memory exist. This is just one of the more common ones.
As for storing single bits, the standard SRAM cell has 6 FETs (two inverters, cross-coupled, and two readout FETs to connect the inverter outputs to a differential read/write bus). A DRAM cell, however, just has one transistor, which connects the read/write bus to a storage capacitor. Among other things, this means that DRAM reads are destructive (capacitor is discharged on to the read/write bus; this disturbance is amplified, driving the bus back to the rail voltage and re-charging the cell's storage capacitor).
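The destructive read and write-back cycle of the DRAM cell can be modelled in a few lines. This is a deliberately idealised sketch (charge as a number between 0 and 1, the bit line precharged to half the rail, instant charge sharing), not a circuit simulation:

```python
class DRAMCell:
    """Idealised one-transistor, one-capacitor DRAM cell."""

    PRECHARGE = 0.5  # bit line sits at half the rail voltage before a read

    def __init__(self, bit: int = 0) -> None:
        self.charge = float(bit)  # 1.0 = capacitor charged, 0.0 = discharged

    def read(self) -> int:
        # Opening the access transistor dumps the capacitor onto the bit line.
        # Idealised: the cell ends up at the precharge level, so its value is lost.
        disturbance = self.charge - self.PRECHARGE
        self.charge = self.PRECHARGE
        # The sense amplifier turns the small disturbance into a full-rail value...
        bit = 1 if disturbance > 0 else 0
        # ...and driving the bit line to that rail recharges the cell (write-back).
        self.charge = float(bit)
        return bit

    def write(self, bit: int) -> None:
        self.charge = float(bit)


cell = DRAMCell(1)
print(cell.read(), cell.read())  # repeated reads still see 1 thanks to write-back
```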
You can also use the same tools to put the core into an ASIC.
Another good thing about synthesizable is that you can compile it to different specs. For example, the ARM7TDMI-S (S for synthesizable) can be compiled with different instruction decode sections. You can choose a small (cheap) and slow decode or a large (expensive) and fast one. So you can pick the best one for your situation. On most cores you can also select the amount of L1 cache you want (ARM7 doesn't have a cache at all, so it is exempt). Cache is one of the largest users of die space, so being able to size it also helps you keep costs down.
fucking nerds