ARM Unveils One-chip SMP Multiprocessor Core

ARM servers by MrIrwin · 2004-05-17 00:39 · Score: 4, Interesting

I had thought of ARM processors being the future for client devices and embedded systems.

Looks like here we are pointing at server technology.

How long before we have a 64/32/16 bit variable word size Thumb-like architecture?

--

And if you thought that was boring you obviously haven't read my Journal ;-)

Re:ARM servers by swordboy · 2004-05-17 01:08 · Score: 3, Insightful

I think the one thing that we're all waiting for is the introduction of on-chip system memory. Currently, the cache of a high-performance processor consumes more than half of the chip area because the penalty for a cache miss is so large. For decades now, memory frequency scaling has lagged that of the microprocessor. Although there have been some great strides recently, latency is still rearing its ugly head. External DRAM is too electrically distant to remain at the heart of any high-performance system.

Once we get processor and memory combined, we'll see performance increasing by several orders of magnitude. Processor architecture will matter even less, since emulation of *any* architecture will become trivial in terms of available processing speed. Your Thumb-like prediction will most certainly pan out to some magnitude.

--

Life is the leading cause of death in America.
Re:ARM servers by Smallpond · 2004-05-17 01:17 · Score: 1

Because you have a minimum of one transistor per memory bit. So a GB of system memory requires 8 billion transistors (plus ECC). Not quite there yet on chip sizes. Latest P4 is around 180M.
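A rough sanity check of that arithmetic, as a sketch only. It assumes 1 GB means 2**30 bytes, ignores ECC, and compares one transistor per bit against the six-transistor SRAM cell mentioned elsewhere in this thread; the 180M figure is the P4 count quoted above.

```python
# Back-of-the-envelope transistor budget for on-die memory.
# Assumptions (not from the post): 1 GB = 2**30 bytes, ECC is ignored.
GIB_BITS = 8 * 2**30            # bits in one gigabyte
P4_TRANSISTORS = 180_000_000    # figure quoted above for the latest P4


def transistors_needed(bits: int, per_bit: int) -> int:
    """Transistors for `bits` of storage at `per_bit` transistors per cell."""
    return bits * per_bit


def budget_ratio(bits: int, per_bit: int, budget: int = P4_TRANSISTORS) -> float:
    """How many P4-sized transistor budgets the memory array would consume."""
    return transistors_needed(bits, per_bit) / budget


one_t = transistors_needed(GIB_BITS, 1)   # one transistor per bit
six_t = transistors_needed(GIB_BITS, 6)   # six-transistor SRAM cell
print(f"1T cells: {one_t:,} transistors, {budget_ratio(GIB_BITS, 1):.0f}x a P4")
print(f"6T cells: {six_t:,} transistors, {budget_ratio(GIB_BITS, 6):.0f}x a P4")
```

Either way the array alone is dozens to hundreds of times the whole P4 budget, which is the point being made.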
Re:ARM servers by MathFox · 2004-05-17 01:21 · Score: 3, Insightful

why don't we see more reasonable personal computers (or blade servers) based upon this architecture.
I was an Acorn Archimedes user for more than 10 years (the workstation that the ARM was originally designed for) and they were great systems. Affordable, decent speed and good operating system.
Alas, they were not "PC-compatible" and at a certain time the Intel/AMD clones with Linux became much more attractive.
Something along the profile of the Psion Netbook or old (or new depending upon your perspective) Apple Newton (also ARM) would be very cool and useful.
Are you talking Sharp Zaurus? I'm eyeing one (If I could order them in the Netherlands...)

--
extern warranty;
main()
{
(void)warranty;
}
Re:ARM servers by MrIrwin · 2004-05-17 01:32 · Score: 2, Interesting

Of course much of the memory we require is due to inefficient software.
Look at embedded systems and you will see fresh new well thought out solutions which have much lower memory requirements.
180M transistors means we could have e.g. 100Mb flash, 40Mb RAM and an ARM on the same chip.
That could do an awful lot in some apps!

--
And if you thought that was boring you obviously haven't read my Journal ;-)
Re:ARM servers by Christopher+Thomas · 2004-05-17 01:52 · Score: 4, Informative

For decades now, memory frequency scaling has lagged that of the microprocessor. Although there have been some great strides recently, latency is still rearing its ugly head. External DRAM is too electrically distant to remain at the heart of any high-performance system.

Once we get processor and memory combined, we'll see performance increasing by several orders of magnitude.

This idea has been around for what is almost certainly longer than either of us has been alive. It turns out that there are problems.

The main problem is that no matter how much memory a system has, we find ways to use it. In the time I've been using computers, memory size has gone up four orders of _magnitude_, and I'm sure the greybeards listening will top that. The processor sitting in your machine right now has more on-die memory (the cache) than, say, an early XT had, but the tasks you're running have a memory footprint too large to fit. This is the price for being able to _do_ more than you could do on that old XT.

Another problem is with the structure of memory itself. You've heard of "fast, cheap, good - pick two"? Memory is "large, fast, densely-packed - pick _one_". The reason why integrated logic/DRAM processes tend to do one or the other badly is that DRAM and logic have to optimize transistor characteristics for exactly opposite things (high "on" current for logic, low leakage current for DRAM). Among other things, this means that DRAM is either slow or very power-hungry. SRAM is bulky no matter what you do - it's the cost of playing, when you have six transistors instead of one. Any kind of large RAM array is slow no matter what you do - you have to propagate signals across a huge structure instead of a smaller one.

The solution to date has been a hierarchical cache system, where small, fast, on-die memory is accessed whenever possible, and when that overflows, larger, moderately fast, on-die memory, and when that fails, DRAM. This works amazingly well, giving you almost all of the benefits of fully on-die memory for problems that fit in cache. Problems that don't fit in cache won't fit in on-die memory, so going with an on-die implementation doesn't help for them.
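To put rough numbers on that, here is a small sketch of the usual average-memory-access-time calculation for a two-level cache in front of DRAM. Every latency and hit rate in it is invented for illustration, not measured on any real part.

```python
# Average memory access time (AMAT) for a cache hierarchy.
# All latencies (in cycles) and hit rates below are made-up illustrative numbers.
def amat(levels, memory_latency):
    """levels: list of (hit_latency_cycles, hit_rate), ordered from L1 outward."""
    total = 0.0
    reach = 1.0  # fraction of accesses that get as far as this level
    for latency, hit_rate in levels:
        total += reach * latency      # every access reaching the level pays its latency
        reach *= (1.0 - hit_rate)     # misses fall through to the next level
    return total + reach * memory_latency  # whatever is left goes to DRAM


# Working set that mostly fits in cache versus one that does not.
fits_in_cache = amat([(2, 0.98), (12, 0.95)], memory_latency=200)
spills_out = amat([(2, 0.60), (12, 0.50)], memory_latency=200)
print(f"fits: {fits_in_cache:.2f} cycles, spills: {spills_out:.2f} cycles")
```

With these assumed numbers the cache-friendly case averages a few cycles per access, while the case that spills is dominated by the DRAM term.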

Progress in improving memory response times is made in two ways. The first is to use a better cache indexing algorithm that is less susceptible to pathological situations. In the simpler indexing schemes, you can end up with situations where a short repeating access pattern can hammer on the same small set of cache blocks, causing cache misses even when there's plenty of space elsewhere. Higher associativity and tricks like victim caches reduce this problem. Techniques like a "preferred" block in a set reduce the time penalty for high associativity, and techniques like content-addressable memory reduce the power penalty. This is still a field of active research - build a better cache, and you get closer to a system that _acts_ as if it has all memory on-die.
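The pathological case is easy to reproduce in a toy model. The sketch below simulates a tiny cache of eight blocks with LRU replacement (the sizes, the modulo indexing and the access pattern are all assumptions chosen to show the effect) and compares a direct-mapped layout against a 2-way set-associative one.

```python
from collections import OrderedDict


def count_misses(block_addrs, num_sets, ways):
    """Simulate a set-associative cache with LRU replacement; return the miss count."""
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for addr in block_addrs:
        lines = sets[addr % num_sets]      # simple modulo indexing
        if addr in lines:
            lines.move_to_end(addr)        # hit: mark as most recently used
        else:
            misses += 1
            if len(lines) >= ways:
                lines.popitem(last=False)  # evict the least recently used block
            lines[addr] = True
    return misses


# Two blocks that collide in the direct-mapped layout, touched alternately.
pattern = [0, 8] * 50
direct = count_misses(pattern, num_sets=8, ways=1)    # 8 blocks, direct-mapped
two_way = count_misses(pattern, num_sets=4, ways=2)   # same 8 blocks, 2-way
print(direct, two_way)
```

Both layouts hold eight blocks, yet the direct-mapped one misses on every access because blocks 0 and 8 keep evicting each other, while the 2-way one misses only on the first touch of each block.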

The second way of improving memory subsystem performance is to use memory speculation. This involves either figuring out (or even guessing) what memory locations are going to be needed and preemptively fetching their contents, or taking a guess at the value that will be returned by a memory fetch before the real result comes in. In both cases, you're masking most of the latency of the memory access, while paying a price for failed speculations (either in higher memory _bandwidth_ required, or in power for speculated threads that have to be squashed). Build a better address and data speculation engine, and you'll again approach performance of an impossible all-on-die-and-fast system.
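As a toy illustration of the address-guessing half of that, the sketch below models a one-entry stride predictor: if the last two strides match, it prefetches the next address. It is a deliberately simplified model with no cache at all (every access counts as a miss unless it was prefetched), and the function name and counters are made up for the example.

```python
def run_stride_prefetcher(addresses):
    """Return (covered, uncovered, wasted) for a one-entry stride predictor.

    covered:   accesses whose data had already been prefetched
    uncovered: accesses that still pay the full memory latency
    wasted:    prefetches issued but never used (extra bandwidth)
    """
    prefetched = set()
    last = None
    stride = None
    covered = uncovered = 0
    for addr in addresses:
        if addr in prefetched:
            prefetched.discard(addr)
            covered += 1
        else:
            uncovered += 1
        if last is not None:
            new_stride = addr - last
            if new_stride == stride and stride != 0:
                prefetched.add(addr + stride)   # same stride twice: guess the next one
            stride = new_stride
        last = addr
    return covered, uncovered, len(prefetched)


regular = run_stride_prefetcher(list(range(0, 64, 4)))    # steady stride of 4
erratic = run_stride_prefetcher([0, 100, 7, 53, 20])      # no repeating stride
print(regular, erratic)
```

The regular walk pays full latency only while the predictor warms up and wastes one prefetch at the end; the erratic pattern gets no help at all, which is the "price for failed speculation" side of the trade.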

In summary, it turns out that putting all of the memory of a general-purpose system on-die isn't practical now and won't be as long as requirements for memory keep increasing. However, caches already give you performance approaching this for problems that are small enough to _fit_ in on-die memory, and cache technology is constantly being improved. This is where effort should be (and is) going.
Re:ARM servers by Christopher+Thomas · 2004-05-17 01:55 · Score: 1

Memory is "large, fast, densely-packed - pick _one_".

Got ahead of myself.

What I'm trying to say is that any large SRAM array will be slower than a small SRAM array, and neither will have very high capacity. A DRAM array has high capacity, but is horribly slow. So-called "single transistor SRAM" is actually DRAM with a cache tacked on.
Re:ARM servers by addaon · 2004-05-17 02:06 · Score: 2, Insightful

One of the technologies you'll start seeing for high-performance embedded systems (and can find now, in a few places) is core pinouts designed as the mirror image of a standard DRAM memory pinout. With this setup, a CPU can be put on one side of a four (sometimes five) layer circuit board, normally, and a DRAM chip (single chip, so about 1Gb max for most usage; no double channel) can be put directly opposite it, with vias connecting the two. The electrical connection of the signalling wires between the two is extremely good, and allows much higher speed, lower latency memory to be used.

--

I've had this sig for three days.
Re:ARM servers by mwood · 2004-05-17 02:15 · Score: 1

Yeah, my first thought was, "where do I find an under-$100 ATX motherboard that'll take one of those?"
Re:ARM servers by pedantic+bore · 2004-05-17 02:17 · Score: 2, Informative

Cobalt servers were based on MIPS, and then migrated to AMD-K6 processors.

Not that they wouldn't have worked just fine with ARM, but as far as I can tell the idea never even came up.

--
Am I part of the core demographic for Swedish Fish?
Re:ARM servers by drinkypoo · 2004-05-17 02:29 · Score: 3, Informative

First try a google for Cobalt server ARM and then try another one for Cobalt server MIPS and see how you do. Cobalt Qube and Raq up to 2 were MIPS architecture machines, not ARM.
ARM has been used in many PDAs as you say, and in Acorn/Archimedes computers. It's also in the Game Boy Advance (ARM7 I believe) and will likely be the foundation of the Dual Screen (ARM9 and ARM7 both will be in the box, if leaked specs can be believed.) ARM also begat StrongARM, and Intel purchased (some level of) rights to the StrongARM II architecture, which they call XScale.

--
"You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
Re:ARM servers by drinkypoo · 2004-05-17 02:36 · Score: 1

My system has 1GB of main memory. That's simply too much to fit on any current die sizes - last I heard about the biggest you could pull off with an actual complex IC covering it (not just putting down some pads, painting it with liquid crystal and covering it with glass to make a reflective mono LCD, for example) was about 21x21mm. You're not putting 1GB of DDR on the die with my CPU any time soon, and by the time you can, 1GB will be a piffling amount. Meanwhile, there ARE chips with system memory on-core, and they call them microcontrollers. They generally have a bunch of I/O pins hanging off them that can be used/addressed even more easily than the address pins and similar of any other processor. (There are, after all, instructions to toggle bits.)
The solution is to do what microprocessor manufacturers have been doing: increase cache sizes. Include L2 on chip. Soon enough, this will be followed by L3, and of course, all cache elements have been steadily growing over time, some faster than others. AMD moved to a larger cache much earlier than Intel, which claimed all along that the cache size wasn't what mattered - now look at 'em. But back to the main point: If anything, you will end up with a relatively small amount of memory on-die and the OS will have control over what uses that memory.
Amazing how all the concepts that made the Amiga great are slowly creeping into the PC world today. Of course, we all knew it would happen, it was just a matter of time. Amiga made (and Commodore bought) a design which almost seemed to be more than the sum of its parts because of the closeness of each part to each other part, and the system's inherent DMA-ness. But, that architecture was also limiting, because everything in the system had to be able to keep up, and now here we are with very inexpensive PCs that beat its pants off because hey, that's progress.

--
"You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
Re:ARM servers by Kazymyr · 2004-05-17 03:03 · Score: 1

Actually the latest flash memory architecture can store up to 3 bits per cell (and 1 cell=1 transistor).

--
I hadn't known there were so many idiots in the world until I started using the Internet -Stanislaw Lem
Re:ARM servers by YU+Nicks+NE+Way · 2004-05-17 03:22 · Score: 1

Not quite. Intel purchased the StrongARM rights when DEC was dismembered. XScale is a purely Intel product.

If you need evidence, then consider the issues of time to wake from idle and average power consumption in idle. StrongARM did a fantastic job of managing them, given the clock speeds at which it ran. The earlier XScale chips...well, they just did not. Bulverde (gen 3 XScale) is finally starting to get a handle on those problems. XScale was designed to scale to high clock speeds, but not to handle many of the other issues which are arguably more important in the embedded space, where battery life is king. That isn't surprising: the Intel design team which did the PXA family understood the former, but not the latter, and it took them a while to educate themselves about them.
Re:ARM servers by pantherace · 2004-05-17 03:23 · Score: 2, Informative

Actually StrongARM owes nothing to ARM (the company), as it was made by DEC when they realized that lower power could be possible by turning down voltage etc on Alphas, and instead of either creating a new instruction set, or using the Alpha's instruction set (the first pure 64-bit arch, which was needed in servers, but not really in ultra-low power stuff at the time), they decided to use ARM.
In a court case between DEC & Intel which was settled, DEC sold its fabs (I think they had one or two left) & StrongARM to Intel, with Intel to produce the next generation Alphas, and the court also barred them from buying the Alpha tech*. There is little evidence that Intel tried to fab the Alphas, before saying they couldn't. By the time of the HP merger, Compaq, which had bought what was left of DEC, sold the Alpha tech (non-exclusive licence apparently to get by the court decision) to Intel.
Xscale is Intel doing what Intel does best: ramping up clock speeds, and having core errors in the processor. On PXA250 Xscales (number from memory: double check), they ran a risk of corrupting the cache, which could only be worked around by disabling the cache, making them really slow. At an equivalent clock speed a StrongARM (even as old as those in the Newton) is faster: a tribute to the DEC engineers who knew what they were doing. But StrongARM is only ARM in name & instruction set (armv4l as I recall for StrongARM, while Xscale is armv5 as I recall).
Re:ARM servers by Anonymous Coward · 2004-05-17 05:26 · Score: 0

Actually, StrongARM is ARM instruction set 4i based. It uses the ARM 4 32 bit instructions and the JTAG hooks for boundary scan but little else. Biggest differences are lack of JTAG support for debugging the processor, and no 16 bit instructions.

XScale is based on the ARM v5t instructions. Intel added some pseudo DSP instructions to the core, rearranged the coprocessors, changed the buses, and changed the JTAG debug support.
Re:ARM servers by John+Whitley · 2004-05-17 05:55 · Score: 1

I think the one thing that we're all waiting for is the introduction of on-chip system memory.

Sorry, I prefer cheap commodity computing. Such a scheme would be VERY expensive. An expensive crutch for programmers who got a 'D-' in computer architecture because of an utter failure to comprehend the memory hierarchy. An all RAM on die scheme would be an enormous waste of money, as vast, contiguous regions would sit idle. No one does this at any scale of computing because caching strategies get us the vast majority of the benefit of such a scheme at *much* lower cost.

It would require a technological revolution that enabled a very high-speed, low-power, non-volatile memory solution that could supplant all RAM (cache or otherwise) and possibly disk media, too. Maybe MRAM or a related technology will give us this someday, but not while current assumptions hold.

Some memory-intensive applications can be satisfied simply by employing more cache RAM to cover the active memory footprint. Likewise, programmers already optimize algorithms for large memory footprints to take advantage of the memory hierarchy. See articles online about the introduction of tile-based processing to the GIMP some years back for an example of this.
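For anyone who hasn't seen the tile trick, a minimal sketch of the traversal order follows. This is not the GIMP's code; the function names, the 64-pixel default tile and the invert operation are all assumptions for illustration. Python lists won't show the cache benefit themselves, so read it as the loop structure you would use in a language where memory layout matters.

```python
def tiles(width, height, tile):
    """Yield (x0, y0, x1, y1) bounds of tile-sized blocks covering the image."""
    for y0 in range(0, height, tile):
        for x0 in range(0, width, tile):
            yield x0, y0, min(x0 + tile, width), min(y0 + tile, height)


def invert_tiled(image, tile=64):
    """Invert an 8-bit greyscale image (a list of row lists) one tile at a time."""
    height = len(image)
    width = len(image[0]) if height else 0
    for x0, y0, x1, y1 in tiles(width, height, tile):
        # All work on this tile finishes before the next tile is touched,
        # so the active footprint stays roughly tile * tile pixels.
        for y in range(y0, y1):
            row = image[y]
            for x in range(x0, x1):
                row[x] = 255 - row[x]
    return image


small = invert_tiled([[0, 10, 20], [30, 40, 50]], tile=2)
print(small)
```

The design choice is simply to pick a tile small enough that everything an operation touches for one tile fits in cache at once.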

It might seem nice to solve the Lazy Programmer's memory problems via scads of on-die memory, but this would help only a tiny fraction of apps. Most programs exhibit high memory reference locality without any special effort by the programmer. A much better general purpose solution (from a programmer's perspective) would be a migration to an Aspect Oriented Programming platform. Dig around for some of the papers on Aspect Oriented Programming and image processing applications. These early examples demonstrated an environment where optimization was specified separately (orthogonally) to the image processing algos. I.e. the image processing algos could be understood without any of the code-mutation that optimization induces. Likewise, the optimization could be described on its own merits without getting "lost" in the noise of the particular problem being optimized. Done right, even the optimizations become separate, reusable components.
Re:ARM servers by mrchaotica · 2004-05-17 06:06 · Score: 1

Doesn't 100Mb flash using 180M transistors work out to 1.8 transistors/byte? I'm still just a student, but according to my intro to ECE class, even storing one bit takes more than 1.8 transistors...

--
"[Regarding the 'cloud,'] ownership was what made America different than Russia." -- Woz
Re:ARM servers by theCat · 2004-05-17 06:34 · Score: 2, Interesting

Nicely done. You write textbooks, I hope?

Though not a _personal_ computer, I, like many at the time, ran programs in time-share environments at a university. When I started college in 1977 I got an account on the PDP-11/45 system, which came with some personal storage, access to BASIC, and all of 8K of core. Before that I had never touched a computer system. When I started serious projects I applied for more core, and got 16K.

Later as a graduate student I programmed Apple][ systems in hybrid BASIC/assembly to do "high-speed" statistics against memory-resident databases I designed myself. Those boxen had around 64K addressable system memory for programs you load from disk and another 64K you could use with high-memory tricks to store your data or routines. That would be using hi-mem as something of an L1 cache against the floppy disk subsystem, as it were. Coincidentally (or not) 64K is about the amount of L1 cache of the processor I'm using right now.

In some ways, we've grown sloppy about RAM to the point nobody noticed that RAM became to the modern CPU what a floppy was to an Apple][: a slow but necessary storage medium that acts as a loading point for the area of memory where the actual work is done. An Apple]['s hi-mem was several orders of magnitude faster than reading data off the floppy. As it is for L1 cache against system DRAM.

Today programmers are re-learning to write for 64K of memory (L1 cache) and treating board-level DRAM as "storage". This is being treated as an emerging technology triumph which it probably is, but really the challenge has been around a very long time.

--
=^..^= all your rodent are belong to us
Re:ARM servers by default+luser · 2004-05-17 07:10 · Score: 2, Informative

Non-volatile ram is a different concept, you'll probably want to steer clear for the purposes of this discussion.

You're probably thinking of SRAM, in which a single bit cell requires 6 transistors. The advantages of SRAM:

- Data remains resident as long as the cell remains powered.

- With the exception of leakage, the only power required is for switching, making SRAM good for low-power applications.

That said, a single DRAM bit is about as simple as you can get. It consists of a single transistor and a capacitor to hold the data. The disadvantages of DRAM:

- Data degrades over time, requiring periodic refresh.

- The data contained by the capacitor is also destroyed on read, requiring it to be re-written.

- Due to their design, DRAM cells have inherently slower performance (although there are tricks to improve this).

These issues make your memory interface more complex and power-hungry, but the space savings often make it worthwhile to go with embedded DRAM over SRAM.

--
Man is the animal that laughs.
And occasionally whores for Karma.
Re:ARM servers by MrIrwin · 2004-05-17 07:29 · Score: 1

" Doesn't 100Mb flash using 180M transistors work out to 1.8 transistors/byte?"
Yep. Of course I was referring to analog EEPROM cells, such as those that were used in the ISD audio recording chips ;-)

--
And if you thought that was boring you obviously haven't read my Journal ;-)
Re:ARM servers by rthille · 2004-05-17 08:54 · Score: 1

First Cobalt Qube (2700) was based on MIPS, not ARM. Perhaps you're speaking of an in-house prototype that never saw the light of day?

--
Awesome furniture, accessories and cabinetry in Santa Rosa, CA: http://humanity-home.com/
Re:ARM servers by leandrod · 2004-05-17 08:57 · Score: 1

> Cobalt servers were originally based on ARM processors

You are probably thinking about the NetWinder.

--
Leandro Guimarães Faria Corcete DUTRA
DA, DBA, SysAdmin, Data Modeller
GNU Project, Debian GNU/Lin
Re:ARM servers by corngrower · 2004-05-17 09:20 · Score: 1

Apparently the parent hasn't heard of chips like the Atmel AVR, or PIC chips. The concept of on-chip memory (both program and RAM) is HUGELY popular among embedded microcontrollers. These can be incredibly inexpensive as well.
Re:ARM servers by Anonymous Coward · 2004-05-17 23:57 · Score: 0

Wow, clever idea ! Which CPUs have this pinout ?
Re:ARM servers by po8 · 2004-05-18 03:24 · Score: 1

Aside from the fact that Cobalts were actually based on MIPS parts, as several other posters have noted...

The generally accepted reason for the failure of ARM to move up-scale so far is that the main company producing high-end ARMs these days has been Intel. Oddly, they seem to have issues with creating a competitor to their flagship CPUs, so they keep leaving the FPU off the part. In 2004, a CPU sans FPU is a pretty unlikely desktop box.

I'm very excited about the ARM Ltd part for this reason. Not only an FPU: a vector FPU! Build me a board, and I'll buy one tomorrow.

Imagine a.... by System.out.println() · 2004-05-17 00:40 · Score: 3, Funny

..... .....

What do you want, a cookie?

Seriously though, this would be great to run Linux on... Like a new Zaurus perhaps :)

--
I've got more mod points and GMail invi

Interesting by INeededALogin · 2004-05-17 00:40 · Score: 5, Interesting

The MPCore multiprocessor enables system designers to view the core as a single "uniprocessor", simplifying development and reducing time-to-market, according to ARM.

The opposite of HyperThreading? 4 CPU's to one instead of 1 CPU to 2?

The only thing that I can guess they mean by simplifying is that a developer would not have to design a multi-threaded application to take advantage of the other threads.

Re:Interesting by Tune · 2004-05-17 00:55 · Score: 4, Informative

It appears to be similar to other dual core technologies except developers need to worry less about threads accessing the same data. This is accomplished by cache snooping, which is a dated, but very fast way to avoid (L0) cache inconsistencies. That should take care of a major hurdle wrt. keeping SMP threads busy, especially if the clock speeds are relatively low.
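For anyone unfamiliar with snooping, here is a toy write-invalidate model. It is not the MPCore's actual protocol (the article doesn't spell that out); the class names are made up, memory is write-through for simplicity, and real designs track per-line states rather than just dropping entries.

```python
class SnoopBus:
    """Toy shared bus: every write is broadcast so other caches can invalidate."""

    def __init__(self):
        self.memory = {}
        self.caches = []

    def attach(self, cache):
        self.caches.append(cache)

    def broadcast_write(self, writer, addr):
        for cache in self.caches:
            if cache is not writer:
                cache.lines.pop(addr, None)   # snoop hit: drop the stale copy


class Cache:
    """Per-CPU cache that watches (snoops) writes made by the other CPUs."""

    def __init__(self, bus):
        self.lines = {}
        self.bus = bus
        bus.attach(self)

    def read(self, addr):
        if addr not in self.lines:            # miss: fetch from shared memory
            self.lines[addr] = self.bus.memory.get(addr, 0)
        return self.lines[addr]

    def write(self, addr, value):
        self.bus.broadcast_write(self, addr)  # invalidate everyone else's copy
        self.lines[addr] = value
        self.bus.memory[addr] = value         # write-through, for simplicity


bus = SnoopBus()
cpu0, cpu1 = Cache(bus), Cache(bus)
cpu0.write(0x10, 1)
first = cpu1.read(0x10)     # cpu1 caches the value
cpu0.write(0x10, 2)         # cpu1's copy is invalidated by the snoop
second = cpu1.read(0x10)    # so this read misses and refetches
print(first, second)
```

The software on either CPU never has to flush anything by hand, which is the part that takes the burden off developers.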

Notice that SMP has been a dream to the ARM team from its early Acorn/Archimedes days on. It seems they finally got it working...
Re:Interesting by iapetus · 2004-05-17 01:19 · Score: 1

Ah, brings back memories of the Hydra processor board for the Risc PC (which was never actually available, was it?) - I always felt Shiva would have been a more appropriate name, though, for obvious reasons...

--
++ Say to Elrond "Hello.".
Elrond says "No.". Elrond gives you some lunch.
Re:Interesting by clacke · 2004-05-17 01:32 · Score: 1

By "developer" they probably mean a hardware developer, not a software developer. I can definitely see how one chip would be easier to deal with than four.
Re:Interesting by BigBadBri · 2004-05-17 01:45 · Score: 1

But Shiva was already (I think - I could be mistaken) being used by a firm making remote access platforms (read: big boxes of modems).
Being Cantabrigian, they probably preferred the Greek metaphor, anyway ;)

--
oh brave new world, that has such people in it!
Re:Interesting by addaon · 2004-05-17 02:09 · Score: 1

(Didn't read the article.) Are they cache snooping, which would be the obvious thing, or just using a shared cache and ignoring the problem that way?

--

I've had this sig for three days.
Re:Interesting by iapetus · 2004-05-17 02:36 · Score: 1

It's not quite as neat, though. Hydra had many heads, Shiva had many ARMs. The videogaming geek in me is desperate to suggest Goro as an alternate name if Shiva was already gone. :)

--
++ Say to Elrond "Hello.".
Elrond says "No.". Elrond gives you some lunch.
Re:Interesting by twem2 · 2004-05-17 07:11 · Score: 1

I remember, after Acorn's demise, hearing Dave, the engineer who used to do lots of talking with the enthusiast community, cite Hydra as another reason to kill Chris Cox(?) (the ex-manager).
The hardware was there, developed by someone else, but Acorn wouldn't let the work be done on RISC OS to make it preemptively multitasking (they were probably throwing away money on Digital TVs or developing applications which they then just threw away because they couldn't be bothered to even try marketing them).
Hence, RISC OS still has cooperative multitasking and is way out of date (despite still being blindingly fast for everyday use).

If I had no Idea what I was talking about... by Kjuib · 2004-05-17 00:40 · Score: 0, Insightful

(and some say I don't) but this article looks like Alphabet Soup! with all the acronyms and all. Very Interesting topic - not for the Noob.

--
- Your stupidity got you into this mess, why can't it get you out? -Will Rogers

Re:If I had no Idea what I was talking about... by ezzzD55J · 2004-05-17 00:46 · Score: 1, Funny

Good lord, Insightful? What's the insightful part, the fact that he says he doesn't know what he's talking about?

Synthesizable = can put it in an FPGA by Anonymous Coward · 2004-05-17 00:47 · Score: 5, Interesting

In case you were wondering what that is all about...

Synthesis of a core is analogous to compiling your software, except in an FPGA it is processing a hardware description language like VHDL or Verilog to create the 'code' used to load the FPGA.

This is a big plus for people wanting to put a wicked fast processing unit in the core along with whatever custom IO goodies they can come up with.

Too bad it's not open source, as there are other wicked fast processor cores available. For example Xilinx can license you to put a PowerPC in its FPGA cores.

Re:Synthesizable = can put it in an FPGA by Anonymous Coward · 2004-05-17 01:41 · Score: 0

The PowerPC cores used by Xilinx are not implemented in the FPGA fabric. They are hard cores put in with the regular FPGA fabric. It is unlikely that you would get a "wicked" fast core synthesized into the FPGA fabric (even the hard cores are only running at 300-400 MHz).
Re:Synthesizable = can put it in an FPGA by NoMercy · 2004-05-17 01:57 · Score: 3, Informative

I'm not sure how to tell you this, but you're virtually totally wrong on every point.

Synthesizable to silicon, for ASICs mostly, though people like Philips turn them into micro-controllers and Intel make a few micro-processors. The idea mostly is you can put an LCD controller, SIM card reader, DSP, etc. all on one lump of silicon with an ARM processor and put it in your mobile phone.

And you don't licence a PowerPC core to put in an FPGA, you get a PowerPC chip actually inside the FPGA (Virtex-II Pro); any IP cores you see in the core-gen are simply the hooks into these devices that are already there, similar to the GCM's.

And the big plus of this... well I don't really know but depending on how much number crunching it can do, and how much heat it generates when it does it, it could see all manner of applications.
Re:Synthesizable = can put it in an FPGA by eclectro · 2004-05-17 01:58 · Score: 4, Informative

Too bad its not open source, as there are other wicked fast processor cores available. For example Xilinx can license you to put a PowerPC in its FPGA cores.

There is this.

You can find the code easily. There are a couple of other clones, but I have not heard much about them. Another one is BlackARM developed in Sweden a couple of years ago.

I think these projects would be ok as long as they are instruction compatible, but not an internal clone. In which case ARM would pull out their lawyer dogs.

But there are a couple of other open source cores available, which IMHO would be smarter to use because you could do more with them without the fear of legal reprisal from ARM.

If you are designing an embedded system, you might could get by using such a core. The thing ARM has going for it is that commercial support and toolkits are available, which can be handy if you have a complex application that needs a lot of debugging. And there is a lot of third party support that you are not going to find with your homegrown core.

That being said, you could save a fair amount of money using an open core. But if you need to get something important out the door quickly (like a toy for christmas) you go with the commercial solution. Unless you have the necessary in-house resources to troubleshoot problems.

Just my .02

--
Take the cheese to sickbay, the doctor should see it as soon as possible - B'Elanna Torres, "Learning Curve"
Re:Synthesizable = can put it in an FPGA by NoMercy · 2004-05-17 02:02 · Score: 2, Informative

Doh, that's what you get for modifying posts too much. You can put it on an FPGA, but you wouldn't want to outside of development. If you look at the picture in the article, there's 2500 dollars worth of FPGA there, and the whole unit is probably looking at 10,000, and it's a tad big. Put it on the intended final target, a silicon chip, and you've got something which will fit in the tiny space behind the battery in your mobile phone.
Re:Synthesizable = can put it in an FPGA by Anonymous Coward · 2004-05-17 02:17 · Score: 0

Altera's NIOS runs in a couple of dollars worth of FPGA (in volume/ low cost FPGA's) and NIOSII (released at the same show 32 bit RISC with extensible custom instruction architecture) kicks the crap out of any shit Xilinx offer in reconfig cores (Micro'Blaze' - with its crappy non standard DRAM interface). 200 DMIPS in StratixII. And hey! you can run as many as you like in an FPGA. And they were first with a hard ARM922T core in an FPGA the Excalibur. I don't understand why Xilinx get all the press when they are clearly NOT the technological leaders in the field... just goes to show the power of their marketing dept I guess...
Re:Synthesizable = can put it in an FPGA by Anonymous Coward · 2004-05-17 02:18 · Score: 0

http://www.opencores.org/
Re:Synthesizable = can put it in an FPGA by msgmonkey · 2004-05-17 03:15 · Score: 1

Being instruction set compatible is the problem since the ARM instruction set is patented.
Re:Synthesizable = can put it in an FPGA by eclectro · 2004-05-17 08:00 · Score: 1

That is debatable.

From the sounds of it, ARM found a way to make this go away.

It probably is academic though. Any significant competitor to ARM that used their instructions would bring a lawsuit.

--
Take the cheese to sickbay, the doctor should see it as soon as possible - B'Elanna Torres, "Learning Curve"
Re:Synthesizable = can put it in an FPGA by Anonymous Coward · 2004-05-17 08:16 · Score: 0

Microblaze vs NIOS(II) debates are utterly stupid since they both have their strengths and weaknesses. Besides, more featureful cores are available at zero cost anyway.
Re:Synthesizable = can put it in an FPGA by svirre · 2004-05-17 10:10 · Score: 1

The PowerPC cores you can use in the Virtex-II Pro devices are not soft (synthesizable) cores. They exist as fixed cells on the FPGA array.

That a core is synthesizable does not just mean you can place it in a FPGA. It also means it can be targeted for any logic process.

I've frequently used soft cores in conjunction with my own logic ON ASIC/ASSP devices.

The advantage of soft cores is that they are easily retargetable to new technologies and are easier to integrate in a design, as you don't need to make special provisions in floorplanning to fit them.
Re:Synthesizable = can put it in an FPGA by Anonymous Coward · 2004-05-17 13:45 · Score: 0

A Chinese graduate student did make an ARM compatible open source core and was quickly visited by ARM lawyers. If anyone is interested in what other open source cores are available, www.opencores.org is a good source.

Re:Hype by ajutla · 2004-05-17 00:49 · Score: 3, Funny

But...but multiple processors are so cool! Who cares about performance when you can tell people, yeah, I have a SMP PDA right here, isn't that sexy? Heck, I imagine that this new multiprocessor core will be an excellent way to pick up chicks. I'm looking forward to its release.

Wave of the future. by Willeh · 2004-05-17 00:50 · Score: 5, Interesting

IMO this new "multiple CPUs per chip" approach is the way forward, and the huge power savings are an added bonus. One question springs to mind though: how much performance can you gain by using this technique? I mean, sooner or later you will hit the limits of, say, the memory bus or the graphics bus or whatever (speaking in layman's terms, obviously), especially in environments where power consumption is an issue, and huge memory banks take a lot of power to keep them refreshed. Still, I welcome the development; SMP-type deals can make a computing experience easier to cope with during intensive use like compiling and other CPU-intensive tasks.

--
Will wank off Linus Torvalds for fame.

Re:Wave of the future. by pe1rxq · 2004-05-17 00:56 · Score: 2, Interesting

There are a lot of things right now where the CPU is the bottleneck. In making a system better it is wise to start with the weakest link and then move on to the second weakest, etc...
Also you don't have to refresh static RAM; it's more expensive but might pay off in terms of energy.

Jeroen

--
Secure messaging: http://quickmsg.vreeken.net/
Re:Wave of the future. by Jeff+DeMaagd · 2004-05-17 01:30 · Score: 1

It might depend on the type of SRAM but IIRC, SRAM is much faster, much more expensive and takes more power.
Re:Wave of the future. by drinkypoo · 2004-05-17 02:42 · Score: 1

SRAM is also much bigger. DRAM bits are just capacitors with calculated (and designed-in) characteristics coupled with a single gate. SRAM bits are flip-flops. Note that they are each (at least of the three or four types on that page) made of eight gates. That's a lot of real estate for a single bit, I'm guessing two to four times as much (depending on the size of the capacitor...)

--
"You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
Re:Wave of the future. by default+luser · 2004-05-17 09:14 · Score: 1

IIRC, SRAM is much faster, much more expensive and takes more power.

In terms of power, I would think it depends entirely on the duty cycle. In terms of switching power, SRAM has a higher switching cost due to having more transistors. On the other hand, DRAM leaks power constantly AND has to have data restored on every read while SRAM has very low leakage.

--
Man is the animal that laughs.
And occasionally whores for Karma.

Now I know.... by CdBee · 2004-05-17 00:50 · Score: 0, Offtopic

...how to make my Grendel Cluster!

--
I have been a user for about 10 years. This ends Feb 2014. The site's been ruined. I'm off. Dice, FU

Re:Hype by cnf · 2004-05-17 00:50 · Score: 4, Insightful

Have you never heard of Multi threading?
On a WorkStation, I would agree with you, but on any server with thread optimised applications, more threads = more power...

Once again, People think WorkStation, for things not designed for the WorkStation market

Re:Hype by gl4ss · 2004-05-17 00:52 · Score: 2, Insightful

Go buy an Intel Celeron CPU then, if MHz is the only thing that matters..

ARM CPUs are used mainly in devices with limited electrical power available anyway.. if this gets them more processing power per watt then all the better.

--
world was created 5 seconds before this post as it is.

Re:Hype by Anonymous Coward · 2004-05-17 00:53 · Score: 3, Interesting

Unless you are talking about power consumption. Then the speed of the core increases it a lot so it makes sense to have slower processors (unless you wanna carry a huge battery pack on your back).

Re:Hype by pe1rxq · 2004-05-17 00:53 · Score: 4, Insightful

A lower core clock can save you a lot... both financially and in energy. Raising the clock rate on a chip will increase its energy usage exponentially.
If the problems you want to solve are parallel enough why not?

Jeroen

--
Secure messaging: http://quickmsg.vreeken.net/

Mod him +5 insightful by AtariAmarok · 2004-05-17 00:55 · Score: 3, Funny

When was the last time you saw one of us admit that they had no idea what they were saying?

--
Don't blame Durga. I voted for Centauri.

Nice to have a 4 core CPU by MrRuslan · 2004-05-17 00:56 · Score: 3, Interesting

But what are some uses for this? If I'm not mistaken this is a 32 bit architecture, so it has its limits when it comes to scaling, and it's not powerful enough for one of those supercomps, so what is the target market?

Re:Nice to have a 4 core CPU by pe1rxq · 2004-05-17 01:00 · Score: 1

Not all problems are solved easier by throwing more bits at it.... With more bits the number of instructions you can execute is still the same.

Jeroen

--
Secure messaging: http://quickmsg.vreeken.net/
Re:Nice to have a 4 core CPU by Anonymous Coward · 2004-05-17 01:32 · Score: 1, Interesting

If somebody was smart, they'd sell a mini-PC with this as the core. 4 (or just 2) CPUs + decent I/O subsystem = Awesome response times = Average consumer will swear it's faster than those Puntium64 thingies.
Re:Nice to have a 4 core CPU by drinkypoo · 2004-05-17 02:50 · Score: 2, Insightful

First, the desktop users will not exceed the limits of 32 bit computing for quite some time now, unless you are trying to implement operating systems where everything is mapped into a flat address space, like every location of every storage device. I don't know too many people with more than 1GB of memory in a system they ordinarily use, and a 32 bit system can successfully address 4GB. Of course not all of that can be system memory, but the point is, we're a ways away from needing 64 bit on the desktop.
Second, not every server needs a gigantic address space, but could still benefit from additional CPU power. A small but very active database server, for example. There are plenty of 32 bit SMP systems out in the real world doing real work right now, after all.
Third, embedded systems are frequently still 8 bit, commonly 16 bit, and only recently has 32 bit become common as the more recent low-power designs have been released. Using a more powerful processor reduces your development time because you only have to write assembler for tightly timed and very short loops, and you can just throw high-level functions around.
Finally, a processor like this would be excellent for the console video game market, although it doesn't look like ARM is going to have a chance to supply it. However, a two-core version could easily end up in a handheld - people are always finding new ways to consume more and more CPU time on handheld devices. A handheld with IEEE1394 and high quality video output might be the ultimate multimedia device. And I personally dream of having a single device which fits in my pocket and performs the duties of a (basic) laptop, a cellular phone, and a PDA. If you had a little pocket projector, and it took 1394, plus it had one of those laser-scanning rangefinding keyboards in it, you could do some real work. I'm not sure about the pointing device though, maybe you could have a laser pointer that you aimed at the projected display and hit a button to change from red to green, which could be picked up by the device. I mean, it's going to be a camera phone too, right?

--
"You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
Re:Nice to have a 4 core CPU by julesh · 2004-05-17 06:13 · Score: 1

I don't know too many people with more than 1GB of memory in a system they ordinarily use, and a 32 bit system can successfully address 4GB. Of course not all of that can be system memory, but the point is, we're a ways away from needing 64 bit on the desktop.

That's only 3 years away at Moore 1.

(I hereby define "Moore" to be the scale upon which the growth of computation power can be measured. Moore 1 represents a doubling every 18 months, Moore 2 a doubling every 9 months, Moore 0.5 a doubling every 36 months, etc.)
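The "Moore" scale defined above can be turned into a quick calculation. This is only a rough sketch, assuming capacity doubles every 18/m months at Moore m; the function name `years_to_grow` is made up for illustration.

```python
import math

def years_to_grow(start_gb, target_gb, moore=1.0):
    """Years for capacity to grow from start_gb to target_gb.

    Uses the thread's definition: Moore 1 = doubling every 18 months,
    Moore 2 = every 9 months, Moore 0.5 = every 36 months.
    """
    doublings = math.log2(target_gb / start_gb)
    months_per_doubling = 18.0 / moore
    return doublings * months_per_doubling / 12.0

print(years_to_grow(1, 4))         # 1 GB -> 4 GB at Moore 1
print(years_to_grow(1, 4, 0.5))    # same growth at Moore 0.5
```

Going from 1GB to 4GB is two doublings, so at Moore 1 that is 2 x 18 months, which is where the "3 years" figure comes from.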
Re:Nice to have a 4 core CPU by Anonymous Coward · 2004-05-17 09:06 · Score: 0

First, the desktop users will not exceed the limits of 32 bit computing for quite some time now, unless you are trying to implement operating systems where everything is mapped into a flat address space, like every location of every storage device. I don't know too many people with more than 1GB of memory in a system they ordinarily use, and a 32 bit system can successfully address 4GB. Of course not all of that can be system memory, but the point is, we're a ways away from needing 64 bit on the desktop.

The fact that more and more folks are getting systems with 1GB of memory, when 4GB is the limit, should give you a clue that it's time to move to 64bit addressing. Only a few years likely separate 1GB chips from 4GB chips.
Re:Nice to have a 4 core CPU by drinkypoo · 2004-05-17 09:57 · Score: 1

I'm sorry, but your comment did not compile, as your define trailed your usage :)
In three years, this design will only be used in embedded devices. It's only interesting for desktop use right now, and then only if you get four 550MHz cores. Otherwise a single higher-speed processor is going to beat its pants off.

--
"You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
Re:Nice to have a 4 core CPU by Dienyddio · 2004-05-17 19:05 · Score: 1

Ask Simtec what their plans are; they already offer a number of ARM development boards.
Other companies offering ARM based computers include Ionix and MicroDigital. If you have the cash anything is possible...

ARM servers by simpl3x · 2004-05-17 00:59 · Score: 5, Interesting

Cobalt servers were originally based on ARM processors, and were for the most part really nifty. Most palmtop and cell devices also use the processors, so my question is, why don't we see more reasonable personal computers (or blade servers) based upon this architecture? People don't use the processing capacity available to them, and tuning of storage and networking often gives a better return per dollar. Something along the profile of the Psion Netbook or old (or new depending upon your perspective) Apple Newton (also ARM) would be very cool and useful. Give it some cellular/WiFi tech...

Exactly what I was looking for! by TheLoneCabbage · 2004-05-17 01:06 · Score: 4, Funny

Exactly what I was looking for! Finally a computer capable of letting me balance my checkbook, use a word processor, watch a video, and browse the web!

Is anyone else getting the impression that our entire industry is driven by penis envy?

"It's bigger, it's faster, stronger! More Power!" About the only flaw in my theory is the continuing trend of decreasing computer sizes. But I can attribute that to the fact that it lets people put them in their pockets.

BTW: If you actually use your CPU(s), this doesn't apply to you. Your penis is bigger.

--
I would rather be ashes than dust!

Re:Exactly what I was looking for! by Anonymous Coward · 2004-05-17 01:17 · Score: 0

BTW: If you actually use your CPU(s), this doesn't apply to you. Your penis is bigger.

holy crap! so I wasn't just growing a third leg!

that explains where my penis went to and why this leg leaks at the end.. and why the sales lady at shoe-carnival smacked me when I was asking for a shoe for it....

I have 4 SMP machines around me that I use heavily on a daily basis... and there is the cluster we built that is completely made up of SMP machines for our soon to start movie render....

yes there are LOTS of us that actually use SMP and can't live without it...

although an SMP machine will feel hella faster to a regular user also...
Re:Exactly what I was looking for! by TheLoneCabbage · 2004-05-17 01:22 · Score: 2, Insightful

No doubt there are people who need this kinda raw power. Rendering a movie is a good example. ...

But 99% of the people out there (and 99% of the software) can't really take advantage of that kind of power.

But darned if they don't HAVE to have the latest thing on the market. Like spending 4 times as much for bleeding edge equipment will keep their computers from becoming "out-dated". And I'm not just talking early adopters, I'm talking GrandMothers and Young Nerdlings.

People pay outrageous amounts for equipment they will never use (no I'm not talking about home gyms).

--
I would rather be ashes than dust!
Re:Exactly what I was looking for! by (mandos) · 2004-05-17 07:26 · Score: 1

I think this follows beyond just computers. From what I've seen most SUV owners think the same way. I rarely see someone with an SUV doing anything other than commuting to and from work, or at the grocery store taking up 2 parking spaces.

Mike Scanlon
Re:Exactly what I was looking for! by lostchicken · 2004-05-17 09:01 · Score: 1

I use my computer to decompress a complex audio compression format at pretty high bit rates while browsing the web, downloading files and having an IM conversation all the time. That's quite a bit of math to do. Guess what, it's still sluggish for some tasks. If I'm using photoshop, filters run slower if I'm playing music. I want music playing. I also want a fast computer.

That's what SMP gives you. A single CPU can easily do anything I want, but by partitioning it, my Vorbis player doesn't slow my AutoCAD session. Better than a single CPU twice as fast, actually, because one CPU can service the interrupts for stuff like mouse, keyboard, network traffic for people accessing files on my computer, and iTunes, leaving my other CPU to do just the heavy lifting. Then, when I am doing something that really uses the CPU, I can still use the computer without it being sluggish. Even if my single CPU is twice as fast, that only means that my computer is unusable for 45 seconds instead of 90.

--
-twb
Re:Exactly what I was looking for! by eggsome · 2004-05-17 18:05 · Score: 1

Young Nerdlings

Whoa! I just had this image in my head like a cabbagepatch kids(tm) field full of little nerd heads with little shirt pen holders.
Man I gotta get some sleep - 24hrs without sleep does funny things to your imagination. :-\

--
If they made a movie of your life, would anybody buy a ticket?

I've been running SMP desktops for years... by pointbeing · 2004-05-17 01:12 · Score: 5, Informative

The _ONLY_ reason to do this is as a last resort when you can no longer clock your existing core any higher.

Incorrect.

As the subject line says, I've been running SMP desktop PCs for years. My current home PC is a dual 1GHz P-III, my wife's is a dual 850 and my Linux web/file/mail/whatever server is a dual 700 with a 12% overclock.

You can only figure on about a 40% performance increase with a dual processor desktop PC, but being able to play Quake and burn a DVD at the same time has its advantages ;-)

As others have mentioned, multitasking is greatly enhanced - and two midrange processors are generally cheaper than one high-end processor.

Also, even though some applications aren't multithreaded, all modern desktop OSes are, so you get a performance boost even running single-task applications. If you're into running Windows, Internet Explorer is multithreaded, as are all Microsoft Office applications. There's a real-world productivity boost using SMP machines.

--
we see things not as as they are, but as we are.
-- anais nin

Re:I've been running SMP desktops for years... by RevAaron · 2004-05-17 02:33 · Score: 3, Interesting

You say "Incorrect," but the examples you provide more or less support his claim. Yes, oftentimes two lower speed CPUs are cheaper than one CPU that is twice as fast, but there isn't much of a reason to go SMP unless you cannot just get a higher clocked CPU.

Mind you, the guy isn't saying SMP is stupid- it makes sense in a lot of situations. But, it is something you pull out when a single, higher-speed CPU is not a possibility, whether that is the case due to lack of funds or whether a faster CPU just does not exist.

Here at work, I have a dual 500 MHz G4 which still holds its own, even with a relatively small amount of RAM, 256 MB. When this box was purchased, there was no option for a single-CPU 1 GHz box, and this is certainly the next best thing...

--

Working toward a usable PDA environment in the spirit of Newton OS: Dynapad
Re:I've been running SMP desktops for years... by DigiShaman · 2004-05-17 02:43 · Score: 1

I'm running WinXP Pro (SP1) with a P4 2.8. Because I have hyper-threading enabled, Windows shows two CPUs. So even though my computer will handle multi-tasking more efficiently, I find that parts of Windows are not SMP optimized. For example, when I do a search in the registry through REGEDIT, I only see one CPU being maxed out. Also, if I push Internet Explorer, it too is only using one CPU at a time. But one thing I've noticed is that Windows will automatically balance thread load across CPUs. By dynamically allocating processor affinity, Windows runs smoothly without being bogged down regardless of what I throw at it. SMP is very nice!

--
Life is not for the lazy.
Re:I've been running SMP desktops for years... by pointbeing · 2004-05-17 02:58 · Score: 1

Maybe I was a bit hasty with the "incorrect".
The way I read the post was the *only* reason for SMP was if you reached an architecture limit - and I disagree with that.
Hell, it wouldn't be the first time I've misread a post here ;-)
On multiuser systems the benefits to SMP are clear. IM frequently less than HO running two processors at 50% is considerably more efficient than running one at 100%.

--
we see things not as as they are, but as we are.
-- anais nin
Re:I've been running SMP desktops for years... by TheLink · 2004-05-17 03:11 · Score: 1

Actually to have the benefits of SMP desktops you talk about, all you need is an O/S that's not so efficient at allocating lots of CPU to the CPU intensive task :).

E.g. it only allocates up to 50%, and leaves the other 50% for other stuff (GUI, DVD burning etc).

But then people will complain if only 50% of their CPU is used, oh well...

While two midrange processors are cheaper than one high end processor, SMP motherboards so far have been rather more expensive. Anyway most people can't afford the really high end stuff - and stuff like the P4 "extremely expensive" edition is for the really rich people.

That said, AMD may go dual core on desktops, so people might be able to drop dual core replacements into their single CPU socket motherboards. Probably one of the reasons why AMD has the 83W rating for their AMD64 CPUs despite most of them only going up to 50+W - the cooler ones are only 30-40W.
--
- Too many replies beneath your current threshold
Re:I've been running SMP desktops for years... by Anonymous Coward · 2004-05-17 21:24 · Score: 0

Yup, same thing on Linux - threads are balanced between the CPUs in general. Great, isn't it?

Re:Hype by Anonymous Coward · 2004-05-17 01:13 · Score: 0

only fools that don't really do ANYTHING with their computer say this...

I use SMP every day. I NEED to as I use a computer for real uses not just screwing around like you...

Video editing..., CG rendering, circuit simulation, Autorouting, the list goes on....

some of us have a REAL use for computers, many are not simply appliance operators like yourself.

Re:Hype by eclectro · 2004-05-17 01:17 · Score: 4, Interesting

You bring up an interesting point. The reason this might be valuable is because ARM processors are known for their low current and energy saving features.

Almost always when you max out the clock speed on a chip the current drain rises quickly.

From the article it can be surmised that this chip runs at a cool 2 watts running full out, and .31 Watt standby (somebody clarify this). If this holds true, it probably beats anything else at the same clock speed.

As an aside, there are cell phones that use a dual ARM core, one doing control duty and another doing DSP work.

--
Take the cheese to sickbay, the doctor should see it as soon as possible - B'Elanna Torres, "Learning Curve"

MMP ARM server by Gadzinka · 2004-05-17 01:18 · Score: 3, Insightful

Just the other day I was thinking about "Massively Multiprocessor" ARM computer. It came to me after reading about cluster of VIA low-power computers.

So, ARMs are even lower power, they are designed quite correctly from the ground up[1] and the only thing that's missing is an FPU. But a computer with 100 ARM CPUs would run faster than any ix86 today and probably would consume less power than the latest P4/K7/K8.

Give me a 64 proc (*4 cores per proc, so 256 cores) Linux machine anytime ;)

Robert

[1] Anyone who knows the internals of today's ix86 processors from any vendor knows what a mess it is to use today's technology with an ancient ISA like ix86.

--
Bastard Operator From 193.219.28.162

Re:MMP ARM server by mbge7psh · 2004-05-17 01:29 · Score: 4, Informative

Your dreams are answered - it does have floating point.
It also features configurable level 1 caches, 64-bit AMBA AXI interfaces, vector floating-point coprocessors and programmable interrupt distribution.
Re:MMP ARM server by kunudo · 2004-05-17 11:49 · Score: 1

It came to me after reading about cluster of VIA low-power computers.

this one? :)

Re:Hype by Anonymous Coward · 2004-05-17 01:20 · Score: 0

> Heck, I imagine that this new multiprocessor core will be an excellent way to pick up chicks

So, that's why the iPod is so successful! I thought it was the looks, but it's what's inside that counts

That's nice but, by dbretton · 2004-05-17 01:24 · Score: 4, Interesting

Let's talk some real numbers.

How will it fare against, say a Xeon with HT or 2 Opterons?
How will it stack up in price?

Re:That's nice but, by Anonymous Coward · 2004-05-17 01:31 · Score: 1, Insightful

If you're comparing it to Xeons and Opterons, you're not even in the same market.
Re:That's nice but, by eclectro · 2004-05-17 01:36 · Score: 1

How will it fare against, say a Xeon with HT or 2 Opterons?

It won't be able to heat up your house during winter like the Xeon with HT or 2 Opterons can.

This may not be so important during the summer months though.

--
Take the cheese to sickbay, the doctor should see it as soon as possible - B'Elanna Torres, "Learning Curve"
Re:That's nice but, by mcspock · 2004-05-17 05:36 · Score: 1

Notice the "embedded" tag.

This kind of product is designed for PDAs, where most products are going to 300-400 MHz processors.

--
-- Patience is a virtue, but impatience is an art.

Re:Hype by TonyJohn · 2004-05-17 01:24 · Score: 5, Informative

As Intel is now discovering (and promoting) it has long been known that clock frequency is not a sufficient measure of performance. It matters how much processing you can do in each clock tick as well as how often your clock ticks. Naturally, the faster the clock ticks, the less processing you can do per clock tick.

1/2 GHz quoted for this core may not sound a lot, but there are some good reasons for it:

- ARM cores use a shorter pipeline than Intel cores (in general). This requires less logic to get a good throughput of operations. Less logic means less area (less cost) and less power consumption. These are important in embedded applications (you don't want your phone to be putting out 50W and costing $200).

- These cores are synthesisable. This means that ARM will deliver a "model" of the device, and customers can translate this to a silicon layout on their own process, and they can integrate peripherals, memory etc. on the same silicon. Getting a higher clock speed requires custom logic which is hard to translate between processes. Essentially the processor then has to be sold separately as a piece of silicon, and this means a slow off-chip interface to the rest of the system.

For a multi-threaded or multi-process application such as this core is targeted at, using MP cores makes more sense than using a single high-speed core and switching between processes all the time. For one thing you save all the context switching overhead.

--
Owl tried to think of something wise to say, but couldn't.

Re:Hype by BigBadBri · 2004-05-17 01:40 · Score: 4, Interesting

No - you've missed the point of this exercise entirely.

The purpose of having a multiprocessor on a single core is to make consumer devices (read: audiovisual stuff) more versatile, by allowing them to dedicate, say, one core to processing the signal you're watching, one to processing the signal you wish to record, one to handle the disk I/O, and one to watch over everything and make sure your favourite show is recorded without glitches.

This isn't aimed at the desktop, or at shrinking supercomputers to the size of your thumb, or any other fantasies you may while away your idle cycles with.

It's aimed fairly and squarely at the embedded and consumer device markets, where it will produce benefits, and will likely make ARM a tidy sum in license fees.

--
oh brave new world, that has such people in it!

Chip Multiprocessors? by Anonymous Coward · 2004-05-17 01:41 · Score: 0

Chip Multiprocessors!! Another headache for programmers. check out this www.cradle.com

ARM6 *NOT* a server chip by Tune · 2004-05-17 01:43 · Score: 2, Informative

If I recall correctly, chips prior to ARM6 had register 15 (ARM's PC) designed with the upper six bits reserved for status. Having a program address space of only 2^26 = 64 MB was a major obstacle, even for (successors of) Acorn's RiscPC, a desktop model. With that resolved in the ARM6 series, it is still unable to look beyond the 4GB boundary. In the 4-way SMP server market this is likely to become a major pain.

So either they found a nice way to add yet more MIPS per megahertz (or per watt) to serve higher end embedded systems or they're targeting (very) low end servers.
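For reference, the two limits mentioned here are just powers of two. A minimal sketch (the helper name `address_space_bytes` is made up for illustration, and it assumes byte addressing):

```python
def address_space_bytes(address_bits):
    """Number of distinct byte addresses reachable with the given address width."""
    return 1 << address_bits

MB = 1 << 20
GB = 1 << 30

print(address_space_bytes(26) // MB)   # 26-bit addressing, in MB
print(address_space_bytes(32) // GB)   # 32-bit addressing, in GB
```

That gives 64 MB for 26-bit addressing and 4 GB for 32-bit addressing.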

Check out PMC-Sierra's dual-core RM9000x2 by ebunga · 2004-05-17 01:46 · Score: 3, Interesting

PMC-Sierra's MIPS-based RM9000x2GL's are really neat. It's been out for some months now. I'd love to see a machine with several dozen of these.

Re:Hype by Anonymous Coward · 2004-05-17 01:54 · Score: 0

ahhh... finally someone gets it. This isn't aimed at the desktop market, this is aimed at where 98% of processors go, the embedded market. It's not uncommon now for many SoCs (System on a Chip) to use 2 ARM cores (ARM 7, 9, 11, etc.) and a DSP. This is ARM's aim at future embedded markets where they can see a need for up to 4 cores (switches, routers, consumer goods, and as many have said PDAs and Cell phones).

ARM6 != ARMv6 by hattig · 2004-05-17 01:56 · Score: 3, Informative

One is a ~1990 era version of the ARMv3 architecture (IIRC).
The other is ARM's latest version of the ARM architecture.

26-bit addressing limitations were removed ~14 years ago. I don't even think any of the more recent versions of the ARM architecture support it.

Re:ARM6 != ARMv6 by TonyJohn · 2004-05-17 02:38 · Score: 1

26-bit addressing... I don't even think any of the more recent versions of the ARM architecture support it.
This is correct. Currently available ARM cores do not support 26-bit addressing.

--
Owl tried to think of something wise to say, but couldn't.

WinCE, Symbian, PalmOS and Linux by Anonymous Coward · 2004-05-17 02:00 · Score: 4, Interesting

This is one of the reasons why Linux will eventually win in the handheld/cell phone space. Unlike WinCE, Symbian and PalmOS, Linux already supports SMP. Linux is light years ahead of WinCE, Symbian and PalmOS on all key core technology features such as SMP. I know for a fact that Linux is being used to validate these features on future ARM processors. So, companies that based their products on Linux won't have to worry about the OS running on the new processors. The proprietary OSes will be playing catchup forever. I will not be surprised if Microsoft has to redesign WinCE from scratch yet again to accommodate SMP.

Why? by Anonymous Coward · 2004-05-17 02:00 · Score: 4, Informative

Low power. Die size. Cost.

You don't use an Opteron in the same situation as an arc core. It's a synthesisable mini processor used for controlling real time systems. It can be embedded in chips with custom VLSI logic to provide a platform for an operating system. It's not meant for competing with Opterons or any other such stupid ideas.

Why 4 cores?

Not all customers need 4 cores; some only need 1 (washing machines) or maybe 2. The system is therefore scalable to die size/power/cost requirements. Note it's configurable; it does not have to have 4 cores. If I were a customer of arc I could choose how much die space to devote to the core and how much power I really needed.

4 cores, instead of one bigger more complex one, are easier to engineer and get right. Look at modern graphics architectures; it's the same principle (though one can argue about cache coherency).

Multiple cores would make dynamic power management much easier to handle, I imagine. An entire core could shut down when its process(es) are not busy. A properly designed embedded system could benefit enormously from this power saving, and the hardware design is made relatively easy compared with trying to cut voltage on one large core.

Embedded systems using arc cores often need to meet real time needs. One advantage of a multicore system would be to place a critical software component on a single core and, with correct use of memory, guarantee a fixed throughput rate of data. Of course I can use thread priorities but this makes things harder IMO. Maybe that's what they refer to by easier programming.

To me, this looks like a clean idea, which although not revolutionary in terms of an idea, does provide significant advantages for embedded device designers by being synthesisable.

Wroceng
(no association with ARM at all but I forgot my password temporarily)

Re:Hype by Christopher+Thomas · 2004-05-17 02:05 · Score: 2, Insightful

A lower core clock can save you a lot... both financially and in energy. Raising the clock rate on a chip will increase its energy usage exponentially.

[Rant]Why, oh _why_, do people keep horribly abusing the word "exponentially"?[/Rant]

Power goes up in direct proportion to the clock rate. This is a "linear" relation. If it was really "exponential", we'd be stuck running 10 MHz processors because anything else would melt.

For the really pedantic, the way to compute dynamic power dissipation is to figure out how much capacitance you have on nodes that are being switched, what fraction of the time they're being switched, the amount of energy required to switch a given amount of capacitance (depends on signal voltage), and the frequency, and multiply these together:

P = (1/2) * Vdd^2 * C_node * N_nodes * transitions_per_clock * f

The only thing that's _not_ linear is the power-vs-voltage relation, and that's _quadratic_. Anything "exponential" sucks a whole lot worse.

[ObDisclaimer about clock feedthrough, but that's linear with frequency and capacitance too.]
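To see the linear-in-frequency and quadratic-in-voltage behaviour of that formula, here is a small sketch. The component values are hypothetical, picked only to show the scaling, not measured from any real chip.

```python
def dynamic_power(vdd, c_node, n_nodes, transitions_per_clock, f):
    """P = (1/2) * Vdd^2 * C_node * N_nodes * transitions_per_clock * f"""
    return 0.5 * vdd ** 2 * c_node * n_nodes * transitions_per_clock * f

# Hypothetical chip: 1.2 V, 2 fF per node, 10M nodes, 10% activity, 500 MHz.
base = dynamic_power(1.2, 2e-15, 10_000_000, 0.1, 500e6)
double_clock = dynamic_power(1.2, 2e-15, 10_000_000, 0.1, 1000e6)
double_vdd = dynamic_power(2.4, 2e-15, 10_000_000, 0.1, 500e6)

print(double_clock / base)   # ratio when only the clock doubles
print(double_vdd / base)     # ratio when only the voltage doubles
```

Doubling the clock alone roughly doubles the power, while doubling Vdd alone roughly quadruples it. Neither is exponential.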

Re:Hype by addaon · 2004-05-17 02:12 · Score: 1

Parent may or may not be insightful, but is, sadly, not true. Power usage is linear with clock speed, assuming voltage is constant.

--

I've had this sig for three days.

SCU by Tune · 2004-05-17 02:20 · Score: 1

Reading the article is not required; just skimming it reveals a diagram with 4 CPUs, each with its own cache connected by arrows to a large blob called "Snoop Control Unit".

Should Be A Boon For PDA's by theManInTheYellowHat · 2004-05-17 02:30 · Score: 2, Interesting

I would imagine a wristwatch that can do voice processing and movie rendering.

This would seem to go hand in hand with the current thinking on on-the-fly OCR/language translation. I watched a show last night about a camera and PDA gizmo that could translate a road sign for you. I think that one did it via a server-based imaging system. But if you do all that internally the possibilities are endless, and hopefully not trivial, like SMP pong or really fancy ringtones.

low electrical power + high CPU power == quick results and small size that does not require a radio flyer full of batteries.

Just a small detail by JamesP · 2004-05-17 02:31 · Score: 0, Informative

ARM SUCKS!

I already did some stuff using ARM7TDMI and I can say that it SUCKS BIG TIME.

Why? NO INTEGER DIVISION. You have blazing fast code 90% of the time, and the other 10% it's crunching the single division in your program.

--
how long until /. fixes commenting on Chrome?

Re:Just a small detail by Anonymous Coward · 2004-05-17 03:58 · Score: 0

Inconvenient, yes- but a division algorithm in hardware does the same basic calculations, they just make you write it out.
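
As an illustration of what "writing it out" looks like, here is a minimal Python sketch of unsigned shift-and-subtract (restoring) division, one quotient bit per iteration. The function name and the 32-bit default width are assumptions for the example; a real ARM7 routine would be hand-written assembly or the compiler's runtime helper.

```python
def udiv_shift_subtract(n, d, bits=32):
    """Unsigned restoring division for 0 <= n < 2**bits.

    Returns (quotient, remainder), producing one quotient bit per step.
    """
    if d == 0:
        raise ZeroDivisionError("division by zero")
    q = 0
    r = 0
    for i in range(bits - 1, -1, -1):
        # Shift the next dividend bit into the running remainder.
        r = (r << 1) | ((n >> i) & 1)
        # If the divisor fits, subtract it and set this quotient bit.
        if r >= d:
            r -= d
            q |= 1 << i
    return q, r

print(udiv_shift_subtract(100, 7))
```

The loop always runs `bits` times, which is why a software divide costs tens of cycles even when every step is a cheap shift, compare, and subtract.
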
Re:Just a small detail by Anonymous Coward · 2004-05-17 04:18 · Score: 1, Insightful

If integer division is too slow, then don't do integer division. Obviously I have no idea how much experience you have, but I thought it was common knowledge that integer division is always slow, even when implemented in hardware. Divisions can take many cycles to complete, and during that time, depending on the architecture, you might not be able to perform any other instructions in parallel. The end result is that doing many divisions is going to kill the performance anyways, so there's little benefit to including it in the chip.

If you must do integer divisions, you can optimize the code a little. For example, if you're dividing by powers of two, use right shifts instead of divisions. But, if you can, it's better to avoid integer division on embedded systems to begin with.
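
A small Python sketch of the shift trick, with hypothetical helper names. One caveat not mentioned above: for negative signed values an arithmetic right shift rounds toward minus infinity, whereas C-style division rounds toward zero, so a bias has to be added first to match.

```python
def udiv_pow2(x, k):
    """Divide a non-negative x by 2**k with a single right shift."""
    return x >> k

def sdiv_pow2_trunc(x, k):
    """Divide signed x by 2**k, rounding toward zero like C's '/'.

    A plain arithmetic shift floors, so negative values get a bias
    of 2**k - 1 added before shifting.
    """
    bias = (1 << k) - 1 if x < 0 else 0
    return (x + bias) >> k

print(udiv_pow2(40, 3))        # 40 / 8
print(-7 >> 1)                 # plain shift floors
print(sdiv_pow2_trunc(-7, 1))  # biased shift truncates toward zero
```
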
Re:Just a small detail by siliconwafer · 2004-05-17 04:49 · Score: 1

You can do integer division by simply multiplying and then shifting. Usually, it's just two instructions.
Re:Just a small detail by Anonymous Coward · 2004-05-17 05:29 · Score: 0

Could you give an example, for dividing by 3, say?
Re:Just a small detail by Anonymous Coward · 2004-05-17 05:36 · Score: 0

Neither does MIPS, SPARC, Alpha, ARC, Xtensa, SH, etc, etc.. the list goes on...
Re:Just a small detail by corngrower · 2004-05-17 06:02 · Score: 1

I know you can implement division by a series (loop) of subtract and shift instructions, but explain how one would implement integer division by a multiply and shift pair.
Re:Just a small detail by Anonymous Coward · 2004-05-17 10:09 · Score: 0

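It works like this: x/3 = (x*65536/3)/65536, which is approximately (x*21845)/65536. In other words, to implement x/3 on 16-bit hardware with a widening multiply (i.e. an instruction that computes a 32-bit product of two 16-bit numbers), you can multiply by 21845 and take the top 16 bits of the result. Because 65536/3 gets truncated to 21845, the answer comes out one too low for exact multiples of 3 (3/3 gives 0). To get it exact you round the constant up and carry one extra bit: multiply by 43691 (2^17/3 rounded up) and shift the product right by 17, which is correct for every unsigned 16-bit x.

Of course, I used 16 bits only for concreteness; the method works just as well with any number of bits. A widening multiply instruction usually yields the high bits of the result in a separate register, so the shift by the word size comes for free and at most a small extra shift of that register is left.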

This transformation is fast only for division by a constant, but depending on the situation you may be able to use a table of inverses to look up what factor to multiply with, and come out ahead.
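
A Python sketch of the reciprocal-multiply idea for 16-bit unsigned values, with hypothetical function names. It shows both the truncated constant 21845 with a 16-bit shift, which is off by one for nonzero multiples of 3, and the rounded-up constant 43691 with a 17-bit shift, which is exact over the whole 16-bit range.

```python
def div3_u16_truncated(x):
    """Multiply by floor(2**16 / 3) and keep the top 16 bits.

    Fast but approximate: one too low when x is a nonzero multiple of 3.
    """
    return (x * 21845) >> 16

def div3_u16(x):
    """Exact x // 3 for 0 <= x < 65536: one multiply and one shift.

    43691 is 2**17 / 3 rounded up; the extra bit of precision absorbs
    the rounding error for every 16-bit input.
    """
    return (x * 43691) >> 17

for x in (3, 100, 65535):
    print(x, div3_u16_truncated(x), div3_u16(x), x // 3)
```

The same construction scales to wider words; the constant and shift just grow with the operand size.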

ARM designs CORE? by Anonymous Coward · 2004-05-17 02:35 · Score: 0

Good lord, I knew it was an ARM ploy all along! /game geek

Nice! by Archibald+Buttle · 2004-05-17 02:46 · Score: 2, Informative

I've been an ARM fan for many, many years, so it's great to see this development. I've always thought this kind of thing should happen with ARM chips, and that the ARM should be well suited for this kind of application.

ARM cores have a great advantage of having an incredibly low transistor count. As a result the simpler ARM chips tend to have incredibly good production yields. I don't know if that's true for the more complex ARM variants like XScale. This multi-core processor should also be an order of magnitude less complicated than a Pentium, so it too should get good yields and thus for volume production be very cheap.

However it's also always struck me that the low transistor count of ARM chips could be of use in very high performance computing applications. It is difficult to build high transistor count chips in exotic materials, but an ARM-based chip needn't have those problems. This is of course why most chips are still made on silicon.

Also the low transistor count means that even in high speed situations you shouldn't have the clock-skew problems that plague larger processors. (Clock-skew is the problem whereby it takes longer than a single clock tick for a signal to reach from one side of the processor to the other.) A good proportion of the transistors in Pentium IVs and PowerPC G5s are there to deal with that very issue.

ARM's 1st synthesizable proc? by uarch · 2004-05-17 03:14 · Score: 2, Insightful

Are you sure it's the 1st time ARM has produced a synthesizable core? (despite what the article says)

A little over a month ago I sat through a presentation by one of the guys near the top of ARM's research division...

It was a general overview of ARM's business model (it's an IP company) and products followed by some other material. During the presentation some cores were marked as synthesizable, others were marked as the opposite (I forget the specific term that was used).

To the best of my knowledge all the cores reviewed in the presentation were already released and in production.

Re:ARM's 1st synthesizable proc? by Wesley+Felter · 2004-05-17 03:29 · Score: 2, Informative

ARM has been making synthesizable cores for years. The article is just confused.
Re:ARM's 1st synthesizable proc? by TonyJohn · 2004-05-17 04:09 · Score: 2, Informative

The article is correct but misleading. This is ARM's first multiprocessor core and therefore also its first "synthesisable multiprocessor core".

--
Owl tried to think of something wise to say, but couldn't.

MIPS by mobby_6kl · 2004-05-17 03:29 · Score: 1

2600 MIPS is just a bit less than PIII 1GHz.
Would be nice to have this power in a PDA :)

memory technology by John_Sauter · 2004-05-17 03:34 · Score: 1

An excellent treatise, thank you. To respond to your challenge, the first computer I programmed was the DEC PDP-1 in 1964; it had 4096 18-bit words of memory, for 3 times 2 to the 12 power 6-bit characters. My home PC today has 1.5 gigabytes of RAM, or 1.5 times 2 to the 30 power. The quotient is more than 5 orders of magnitude.
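
The arithmetic can be checked in a few lines of Python. Like the comparison above, this sketch treats one 6-bit PDP-1 character as equivalent to one 8-bit byte.

```python
import math

pdp1_chars = 4096 * 18 // 6      # 4096 words of 18 bits, as 6-bit characters
pc_bytes = int(1.5 * 2 ** 30)    # 1.5 gigabytes of RAM

ratio = pc_bytes / pdp1_chars
print(pdp1_chars, pc_bytes, ratio, round(math.log10(ratio), 2))
```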

My home PC also costs almost two orders of magnitude less than a PDP-1 did, even ignoring inflation.
John Sauter, greybeard (J_Sauter@Empire.Net)

cheapest evaluation board?? (slightly OT) by AtomicBomb · 2004-05-17 03:35 · Score: 1

I feel that experience with ARM-based embedded systems will be a good item on an EE student's CV. I wonder what's the most cost-effective platform that I should get if I want to play with it?

Re:cheapest evaluation board?? (slightly OT) by corngrower · 2004-05-17 04:39 · Score: 1

I wonder what's the most cost effective platform that I should get if I want to play with it
Game Boy Advance, especially if you want to PLAY with it.
Re:cheapest evaluation board?? (slightly OT) by Anonymous Coward · 2004-05-17 07:08 · Score: 0

A secondhand Acorn Archimedes 310 or any Acorn after that. Call some UK schools and ask what is left of their Acorn equipment. I keep my 440 (ARM3) + the Risc PC (StrongARM) for the time being.
Simtec used to make a development board with an ARM.
Aleph 1 has something too I believe.
Millipede, Castle, Mico, Micro Digital, Risc Os + ARM will all deliver nice hits in Google.

Ernst

Arm != Intel ? by Muad · 2004-05-17 03:40 · Score: 1

Forgive me, but I thought that:

1) Intel had bought Arm
2) The Intel PXA was actually a renamed arm chip ... I guess I dreamed the whole thing up ?

--
--- "I didn't think anyone would understand it" -Prof. Bob Muller

Re:Arm != Intel ? by TonyJohn · 2004-05-17 04:04 · Score: 3, Informative

Eeek. No.

Intel bought part of DEC (Digital), which had, in its product portfolio, the StrongARM processor. StrongARM is a DEC implementation of the ARM Instruction Set Architecture (version 4 if you care).

ARM is still a separate, publicly listed company. XScale is an Intel implementation of the ARM ISA (version 5TE I think). Intel pays ARM to use their architecture.

ARM also designs implementations of the ARM ISA and licences these designs to chip designers to include in System-on-Chip designs.

--
Owl tried to think of something wise to say, but couldn't.

Clock for clock, how is it? by Anonymous Coward · 2004-05-17 03:56 · Score: 3, Interesting

One thing I've always wanted is a comparison of the general efficiencies of different processors. That is, if you made different types of processors the same clock speed, gave them equivalent caches, and ran a benchmark entirely out of cache, how would they all compare?

X86s are supposedly awfully inefficient architectures, so would they come out on bottom? Where would various ARM, xScale, 68k, and PPC processors end up?

Although x86 CPUs have scaled up to some amazing clock frequencies, it seems like their growth has slowed. Intel seems to have implicitly acknowledged this since they're dropping the P4 line for an updated P3 architecture. AMD did the same thing with the Athlon64s, which have slower clock speeds but are faster in the end.

If it turned out that an ARM at, say, 600 MHz turned out to be as fast as a P3 at 1 GHz, then I would say the ARM could leave the embedded market and could become competition in the desktop market. If such systems were significantly cheaper, cooler, smaller, and less power hungry than similar x86 systems, I think they could seriously compete.

Re:Hype by corngrower · 2004-05-17 06:17 · Score: 1

Did you mean to say
"More simultaneously executing threads = more power"?

A single CPU can execute multiple threads; it's just that the CPU switches among them and executes only one at a given instant, which does not lead to higher performance.

Oops! by simpl3x · 2004-05-17 06:18 · Score: 1

As soon as you stated that, I thought, RTFA... But there wasn't one! So, I just said duh!

Yep! MIPS... But, Acorn, Now those are pretty nifty also.

Re:Hype by corngrower · 2004-05-17 06:30 · Score: 1

Note that the 2 watts is just the processor core. To make a meaningful comparison with, say, a Pentium, you would have to include the power consumption of cache and a memory management unit as well. I would still bet that the ARM solution would come in at substantially less than an AMD or Intel solution even after these are accounted for.

Quick and dirty integer divide. by tjwhaynes · 2004-05-17 06:49 · Score: 1

I have no idea if this is what the OP was driving at but it's not that hard to do fast divides using tables. Reciprocal lookup and multiply, then shift by the range of your reciprocal tables. So if you need a certain level of detail, say 3 decimal places, build tables of 1024/x or larger and shift the result back 10 bits after multiplication. Or at least that's how I used to do it on ARM cores when I was writing ARM assembler for fun(!).
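
A rough Python sketch of the table approach, using the 1024/x scaling and 10-bit shift mentioned above. The table size and names are assumptions for illustration. With floor-rounded reciprocals the result is either the true quotient or one below it, as long as the dividend is smaller than 2**SHIFT.

```python
SHIFT = 10          # table entries are 2**SHIFT / d, i.e. 1024/x
MAX_DIVISOR = 255   # hypothetical table size

# RECIP[d] approximates 1/d scaled by 2**SHIFT; index 0 is unused.
RECIP = [0] + [(1 << SHIFT) // d for d in range(1, MAX_DIVISOR + 1)]

def approx_div(x, d):
    """Approximate x // d: table lookup, multiply, shift back."""
    return (x * RECIP[d]) >> SHIFT

print(approx_div(1000, 7), 1000 // 7)
print(approx_div(7, 7), 7 // 7)   # shows the off-by-one case
```

A larger SHIFT buys accuracy over a wider dividend range at the cost of wider multiplies, which is the speed-versus-accuracy trade described here.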

You can also have fun with series expansions and other tricks for turning complex time consuming operations into faster, good-enough variations. It all depends on what matters to you - speed or accuracy.

Cheers,

Toby Haynes

--
Anything I post is strictly my own thoughts and doesn't necessarily have anything to do with the opinions of IBM.

Flash, SRAM, and DRAM. by Christopher+Thomas · 2004-05-17 06:49 · Score: 1

Doesn't 100Mb flash using 180M transistors work out to 1.8 transistors/byte? I'm still just a student, but according to my intro to ECE class, even storing one bit takes more than 1.8 transistors...

Most modern flash memory uses multi-level storage, allowing several bits per cell (I'd known about 4 levels (2 bits), another poster mentioned 8 levels (3 bits)). Storage still only requires one transistor.

The way it works is that you have a FET with a floating gate. In "write" mode, you apply a high voltage to the non-floating gate to drive charge either in or out through the thin oxide layer separating the gate and the body. The charge on the floating gate (which is between the sense gate and the body) ends up effectively changing the threshold voltage of the transistor. When the transistor is turned on by the sense gate, you get an amount of current that varies depending on the amount of charge on the floating gate.

Other types of flash memory exist. This is just one of the more common ones.

As for storing single bits, the standard SRAM cell has 6 FETs (two inverters, cross-coupled, and two readout FETs to connect the inverter outputs to a differential read/write bus). A DRAM cell, however, just has one transistor, which connects the read/write bus to a storage capacitor. Among other things, this means that DRAM reads are destructive (capacitor is discharged on to the read/write bus; this disturbance is amplified, driving the bus back to the rail voltage and re-charging the cell's storage capacitor).
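
The cell counts above translate into transistors per gigabyte as in this Python sketch. It counts only the storage array; sense amplifiers, decoders, ECC, and other periphery are ignored, and the names are made up for the example.

```python
import math

def bits_per_flash_cell(levels):
    """A cell that distinguishes `levels` charge levels stores log2(levels) bits."""
    return math.log2(levels)

# Transistors per stored bit for each cell type (storage array only).
TRANSISTORS_PER_BIT = {
    "SRAM (6T cell)": 6.0,
    "DRAM (1T + capacitor)": 1.0,
    "flash, 4 levels": 1.0 / bits_per_flash_cell(4),
    "flash, 8 levels": 1.0 / bits_per_flash_cell(8),
}

def cell_transistors(n_bytes, per_bit):
    """Transistors needed in the cell array to hold n_bytes."""
    return n_bytes * 8 * per_bit

GIB = 2 ** 30
for name, per_bit in TRANSISTORS_PER_BIT.items():
    billions = cell_transistors(GIB, per_bit) / 1e9
    print(f"{name}: {billions:.2f} billion transistors per GiB")
```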

or an ASIC by Anonymous Coward · 2004-05-17 08:13 · Score: 0

You can also use the same tools to put the core into an ASIC.

Another good thing about synthesizable is that you can compile it to different specs. For example, the ARM7TDMI-S (S for synthesizable) can be compiled with different instruction decode sections. You can choose a small (cheap) and slow decode or a large (expensive) and fast one. So you can pick the best one for your situation. On most cores you can also select the amount of L1 cache you want (ARM7 doesn't have a cache at all, so it is exempt). Cache is one of the largest users of die space, so being able to size it also helps you keep costs down.

Re:Hype by Anonymous Coward · 2004-05-17 13:01 · Score: 0

fucking nerds

Slashdot Mirror

145 comments