ARM Unveils Next-Gen Processor, Claims 5x Speedup
unts writes "UK chip designer ARM [Note: check out this short history of ARM chips in mobile devices contributed by an anonymous reader] today released the first details of its latest project, codenamed 'Eagle.' It has branded the new design Cortex-A15, which ARM reckons demonstrates the jump in performance from its predecessors, the A8 and A9. ARM's new chip design can scale to 16 cores, clock up to 2.5GHz, and, the company claims, deliver a 5x performance increase over the A8: 'It's like taking a desktop and putting it in your pocket,' said [VP of processor marketing — Eric Schorn], and it was clear that he considers this new design to be a pretty major shot across the bows of Intel and AMD. In case we were in any doubt, he turned the knife further: 'The exciting place for software developer graduates to go and hunt for work is no longer the desktop.'"
According to ARM's web site, the design includes 'Large Physical Address Extensions (LPAE)' that allow addressing 1 TiB (40 bits). The marketing schematics for the processor mention a "Virtual 40b PA" for each CPU.
Unfortunately, the detailed A15 documentation is not available yet, so we're left to speculate over what this means. At the same time, the supported architecture remains ARMv7 and there is no hint of any major change on the instruction side. An easy implementation would use an MMU with 40-bit physical addresses to map this amount of memory, while the per-process address space would remain at 4 GiB to avoid any drastic change to the programming model.
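To put numbers on that speculation, here is a small sketch of the arithmetic. It assumes only what is quoted above (40-bit physical addresses, 32-bit virtual addresses); the constant and function names are mine, not ARM's.

```python
# Address-space sizes implied by the figures quoted above.
PHYS_BITS = 40  # physical address width attributed to LPAE
VIRT_BITS = 32  # ARMv7 pointers and registers stay 32 bits wide

GIB = 2**30
TIB = 2**40

phys_bytes = 2**PHYS_BITS
virt_bytes = 2**VIRT_BITS


def full_processes(phys, virt):
    """How many maximally sized process address spaces fit in physical memory."""
    return phys // virt


print(phys_bytes // TIB, "TiB of physical address space")
print(virt_bytes // GIB, "GiB of virtual address space per process")
print(full_processes(phys_bytes, virt_bytes), "full-size processes fit at once")
```

So under this reading no single process could see more than 4 GiB, but the OS could keep a few hundred such processes fully resident.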
I don't know the heat dissipation figures, but I can safely say I have never yet seen an ARM processor with a heatsink. As for power consumption, a quick Google search suggests that an 800MHz OMAP3 draws around 750mW at full load. This new A15 core is supposedly going to have similar figures.
No, nothing at all to do with DRM. Snooping refers to checking the contents of other caches for cache coherency. Cache comes from the French, meaning hidden - it is memory that the programmer doesn't see directly, so the CPU has to act in exactly the same (programmer-visible) way as if it were not there. This is pretty simple when you have just one core, but when you have more than one it becomes difficult.
If you have two threads, on different cores, both accessing the same memory, then each will try to pull that memory into its own cache. This is fine, as long as both are only reading it. When one writes to it, the copy in the other core's cache must be updated or the two threads will have an inconsistent view of main memory. Keeping those copies consistent is called cache coherency. The snoop control unit is responsible for all of the cache-to-cache communication. Because ARM cores typically live on a die with other units that share the same RAM, it is also responsible for ensuring that the caches remain consistent with modifications to RAM by the other coprocessors.
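A toy model may make the idea more concrete. This is only a sketch of one simple policy (write-through with write-invalidate over a shared bus), not a description of what ARM's snoop control unit actually does; all class and method names are made up for illustration.

```python
class Memory:
    """Main memory: a sparse map from address to value, defaulting to 0."""

    def __init__(self):
        self.data = {}

    def read(self, addr):
        return self.data.get(addr, 0)

    def write(self, addr, value):
        self.data[addr] = value


class SnoopBus:
    """Shared bus: every attached cache sees ("snoops") other caches' writes."""

    def __init__(self, memory):
        self.memory = memory
        self.caches = []

    def attach(self, cache):
        self.caches.append(cache)

    def broadcast_invalidate(self, addr, source):
        for cache in self.caches:
            if cache is not source:
                cache.snoop_invalidate(addr)


class Cache:
    """Per-core cache using write-through and write-invalidate."""

    def __init__(self, bus):
        self.lines = {}
        self.bus = bus
        bus.attach(self)

    def read(self, addr):
        if addr not in self.lines:  # miss: fetch from main memory
            self.lines[addr] = self.bus.memory.read(addr)
        return self.lines[addr]

    def write(self, addr, value):
        # Tell every other cache to drop its (now stale) copy first.
        self.bus.broadcast_invalidate(addr, source=self)
        self.lines[addr] = value
        self.bus.memory.write(addr, value)  # write-through

    def snoop_invalidate(self, addr):
        self.lines.pop(addr, None)


def demo():
    memory = Memory()
    memory.write(0x100, 1)
    bus = SnoopBus(memory)
    core0, core1 = Cache(bus), Cache(bus)
    before = (core0.read(0x100), core1.read(0x100))  # both cores cache the value
    core0.write(0x100, 2)
    stale_copy_dropped = 0x100 not in core1.lines
    after = core1.read(0x100)  # misses, refetches the new value
    return before, stale_copy_dropped, after


print(demo())
```

Without the invalidate broadcast, core1 would keep returning the old value from its own cache, which is exactly the inconsistent view described above.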
I am TheRaven on Soylent News
If you know the old Intel addressing modes, it will come down to what were called segments: you get segments of at most 4 GB each that you have to juggle around. At the assembly level this system was quite evil, because you had to shuffle segments around for code, data, stack and so on.
On the plus side, it offered another layer of protection against code injection. But because of its complexity it was very unpopular, and once the segments became big enough most compilers just set up one huge segment and placed code and data there.
For a processor designer, however, this approach is very elegant, because they can increase the memory range ad infinitum while keeping the register size the same, and thus keep backwards compatibility.
From a programmer's point of view segments are hell, because you never know when you will run into the segment boundary, and then the shuffling begins. Also, if you have data bigger than a segment, you have to squeeze it into multiple ones.
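For anyone who never had the pleasure, here is a rough sketch of the boundary problem. It uses 8086 real mode (16-bit segment, 16-bit offset, 64 KiB per segment) because it is the simplest instance; the 32-bit case has much bigger segments but the same kind of juggling. The helper names are mine.

```python
SEGMENT_SIZE = 0x10000  # 64 KiB reachable through one 16-bit offset


def linear_address(segment, offset):
    """8086 real-mode translation: segment * 16 + offset, wrapped to 20 bits."""
    assert 0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF
    return ((segment << 4) + offset) & 0xFFFFF


def split_across_segments(start_linear, length):
    """Cut a buffer into (segment, offset, chunk_length) pieces,
    each of which can be reached without changing the segment register."""
    assert start_linear >= 0 and start_linear + length <= 0x100000
    pieces = []
    addr, remaining = start_linear, length
    while remaining > 0:
        segment, offset = addr >> 4, addr & 0xF
        chunk = min(remaining, SEGMENT_SIZE - offset)
        pieces.append((segment, offset, chunk))
        addr += chunk
        remaining -= chunk
    return pieces


# A 96 KiB buffer does not fit in one segment, so it ends up in two pieces.
for segment, offset, chunk in split_across_segments(0x12345, 0x18000):
    print(f"{segment:04X}:{offset:04X} length {chunk:#x}")
```

Every loop over that buffer has to know where the pieces end and reload the segment register in between, which is the "shuffling" complained about above.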
I am not sure I like the way ARM is going here just to keep backwards compatibility. At some point they will have to break it to keep power consumption low (Intel just kept piling the next layer of fluff on top of everything), and I guess that, given their current success in the mobile phone area, they are a little shy of rolling out the next break in backwards compatibility like they have done in the past.
The 4 GB barrier was overcome a long time ago on 32-bit systems. The reason people still think it's a problem is that Microsoft decided that you, as a customer, shouldn't be able to use more than 4 GB of memory on 32-bit Windows since Windows 2000. Today the limitation on 32-bit Windows is purely artificial, while Linux gladly handles any memory you toss at it.
Excellent article explaining the issue:
http://www.geoffchappell.com/viewer.htm?doc=notes/windows/license/memory.htm
I have also yet to see a benchmark where 64-bit in itself gives a significant advantage outside large calculations and simulations.
According to this, a typical Cortex-A9 core draws about 250mW. As the A15 has a very similar architecture (still ARMv7), it should be in a similar region, maybe higher, as they boosted the frequency. So I guess a 16-core version will draw something like 4W or more. Nonetheless, this is still an incredibly good figure for a web-server-type processor, though a little heat sink might appear.
I'm only guessing here though, based on previous figures. There is no practical data so far on the exact figures.
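For what it's worth, the back-of-the-envelope estimate looks like this. The 250mW per core and the 16 cores come from the figures above; the `uplift` factor for the higher clock is purely a made-up knob, not a measured number.

```python
CORE_MILLIWATTS = 250  # per-core Cortex-A9 figure quoted above
CORES = 16             # maximum core count claimed for the A15


def total_watts(cores, per_core_mw, uplift=1.0):
    """Naive estimate: cores * per-core power, scaled by a guessed uplift.

    `uplift` is a hypothetical fudge factor for the higher clock speed.
    """
    return cores * per_core_mw * uplift / 1000.0


print(total_watts(CORES, CORE_MILLIWATTS))               # same power per core as the A9
print(total_watts(CORES, CORE_MILLIWATTS, uplift=1.5))   # assuming 50% more per core
```

This ignores shared caches, the interconnect and everything else on the die, so treat it as a lower bound on a guess.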
Surprisingly, no. The Archimedes actually used an initial version of the ARM architecture with 26-bit addressing. The high bits of the program counter register were used to store the CPU status and condition flags, giving an easy way to save/restore those flags across function calls. A clever trick, but unfortunately 64 MB of code address space wasn't enough for everyone, and so ARM moved to the fully 32-bit architecture in current use. For a transitional period, ARM CPUs supported both architectures, but that time is long gone now.
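As a sketch of that trick: on the 26-bit ARMs, register R15 held the N, Z, C, V, I and F flags in bits 31 to 26, the word-aligned program counter in bits 25 to 2, and the processor mode in bits 1 and 0. The Python helper names below are mine, just to show the packing.

```python
PC_MASK = 0x03FFFFFC    # bits 25..2: word-aligned program counter
MODE_MASK = 0x00000003  # bits 1..0: processor mode
FLAG_BITS = {"N": 31, "Z": 30, "C": 29, "V": 28, "I": 27, "F": 26}


def pack_r15(pc, flags, mode=0):
    """Combine PC, flags and mode into one 32-bit R15 value."""
    if pc & ~PC_MASK:
        raise ValueError("PC must be word-aligned and below 64 MB")
    word = pc | (mode & MODE_MASK)
    for name in flags:
        word |= 1 << FLAG_BITS[name]
    return word


def unpack_r15(word):
    """Split an R15 value back into (pc, set of flag names, mode)."""
    pc = word & PC_MASK
    flags = {name for name, bit in FLAG_BITS.items() if (word >> bit) & 1}
    return pc, flags, word & MODE_MASK


r15 = pack_r15(0x8000, {"Z", "C"})
print(hex(r15), unpack_r15(r15))
```

Saving R15 on a call therefore saved the flags for free, and the price was that code could never live above the 64 MB mark.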
Sadly, this means that modern ARMs can only run Archimedes software through software emulation. I understand that a newer version of RISC OS does exist for the 32-bit architecture, but it's not compatible with older binaries. Programs have to be recompiled for it, and if written in assembly, partially rewritten! So, no "Sibelius 7" or "Lander"...
You're an immobile computer, remember?
That means back to segmentation. That isn't a killer problem, but it is significant. In terms of how that works in modern computers, you can see it on Windows systems on Intel PAE processors. Basically the OS gets access to all the memory in the system, but it has to be divided up to be used. In the case of the Windows implementation, the kernel can get only 2GB and each application can get only 2GB. You can have multiple 2GB apps running, but they can't have more.
For an app to get more, it has to implement memory management internally. Basically it talks to Windows and gets a range of its address space set up as a window that will be paged; it then gets more RAM allocated and specifies how to page through it. This is called AWE (Address Windowing Extensions) and is used by a few apps, like MSSQL. Of course that is complex on the part of the app and would be problematic if you had multiple ones running.
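The general idea can be sketched with a toy model: lots of physical pages, but only a small window of them addressable at any moment, with the app choosing what the window points at. To be clear, this is not the Windows AWE API; the class and method names are invented for illustration.

```python
PAGE_SIZE = 4096


class WindowedStore:
    """Toy model: many physical pages, only `window_pages` visible at a time."""

    def __init__(self, total_pages, window_pages):
        self.physical = [bytearray(PAGE_SIZE) for _ in range(total_pages)]
        self.window = [None] * window_pages  # window slot -> physical page index

    def map(self, slot, page_index):
        """Point one slot of the window at a different physical page."""
        if not 0 <= page_index < len(self.physical):
            raise IndexError("no such physical page")
        self.window[slot] = page_index

    def _locate(self, address):
        slot, offset = divmod(address, PAGE_SIZE)
        page_index = self.window[slot]
        if page_index is None:
            raise RuntimeError("window slot is not mapped")
        return self.physical[page_index], offset

    def write(self, address, value):
        page, offset = self._locate(address)
        page[offset] = value

    def read(self, address):
        page, offset = self._locate(address)
        return page[offset]


store = WindowedStore(total_pages=8, window_pages=2)
store.map(0, 5)
store.write(10, 42)        # lands in physical page 5
store.map(0, 6)
print(store.read(10))      # same address, different page behind it
store.map(0, 5)
print(store.read(10))      # remap page 5 and the data is back
```

The same address means different things depending on what is currently mapped, and keeping track of that is entirely the application's job, which is why so few apps bother.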
Also, it makes task switching hit the system harder overall, because of the segmentation.
So I mean it works, don't get me wrong; I have seen servers doing it. However, 64-bit is a much, much cleaner solution, both OS-wise and software-wise. PAE really is a hack when you get down to it.
I like current desktop CPUs, which have larger virtual address spaces than physical ones. You are right, 40 bits is fine for now. As far as I know the top-end Intel CPUs only have 48 bits of address lines currently. There is no reason to implement all 64 bits, since you wouldn't use them. However, having a flat virtual memory space is something that is extremely useful. There's a reason everyone wanted to move to that with 32-bit CPUs as soon as it became feasible. We don't really want to go back to segmentation.