ARM Chips Designed For 480-Core Servers
angry tapir writes "Calxeda revealed initial details about its first ARM-based server chip, designed to let companies build low-power servers with up to 480 cores. The Calxeda chip is built on a quad-core ARM processor, and low-power servers could have 120 ARM processing nodes in a 2U box. The chips will be based on ARM's Cortex-A9 processor architecture."
It'll likely cost an ARM and a leg.
When you start piling all you can onto a chip the power consumption is going to naturally creep up. Once you reach a certain threshold of x chips you lose the benefit of ARM being "low-power." Am I wrong?
ARM's Large Physical Address Extensions (LPAE) allow access to up to 1TB of memory. While I doubt applications will use this, it will allow each virtualized host on the server to use 4GB of memory.
Take a look at the PandaBoard, if you want a low-power, dual-core ARM server, although you'd have to use CF + USB for storage, not SATA. Note, however, that VirtualBox is x86-only. If you want virtualisation, you're currently pretty limited on ARM. There is a Xen port, but it's not really packaged for end users yet.
I am TheRaven on Soylent News
While not in 1U format, a lot of off-the-shelf NAS boxes use ARM. My LG N2R1 NAS has an 800MHz Marvell 88F6192 and runs Lenny. I won't be surprised to see some NanoITX boards out running similar hardware. Plus, I've been very impressed with how many Debian packages are available for ARMEL. While not perfect, it's the most useful Linux server I've ever had.
How about a link to this rant, if you want us to read it? And, if you've got a problem with PAE-like extensions, then I presume you're aware that both Intel's and AMD's virtualisation extensions use PAE-like addressing?
All that PAE and LPAE do is decouple the size of the physical and virtual address spaces. This is a fairly trivial extension to existing virtual memory schemes. On any modern system, there is some mechanism for mapping from virtual to physical pages, so each application sees a 4GB private address space (on a 32-bit system) and the pages that it uses are mapped to some pages in physical memory. With PAE / LPAE, the only difference is that this mapping now lets you map to a larger physical address space - for example, 32-bit virtual to 36-bit physical. You see exactly the opposite of this on almost all 64-bit platforms, where you have a 64-bit virtual address space but only a 40- or 48-bit physical address space.
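To put numbers on the address widths mentioned above, here is a rough sketch of the arithmetic. The helper name is made up for illustration; the 36-bit figure is the example from this comment and the 40-bit figure is what gives the 1TB quoted for LPAE earlier in the thread.

```python
# Sizes of the address spaces discussed above, in GiB.
# address_space_bytes is a hypothetical helper, not from any library.
GIB = 1 << 30

def address_space_bytes(bits):
    """Number of bytes addressable with the given address width."""
    return 1 << bits

virtual_32 = address_space_bytes(32)     # per-process virtual space
pae_physical = address_space_bytes(36)   # 36-bit physical (the PAE example)
lpae_physical = address_space_bytes(40)  # 40-bit physical (1TB, as for LPAE)

print("32-bit virtual: ", virtual_32 // GIB, "GiB")
print("36-bit physical:", pae_physical // GIB, "GiB")
print("40-bit physical:", lpae_physical // GIB, "GiB")

# How many full 4GiB address spaces fit in 40-bit physical memory at once.
full_spaces = lpae_physical // virtual_32
print("4GiB address spaces that fit in 1TiB:", full_spaces)
```

The last figure is just an upper bound on how many processes (or guests) could each have a fully resident 4GB address space without swapping; it ignores kernel and device overhead.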
The big problem with PAE was that most machines that supported it came with 32-bit peripherals and no IOMMU. This meant that the peripherals could do DMA transfers to and from the low 4GB, but not anywhere else in memory. This dramatically complicated the work that the kernel had to do, because it needed to either remap memory pages from the low 4GB and copy their contents or use bounce buffers, neither of which was good for performance (which, generally, is something that people who need more than 4GB of RAM care about).
The advantage is that you can add more physical memory without changing the ABI. Pointers remain 32 bits, and applications are each limited to 4GB of virtual address space, but you can have multiple applications all using 4GB without needing to swap. Oh, and you also get better cache usage than with a pure 64-bit ABI, because you're not using 8 bytes to store a pointer into an address space that's much smaller than 4GB.
By the way, I just did a quick check on a few 64-bit machines that I have accounts on. Out of about 700 processes running on these systems (one laptop, two servers, one compute node), none were using more than 4GB of virtual address space.
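For anyone who wants to repeat that kind of check, something along these lines would do it on Linux. This is only a sketch and an assumption about method (the comment above doesn't say how the check was done): it reads the VmSize field from /proc/<pid>/status, and the function names are made up for illustration.

```python
import os

FOUR_GIB_KB = 4 * 1024 * 1024  # 4GB expressed in kB, the unit /proc uses

def vm_size_kb(status_text):
    """Return the VmSize value (kB) from /proc/<pid>/status text, or None."""
    for line in status_text.splitlines():
        if line.startswith("VmSize:"):
            return int(line.split()[1])
    return None  # kernel threads have no VmSize line

def count_large_processes(proc_root="/proc"):
    """Return (processes seen, processes with more than 4GB of virtual space)."""
    total = large = 0
    if not os.path.isdir(proc_root):
        return total, large
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "status")) as f:
                size = vm_size_kb(f.read())
        except OSError:
            continue  # process exited or is not readable
        if size is None:
            continue
        total += 1
        if size > FOUR_GIB_KB:
            large += 1
    return total, large
```

Calling count_large_processes() on a Linux box gives the two counts; on systems without /proc it simply returns (0, 0).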
How about a link to this rant
http://blog.linuxolution.org/archives/117
64bit memory range? Each node is going to have its own memory slot(s). 120 cores, 4 cores per node = 30 nodes. If you plan to have less than 4GB of memory in this system, how small does each stick have to be when you plug 30 in? ~128MB. Good luck finding a bunch of DDR2/3 128MB sticks to plug into your 4GB 120 core web server. Anyway, each node needs its own local copy of the data it needs to serve up. If your web page needs ~256MB, each node is going to need the same 256MB of data duplicated, plus any extra overhead. You can't expect all 30 nodes to access the same 2-3 memory slots; that would scale like crap. This is one of the issues you get when scaling via cores. Interconnection bandwidth/latency becomes an issue and you need to use local storage to allow fully independent processing. Once you start getting up into these ranges, you're better off thinking of each node as its own computer with a fairly high speed network.
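The back-of-the-envelope numbers above work out roughly like this. It is a quick sketch using the figures from this comment (120 cores, 4 cores per node, a 4GB total, a ~256MB working set); the variable names are only for illustration.

```python
# Per-node memory arithmetic using the figures from the comment above.
cores = 120
cores_per_node = 4
total_memory_mb = 4 * 1024   # the hypothetical 4GB system
working_set_mb = 256         # data each node has to hold locally

nodes = cores // cores_per_node
per_node_mb = total_memory_mb / nodes
duplicated_mb = working_set_mb * nodes

print("nodes:", nodes)
print("memory per node if 4GB is split evenly: %.1f MB" % per_node_mb)
print("memory needed if every node holds the working set:", duplicated_mb, "MB")
```

An even split comes to a little over 128MB per node, which is where the ~128MB stick size comes from, while duplicating a 256MB working set on every node needs well over the 4GB total.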
You need to watch out with them also though. The WD Sharespace I have uses a 500MHz chip which is totally inadequate for decent throughput between the 4-disk array and the GigE interface.
And I had to write my own device support into the kernel to get it running a modern OS! It came with 2.6.12!
Nah, too RISCy
His complaint basically boils down to the fact that the kernel needs to be able to map all of physical memory, and have some address space left over for memory-mapped I/O. This is a valid complaint for a kernel developer (although Linus' 'everyone who disagrees with me is an idiot' style is quite irritating), but it is largely irrelevant to the issue at hand. There is nothing stopping a kernel on ARM with LPAE from using 64-bit pointers internally. You still need to translate userspace pointers, but you need to do that anyway on most architectures (on x86, context switches are insanely expensive, so typically you use a segment for the kernel and run system call handlers without changing the page tables, just making the kernel segment visible by switching to ring 0), so that code already exists in all of the relevant places in the kernel.
This kind of arrangement gets brought up over and over - one of the more recent examples is SiCortex, and it sucked. Having a Single System Image is always preferable to a "cluster in a box."