Slashdot Mirror


ARM Chips Designed For 480-Core Servers

angry tapir writes "Calxeda revealed initial details about its first ARM-based server chip, designed to let companies build low-power servers with up to 480 cores. The Calxeda chip is built on a quad-core ARM processor, and low-power servers could have 120 ARM processing nodes in a 2U box. The chips will be based on ARM's Cortex-A9 processor architecture."

7 of 132 comments

  1. Going to be expensive! by ikarys · · Score: 5, Funny

    It'll likely cost an ARM and a leg.

  2. Re:is it worth it? by swalve · · Score: 3, Insightful

    It's low power in that the cores that aren't being used can (I assume) be shut down. Like a switch-mode power supply versus a linear one. So you are always using the least amount of power possible.

  3. Re:is it worth it? by L4t3r4lu5 · · Score: 5, Interesting

    Cortex-A9 is 250 mW per core at 1 GHz.

    For a 240-core 2U node, you're looking at 60 W for CPUs. Pretty impressive.
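    The arithmetic is easy to check. A minimal sketch, assuming the 250 mW per core figure quoted above and counting CPU cores only (no RAM, interconnect, or power supply losses); the function name is made up for illustration. It also runs the 480-core configuration from the summary.

```python
# Back-of-the-envelope CPU power budget.
# Assumption: 250 mW per Cortex-A9 core at 1 GHz, as quoted in the comment.
WATTS_PER_CORE = 0.25


def cpu_power_watts(cores: int, watts_per_core: float = WATTS_PER_CORE) -> float:
    """Total CPU power, ignoring RAM, interconnect, and PSU losses."""
    return cores * watts_per_core


for cores in (240, 480):
    print(f"{cores} cores: {cpu_power_watts(cores):.0f} W")
```

    With those assumptions, 240 cores come to 60 W and the full 480 cores to 120 W.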

    --
    Finally had enough. Come see us over at https://soylentnews.org/

  4. Re:is it worth it? by fuzzyfuzzyfungus · · Score: 4, Interesting

    It really depends on how much (and what kind of) support hardware ends up being involved in having lots and lots of them together in some useful way. That, and what inefficiencies, if any, are present because your workload was really expecting a smaller number of higher-performance cores.

    The power/performance of the core itself remains the same whether you have 1 or 1 million. The power demands of the memory may or may not change: phones and the like usually use a fairly small amount of low-power RAM in a package-on-package stack with the CPU. For server applications, something that takes DIMMs or SODIMMs might be more attractive, because PoP usually limits you in terms of quantity.

    The big server-specific question is going to be the nature of the "fabric" across which 120 nodes in a 2U are communicating. Because 120 ports' worth of 10/100 or GigE would occupy 3U and draw nonzero power themselves, I'm assuming that this fabric is either not Ethernet at all, or some sort of cut-down "we don't need to care about the standards because the signal only has to travel 6 inches over boards we designed, with our hardware at both ends" pseudo-Ethernet that looks like an Ethernet connection for compatibility purposes but is electrically more frugal. Whatever that costs, in terms of energy, will have to be added on to the effective energy cost of the CPUs themselves.

    Then you get perhaps the most annoying variable: many tasks are (either fundamentally, or because nobody bothered to program them to support it) basically dependent on access to a single very fast core, or to a modest number of cores with very fast access to one another's memory. For such applications, the performance of 400+ slow cores is going to be way worse than a naive addition of their individual powers would suggest. Sharing time on a fast core is both fundamentally easier, and enjoys a much longer history of development, than dividing a task among small ones. With some workloads, that will make this box nearly useless (especially if the interconnect is slow and/or doesn't do memory access). For others, performance might be nearly as good as a naive prediction would suggest.
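    One rough way to put numbers on that last point is Amdahl's law, which is not mentioned above but models the same effect. The sketch below compares many slow cores against a few fast ones. The core counts and relative speeds (480 cores at speed 1.0 versus 16 cores at speed 4.0) are made-up illustrative values, not measurements of any real chip, and interconnect cost is ignored entirely.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Speedup over a single core of the same type (Amdahl's law)."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


def throughput(parallel_fraction: float, cores: int, per_core_speed: float) -> float:
    """Relative throughput: per-core speed scaled by the Amdahl speedup."""
    return per_core_speed * amdahl_speedup(parallel_fraction, cores)


# Hypothetical machines: 480 slow cores vs. 16 cores that are each 4x faster.
for p in (0.50, 0.95, 0.999, 1.0):
    many_slow = throughput(p, cores=480, per_core_speed=1.0)
    few_fast = throughput(p, cores=16, per_core_speed=4.0)
    print(f"parallel fraction {p}: many slow = {many_slow:.1f}, few fast = {few_fast:.1f}")
```

    Under these assumptions the many-core box only pulls ahead when nearly all of the work parallelises; with half the work serial, the few fast cores win comfortably.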

  5. Re:is it worth it? by somersault · · Score: 3, Interesting

    Not really; the server could stay powered up the whole time (unless you really get 0% usage at non-peak times, and those times are predictable, in which case it makes sense to just power down completely at those times). By scaling up I mean enabling more cores, thus improving the processing capacity of the server. Then you'd get the best of both worlds, with the server being fine for anything from small to massive workloads, while still using less power than the equivalent x86 setup. Like modern engines which can enable or disable cylinders at will to conserve fuel when not much power is needed.

    --
    which is totally what she said

  6. Re:And it's useless. No 64-bit support. by TheRaven64 · · Score: 4, Informative

    How about a link to this rant, if you want us to read it? And, if you've got a problem with PAE-like extensions, then I presume you're aware that both Intel's and AMD's virtualisation extensions use PAE-like addressing?

    All that PAE and LPAE do is decouple the size of the physical and virtual address spaces. This is a fairly trivial extension to existing virtual memory schemes. On any modern system, there is some mechanism for mapping from virtual to physical pages, so each application sees a 4GB private address space (on a 32-bit system) and the pages that it uses are mapped to pages in physical memory. With PAE / LPAE, the only difference is that this mapping now lets you map to a larger physical address space - for example, 32-bit virtual to 36-bit physical. You see exactly the opposite of this on almost all 64-bit platforms, where you have a 64-bit virtual address space but only a 40- or 48-bit physical address space.
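    For a sense of scale, the address widths mentioned above work out as follows. This is plain arithmetic on the bit widths from the paragraph; the helper name is made up for illustration.

```python
GIB = 2**30  # bytes per GiB


def addressable_gib(address_bits: int) -> int:
    """Bytes reachable with the given address width, expressed in GiB."""
    return (1 << address_bits) // GIB


for label, bits in [
    ("32-bit virtual", 32),
    ("36-bit physical (PAE example)", 36),
    ("40-bit physical", 40),
    ("48-bit physical", 48),
]:
    print(f"{label}: {addressable_gib(bits)} GiB")
```

    So a 36-bit physical space holds sixteen 4 GiB address spaces' worth of memory, while each process still sees only 4 GiB.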

    The big problem with PAE was that most machines that supported it came with 32-bit peripherals and no IOMMU. This meant that the peripherals could do DMA transfers to and from the low 4GB, but not anywhere else in memory. This dramatically complicated the work that the kernel had to do, because it needed to either remap memory pages from the low 4GB and copy their contents or use bounce buffers, neither of which was good for performance (which, generally, is something that people who need more than 4GB of RAM care about).

    The advantage is that you can add more physical memory without changing the ABI. Pointers remain 32 bits, and applications are each limited to 4GB of virtual address space, but you can have multiple applications all using 4GB without needing to swap. Oh, and you also get better cache usage than with a pure 64-bit ABI, because you're not using 8 bytes to store a pointer into an address space that's much smaller than 4GB.
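    The cache point can be illustrated with a toy calculation. The sketch below assumes a linked-list node made of a 4-byte payload plus one pointer, padded to pointer alignment, and a 64-byte cache line; all three are illustrative assumptions, and real layouts depend on the compiler and ABI.

```python
CACHE_LINE_BYTES = 64  # assumed; common, but not universal


def node_size(pointer_bytes: int, payload_bytes: int = 4) -> int:
    """Size of a {payload, next-pointer} node, rounded up to pointer alignment."""
    unpadded = payload_bytes + pointer_bytes
    alignment = pointer_bytes  # assume pointers are naturally aligned
    return -(-unpadded // alignment) * alignment  # ceiling division, then scale


for pointer_bytes in (4, 8):
    size = node_size(pointer_bytes)
    per_line = CACHE_LINE_BYTES // size
    print(f"{pointer_bytes * 8}-bit pointers: {size}-byte node, {per_line} nodes per cache line")
```

    Under those assumptions, the 32-bit layout fits twice as many nodes into each cache line.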

    By the way, I just did a quick check on a few 64-bit machines that I have accounts on. Out of about 700 processes running on these systems (one laptop, two servers, one compute node), none were using more than 4GB of virtual address space.
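    The comment doesn't say how the check was done. One way to run a similar check, assuming a Linux system with /proc mounted, is to read the VmSize field from each /proc/<pid>/status file. The function names below are made up for illustration, and results will vary a lot from system to system.

```python
import glob

FOUR_GIB_KB = 4 * 1024 * 1024  # 4 GiB expressed in kB, the unit /proc reports


def vm_size_kb(status_text: str):
    """Return the VmSize value in kB from /proc/<pid>/status text, or None."""
    for line in status_text.splitlines():
        if line.startswith("VmSize:"):
            return int(line.split()[1])
    return None  # e.g. kernel threads, which have no VmSize line


def count_large_processes():
    """Return (processes with a VmSize, processes above 4 GiB of virtual space)."""
    total = large = 0
    for path in glob.glob("/proc/[0-9]*/status"):
        try:
            with open(path) as f:
                size = vm_size_kb(f.read())
        except OSError:
            continue  # process exited or is not readable
        if size is None:
            continue
        total += 1
        if size > FOUR_GIB_KB:
            large += 1
    return total, large


if __name__ == "__main__":
    total, large = count_large_processes()
    print(f"{large} of {total} processes use more than 4 GiB of virtual address space")
```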

    --
    I am TheRaven on Soylent News

  7. Re:Cheaper way by jDeepbeep · · Score: 4, Funny

    Nah, too RISCy

    --
    Reply to That ||