Slashdot Mirror


ARM In Supercomputers — 'Get Ready For the Change'

An anonymous reader writes "Commodity ARM CPUs are poised to replace x86 CPUs in modern supercomputers just as commodity x86 CPUs replaced vector CPUs in early supercomputers. An analysis by the EU Mont-Blanc Project (PDF) (using Nvidia Tegra 2/3, Samsung Exynos 5 & Intel Core i7 CPUs) highlights the suitability and energy efficiency of ARM-based solutions. They finish off by saying, 'Current limitations [are] due to target market condition - not real technological challenges. ... A whole set of ARM server chips is coming - solving most of the limitations identified.'"

19 of 238 comments

  1. Re:IMHO - No thanks. by Stoutlimb · · Score: 5, Insightful

    No doubt your CPU would win. But when looking at power/price as well, you'd have to pit your CPU against 50 or so ARM chips in parallel. For some solutions, it may be a far better choice. One size doesn't fit all.

  2. Does it really matter? by gman003 · · Score: 4, Interesting

    Most of the actual processing power in current supercomputers comes from GPUs, not CPUs. There are exceptions (that all-SPARC Japanese one, or a few Cell-based ones), but they're just that, exceptions.

    So sure, replace the Xeons and Opterons with Cortex-A15s. Doesn't really change much.

    What might be interesting is a GPU-heavy SoC - some light CPU cores on the die of a supercomputer-class GPU. I have heard Nvidia is working on such (using Tegra CPUs and Tesla GPUs), and I would not be surprised if AMD is as well, although they'd be using one of their x86 cores for it (probably Bulldozer - damn thing was practically built for heavily-virtualized servers, not much different from supercomputers).

    1. Re:Does it really matter? by Victor+Liu · · Score: 5, Informative

      As someone who does heavy duty scientific computing, I wouldn't say that "most" of the actual processing power is in GPUs. They are certainly more powerful at certain tasks, but most of the applications being run are legacy code, and most algorithms require substantial reworking to run with reasonable performance on a GPU. Simply put, GPUs for supercomputing are not quite a mature technology yet. I am personally not too interested in coding for GPUs simply because the code is not portable enough yet, and by the time the technology matures, there might be a new wave of technology (like ARM) that is easier to work with.

    2. Re:Does it really matter? by Junta · · Score: 5, Informative

      In the last published Top500 list, 7 of the top 10 had no GPUs. This is a clear indication that while GPUs are definitely there, claiming 'Most of the actual processing power' is overstating it a touch. It's particularly telling that there are so few, as overwhelming the specific HPL benchmark is one of the key benefits of GPUs. Other benchmarks in more well-rounded test suites don't treat GPUs so kindly.

      --
      XML is like violence. If it doesn't solve the problem, use more.
    3. Re:Does it really matter? by symbolset · · Score: 5, Interesting

      These ARM cores are halfway between the extremely limited GPU cores and the extremely flexible x86 cores. They may be the "happy medium".

      --
      Help stamp out iliturcy.
    4. Re:Does it really matter? by KiloByte · · Score: 5, Informative

      Also, a lot of algorithms, perhaps even most, rely on branching, which is something GPUs suck at. And only some can be reasonably rewritten in a branchless way.
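
      To illustrate what "rewritten in a branchless way" means, here is a minimal sketch, assuming integer inputs and lo <= hi. The function names are made up for the example. In plain Python this buys no speed; it only shows the kind of transformation (an arithmetic select in place of an if) that GPU code tends to rely on.

```python
def clamp_branchy(x, lo, hi):
    """Clamp x into [lo, hi] using explicit branches."""
    if x < lo:
        return lo
    if x > hi:
        return hi
    return x


def clamp_branchless(x, lo, hi):
    """Same result with no if-statements: each comparison becomes 0 or 1.

    Assumes integer inputs and lo <= hi, so at most one of the
    two flags can be 1 at a time.
    """
    below = int(x < lo)            # 1 when x is under the range, else 0
    above = int(x > hi)            # 1 when x is over the range, else 0
    inside = 1 - below - above     # 1 when x is already within range
    # Exactly one of the three weights is 1, so this selects one value.
    return below * lo + above * hi + inside * x


if __name__ == "__main__":
    for value in (-3, 4, 12):
        print(value, clamp_branchy(value, 0, 10), clamp_branchless(value, 0, 10))
```

      Both versions compute every candidate result in the branchless case, which is why the trick only pays off when the branches are cheap and roughly equal in cost.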

      --
      The creatures outside looked from Alt-Right to Antifa; but already it was impossible to say which was which.
    5. Re:Does it really matter? by ThePeices · · Score: 5, Funny

      Also, a lot of algorithms, perhaps even most, rely on branching, which is something GPUs suck at. And only some can be reasonably rewritten in a branchless way.

      Nonsense, I play Far Cry 3 on my GPU, and it renders branches just fine, thank you very much.

  3. Re:IMHO - No thanks. by c0lo · · Score: 4, Funny

    The article is aimed at supercomputers, not commodity PC. You are not the target.

    While not the target, you'll be collateral damage anyway.

    --
    Questions raise, answers kill. Raise questions to stay alive.
  4. Re:IMHO - No thanks. by king+neckbeard · · Score: 5, Informative

    You aren't operating in the supercomputing market. There, what matters is how much processing you can get for how much money. You can always buy more chips, and power usage and cooling are both significant factors. That's why x86 became dominant in that space. It was cheaper to buy a bunch of x86 chips than to buy fewer POWER chips. In terms of computing power, a POWER7 will eat your i7 for breakfast, but they are ungodly expensive.

    --
    This is my signature. There are many like it, but this one is mine.
  5. Re:IMHO - No thanks. by Colonel+Korn · · Score: 5, Informative

    architecture is complicated. but in terms of ops per mm^2, or ops per watt, ops per $, cycles per useful op, the x86 architecture is a heinous pox on the face of the earth.

    worse yet, your beloved x86 doesn't even have any source implications, it's just a useless thing.

    In TFA's slides 10 and 11, Intel i7 chips are shown to be more efficient in terms of performance per watt than ARM chips. However, they're close to each other and Intel's prices are significantly higher.

    --
    "I zero-index my hamsters" - Willtor (147206)
  6. Re:IMHO - No thanks. by KiloByte · · Score: 4, Interesting

    Damage or a winner? I feel so bad about having a cheap, efficient, and above all, quiet box.

    I bought this 4*2GHz baby, and the only reason it's not my main desktop yet is a weird and asinine requirement for monitor resolution to be exactly 720 or 1080 (WTF?!?). I think I'll replace my old but perfectly working pair of 1280x1024 monitors (I hate 16:9!), and put the big loud clunker in the cellar. I just hate the noise so much. x86 machines with no moving parts are extremely hard to get, and have terrible performance/price. Anything that requires lots of processing power (compilation, running Windows VMs, etc.) can be done remotely from the cellar just as well, while a 2GHz ARM is fast enough to do client stuff, running a browser being the most demanding part.

    And what else do you need to reside directly on the machine you plop your butt at?

    --
    The creatures outside looked from Alt-Right to Antifa; but already it was impossible to say which was which.
  7. Questions... by storkus · · Score: 5, Interesting

    As I understand it, Intel still has the advantage in the performance per watt category for general processing, and GPUs have better performance per watt IF you can optimize for that specific environment. Both things have been commented on to death by people far more knowledgeable than I.

    However, to me there are at least 3 questions unanswered:

    1. ASICs (and possibly FPGAs): Bitcoin miners and DES breakers are the best known examples. Where is the dividing line between operations specific enough to employ an ASIC and operations not specific enough, which need a GPU (or even a CPU)? Could further optimization move this line more toward the ASIC?

    2. Huge dies: This has been talked about before, but it seems that, for applications that are embarrassingly parallel, this is clearly where the next revolution will be, with hundreds of cores (at least, and of whatever kind of "core" you want). So when will this stop being vaporware?

    3. But what do we do about all the NON-parallel jobs? If you can't apply an ASIC and you can't break it down, you're still stuck at the basic wall we've been at for around a decade now: where's Moore's (performance) law here? It would seem the only hope is new algorithms: TRUE computer science!

  8. No, they won't. by Dputiger · · Score: 5, Informative

    Current ARM processors may indeed have a role to play in supercomputing, but the advantages this article implies don't exist.

    Go look at performance figures for the Cortex-A15. It's *much* faster than the Cortex-A9. It also draws far more power. There's a reason why ARM's own product literature identifies the Cortex-A15 as a smartphone chip at the high end, but suggests strategies like big.LITTLE for lowering total power consumption. Next year, ARM's Cortex-A57 will start to appear. That'll be a 64-bit chip, it'll be faster than the Cortex-A15, it'll incorporate some further power efficiency improvements, and it'll use more power at peak load.

    That doesn't mean ARM chips are bad -- it means that when it comes to semiconductors and the laws of physics, there are no magic bullets and no such thing as a free lunch.

    http://www.extremetech.com/computing/155941-supercomputing-director-bets-2000-that-we-wont-have-exascale-computing-by-2020

    I'm the author of that story, but I'm discussing a presentation given by one of the US's top supercomputing people. Pay particular attention to this graph:

    http://www.extremetech.com/wp-content/uploads/2013/05/CostPerFlop.png

    What it shows is the cost, in energy, of moving data. Keeping data local is essential to keeping power consumption down in a supercomputing environment. That means that smaller, less-efficient cores are a bad fit for environments in which data has to be synchronized across tens of thousands of cores and hundreds of nodes. Now, can you build ARM cores that have higher single-threaded efficiency? Absolutely, yes. But they use more power.

    ARM is going to go into datacenters and supercomputers, but it has no magic powers that guarantee it better outcomes.

  9. Re:IMHO - No thanks. by symbolset · · Score: 4, Interesting

    The problem you have is the software tools you use sap the power of the hardware. Windows is engineered to consume cycles to drive their need for recurrent license fees. Try a different OS that doesn't have this handicap and you'll find the full power of the equipment is available.

    --
    Help stamp out iliturcy.
  10. Xilinx Zynq anybody? by Z00L00K · · Score: 4, Informative

    Has anybody else seen/considered the Xilinx Zynq? It's a mix of ARM cores and FPGA, which could be interesting in supercomputing solutions.

    For anyone willing to tinker with it, there are development boards like the ZedBoard, which is priced at US$395. Not the cheapest device around, but for anyone willing to learn more about this interesting chip it is at least not an impossible sum. Xilinx also has the Zynq®-7000 AP SoC ZC702 Evaluation Kit, priced at US$895, which is quite a bit more expensive and not as interesting for hobbyists.

    Done right, you may be able to do a lot of interesting stuff with an FPGA a lot faster than an ordinary processor can, and then let the processor take care of stuff where performance isn't critical.

    Those chips are right now starting to find their way into vehicle ECUs, but it's still in an early phase so there aren't many mass produced cars yet with it.

    As I see it - supercomputers will have to look at every avenue to get maximum performance for the lowest possible power consumption - and avoid solutions with high power consumption in standby situations.

    --
    If builders built buildings the way programmers wrote programs, then the first woodpecker would destroy civilization.
  11. One Size Doesn't Fit All -- Same in Supercomputing by gentryx · · Score: 4, Informative

    There is already one line of supercomputers built from embedded hardware: the IBM Blue Gene. Their CPUs are embedded PowerPC cores. That's the reason why those systems typically have an order of magnitude more cores than their x86-based competition.

    Now, the problem with BG is that not all codes scale well with the number of cores. Especially when you're doing strong scaling (i.e. you fix the problem size, but throw more and more cores at the problem), Amdahl's law tells you that it's beneficial to have fewer, faster cores.
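
    A rough sketch of that Amdahl's law argument, for anyone who wants to play with the numbers. The 95% parallel fraction, the core counts, and the 4x per-core speed ratio are all made-up assumptions for illustration, not figures from the study or from any Blue Gene system.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Amdahl's law: speedup over one core for a fixed problem size."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


def relative_runtime(parallel_fraction, cores, core_speed):
    """Runtime relative to one core of speed 1.0 (smaller is better).

    core_speed is a hypothetical per-core speed multiplier.
    """
    serial_fraction = 1.0 - parallel_fraction
    return (serial_fraction + parallel_fraction / cores) / core_speed


if __name__ == "__main__":
    p = 0.95  # assumed parallel fraction, purely illustrative
    for n in (16, 64, 1024):
        print(f"{n:5d} cores: speedup {amdahl_speedup(p, n):.2f}x")

    # Hypothetical comparison: few fast cores vs. many slow cores.
    few_fast = relative_runtime(p, cores=64, core_speed=4.0)
    many_slow = relative_runtime(p, cores=1024, core_speed=1.0)
    print(f"64 fast cores:   {few_fast:.4f}")
    print(f"1024 slow cores: {many_slow:.4f}")
```

    With a 5% serial part the speedup can never exceed 20x no matter how many cores you add, which is why the serial part running on a faster core matters so much under strong scaling.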

    Finally I consider the study to be fundamentally flawed as it compares the OEM prices of consumer-grade embedded chips with retail prices of high-end server chips. This is wrong for so many reasons... you might then throw in the 947 GFLOPS, $500 AMD Radeon 7970, which beats even the ARM SoCs by a margin of 2x (ARM: ~1 GFLOPS/$, AMD Radeon: ~2 GFLOPS/$).
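
    The GFLOPS/$ comparison above is simple division; a small sketch using only the numbers quoted in this comment. The ARM figure is taken as the roughly 1 GFLOPS/$ stated above rather than derived from any chip price, so treat the ratio as approximate.

```python
def gflops_per_dollar(gflops, price_usd):
    """Price efficiency: peak GFLOPS bought per US dollar."""
    return gflops / price_usd


# Figures quoted in the comment above.
radeon_7970 = gflops_per_dollar(947.0, 500.0)

# Quoted directly as "~1 GFLOPS/$"; no underlying price is given.
arm_soc = 1.0

margin = radeon_7970 / arm_soc

if __name__ == "__main__":
    print(f"Radeon 7970: {radeon_7970:.3f} GFLOPS/$")
    print(f"Margin over ARM SoCs: {margin:.2f}x")
```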

    --
    Computer simulation made easy -- LibGeoDecomp
  12. That's what is so funny to me by Sycraft-fu · · Score: 4, Insightful

    Slashdot seems to have lots of ARM fanboys that look at ARM's low power processors and assume that ARM could make processors on par with Intel chips but much more efficient. They seem to think Intel does things poorly, as though they don't spend billions on R&D.

    Of course that would raise the question as to why ARM doesn't, and the answer is they can't. The more features you bolt on to a chip, the higher the clock speed, and so on, the more power it needs. So you want 64-bit? More power. Bigger memory controller? More power. Heavy hitting vector unit? More power. And so on.

    There's no magic juju in ARM designs. They are low power designs, in both senses of the word. Now that's wonderful, we need that for cellphones. You can't be slogging around with a 100 watt chip in a phone or the like. However, don't mistake that for meaning that they can keep that low consumption and offer performance equal to the 100 watt chip.

  13. Re:IMHO - No thanks. by aztracker1 · · Score: 4, Insightful

    Exactly. Then again, there are plenty of non-CPU-intensive loads. Part of the popularity and growth of NodeJS is that a lot of jobs are IO bound, and even a lot of web services/sites spend most of their time waiting on files or network resources/services. 10 ARM CPUs handling 10K simultaneous requests are as good as 1 uber-CPU handling 10K simultaneous requests. For that matter, there's been a lot of work done in MessageQueue routing and distributed databases. ARM is a pretty good fit for an environment designed to scale horizontally. Some of the first things I wanted to try on my Raspberry Pi were MongoDB and NodeJS, with the thought that a couple dozen of them might work better, with more resilience, than a few larger systems...
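
    A minimal sketch of the IO-bound point, using Python's asyncio in place of NodeJS. The handler, the 50 ms delay, and the request count are all invented for the example, and asyncio.sleep only stands in for waiting on a file, socket, or database.

```python
import asyncio
import time


async def handle_request(request_id, io_delay_s):
    """Pretend to serve one request that is almost entirely IO wait."""
    # Stand-in for a network or disk wait; the CPU is idle during this.
    await asyncio.sleep(io_delay_s)
    return request_id


async def serve(n_requests, io_delay_s):
    """Run all requests concurrently on a single thread."""
    tasks = [handle_request(i, io_delay_s) for i in range(n_requests)]
    return await asyncio.gather(*tasks)


def run_demo(n_requests=1000, io_delay_s=0.05):
    """Return (requests completed, wall-clock seconds)."""
    start = time.perf_counter()
    results = asyncio.run(serve(n_requests, io_delay_s))
    elapsed = time.perf_counter() - start
    return len(results), elapsed


if __name__ == "__main__":
    done, seconds = run_demo()
    print(f"{done} requests in {seconds:.2f}s on one thread")
```

    The waits overlap, so total time stays close to a single request's delay instead of growing with the number of requests. That is the sense in which a slow core is "good enough" for this kind of load.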

    For the record, I think addressing a bit more memory and larger/faster storage channels are what's holding back some of these systems. Those aren't a problem at supercomputer scale, but for someone wanting to put together a small cluster, it gets irritating.

    --
    Michael J. Ryan - tracker1.info
  14. Re:Power Efficiency - MIPS vs ARM by julesh · · Score: 4, Insightful

    I may be wrong here, but I get the impression that the MIPS architecture is much more power efficient than that of the ARM architecture

    If they are going to talk about building up a big iron using CPUs which are of high power efficiency, I reckon the MIPS cpu might be more suitable for this task than one from the ARM camp

    I don't think it is. The best figure (albeit somewhat out-of-date) I can find for a MIPS-based system is 2GFLOPS/W for a complete 6-core node including memory. ARM Cortex-A15 power consumption is a little hard to track down, although it's suggested that a 4-core 1.8GHz configuration (e.g. Samsung Exynos 5) could run at full speed on 8W (if the power manager let it; the Exynos 5 throttles down when it consumes more than 4W). Performance per GHz per core is about 4GFLOPS, so this system should be able to pull in about 28.8GFLOPS (or twice that if using ARM's "NEON" SIMD system to full advantage). Add in ~2W for 1GB DDR3 SDRAM, and that's 2.9GFLOPS/W. Assuming that the MIPS system I found is not the best available (as the data is from 2009, it seems likely that better is available now), the two appear to be roughly comparable.
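
    The arithmetic above, spelled out as a small sketch. Every input is one of the estimates quoted in this comment (4 cores, 1.8GHz, about 4GFLOPS per GHz per core, 8W for the CPU, about 2W for memory), so the result is only as good as those guesses.

```python
def peak_gflops(cores, clock_ghz, gflops_per_ghz_per_core):
    """Peak throughput from core count, clock, and per-GHz-per-core rate."""
    return cores * clock_ghz * gflops_per_ghz_per_core


def gflops_per_watt(gflops, cpu_watts, memory_watts):
    """Efficiency of the node, counting CPU and memory power."""
    return gflops / (cpu_watts + memory_watts)


# Estimates quoted in the comment for a 4-core Cortex-A15 part.
a15_gflops = peak_gflops(cores=4, clock_ghz=1.8, gflops_per_ghz_per_core=4.0)
a15_efficiency = gflops_per_watt(a15_gflops, cpu_watts=8.0, memory_watts=2.0)

# MIPS figure quoted for a complete 6-core node (2009 data).
mips_efficiency = 2.0

if __name__ == "__main__":
    print(f"A15 peak: {a15_gflops:.1f} GFLOPS")
    print(f"A15 node: {a15_efficiency:.2f} GFLOPS/W")
    print(f"MIPS node: {mips_efficiency:.2f} GFLOPS/W")
```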