ARM In Supercomputers — 'Get Ready For the Change'
An anonymous reader writes "Commodity ARM CPUs are poised to replace x86 CPUs in modern supercomputers, just as commodity x86 CPUs replaced vector CPUs in early supercomputers. An analysis by the EU Mont-Blanc Project (PDF) (using Nvidia Tegra 2/3, Samsung Exynos 5 & Intel Core i7 CPUs) highlights the suitability and energy efficiency of ARM-based solutions. They finish off by saying, 'Current limitations [are] due to target market condition — not real technological challenges. ... A whole set of ARM server chips is coming — solving most of the limitations identified.'"
PC user, hardcore gamer and programmer here; for me, energy efficiency is a lower priority than speed in a CPU. Make an ARM CPU that competes with an Intel Core i7 2600K, show me it overclocks without many issues, and you've got my attention.
Most of the actual processing power in current supercomputers comes from GPUs, not CPUs. There are exceptions (that all-SPARC Japanese one, or a few Cell-based ones), but they're just that, exceptions.
So sure, replace the Xeons and Opterons with Cortex-A15s. Doesn't really change much.
What might be interesting is a GPU-heavy SoC - some light CPU cores on the die of a supercomputer-class GPU. I have heard Nvidia is working on such a chip (using Tegra CPUs and Tesla GPUs), and I would not be surprised if AMD is as well, although they'd be using one of their x86 cores for it (probably Bulldozer - damn thing was practically built for heavily-virtualized servers, not much different from supercomputers).
As I understand it, Intel still has the advantage in performance per watt for general processing, and GPUs have better performance per watt IF you can optimize for that specific environment; both points have been discussed to death by people far more knowledgeable than I.
However, to me there are at least 3 questions unanswered:
1. ASICs (and possibly FPGAs): Bitcoin miners and DES breakers are the best-known examples. Where is the dividing line between operations specific enough to employ an ASIC and those not specific enough, which need a GPU (or even a CPU)? Could further optimization move this line toward the ASIC?
2. Huge dies: This has been talked about before, but it seems that, for applications that are embarrassingly parallel, this is clearly where the next revolution will be, with hundreds of cores (at least, and of whatever kind of "core" you want). So when will this stop being vaporware?
3. But what do we do about all the NON-parallel jobs? If you can't apply an ASIC and you can't break it down, you're still stuck at the basic wall we've been at for around a decade now: where's Moore's (performance) law here? It would seem the only hope is new algorithms: TRUE computer science!
I don't buy your response: http://top500.org/statistics/list/ ... select 'Accelerator' and hit Submit.
87.6% of the Top500 supercomputers have no NVIDIA (or similar) coprocessors.
- Michael T. Babcock (Yes, I blog)
Hopefully this means we'll start seeing ARM-based motherboards in an ATX form factor. The Pi and BeagleBone are nice, but I want something that's essentially a commodity x86 motherboard, except that it uses ARM.
Current ARM processors may indeed have a role to play in supercomputing, but the advantages this article implies don't exist.
Go look at performance figures for the Cortex-A15. It's *much* faster than the Cortex-A9. It also draws far more power. There's a reason why ARM's own product literature identifies the Cortex-A15 as a smartphone chip at the high end, but suggests strategies like big.LITTLE for lowering total power consumption. Next year, ARM's Cortex-A57 will start to appear. That'll be a 64-bit chip, it'll be faster than the Cortex-A15, it'll incorporate some further power efficiency improvements, and it'll use more power at peak load.
That doesn't mean ARM chips are bad -- it means that when it comes to semiconductors and the laws of physics, there are no magic bullets and no such thing as a free lunch.
http://www.extremetech.com/computing/155941-supercomputing-director-bets-2000-that-we-wont-have-exascale-computing-by-2020
I'm the author of that story, but I'm discussing a presentation given by one of the US's top supercomputing people. Pay particular attention to this graph:
http://www.extremetech.com/wp-content/uploads/2013/05/CostPerFlop.png
What it shows is the cost, in energy, of moving data. Keeping data local is essential to keeping power consumption down in a supercomputing environment. That means that smaller, less-efficient cores are a bad fit for environments in which data has to be synchronized across tens of thousands of cores and hundreds of nodes. Now, can you build ARM cores that have higher single-threaded efficiency? Absolutely, yes. But they use more power.
ARM is going to go into datacenters and supercomputers, but it has no magic powers that guarantee it better outcomes.
I have long pined for a server with maybe ten 4-core ARM CPUs. My server basically spends its time serving web content from memory. Each web request needs a bit of thinking, and then the data goes out the port. Disk I/O is not an issue, nor is server bandwidth. Quite simply, I don't need much CPU, but I need many CPUs. A big, powerful Intel chip is of less interest.
Also, by breaking the system up into physically separate CPUs, I suspect an interesting memory-access architecture could be conjured up, removing another potential choke point.
Has anybody else seen or considered the Xilinx Zynq? It combines ARM cores with an FPGA, which could be interesting for supercomputing.
For anyone willing to tinker with it, there are development boards such as the ZedBoard, priced at US$395. Not the cheapest device around, but for anyone who wants to learn more about this interesting chip it is at least not an impossible sum. Xilinx also has the Zynq®-7000 AP SoC ZC702 Evaluation Kit, priced at US$895, which is quite a bit more expensive and less interesting for hobbyists.
Done right, you may be able to do a lot of interesting work with an FPGA much faster than an ordinary processor can, and let the processor handle tasks where performance isn't critical.
These chips are just starting to find their way into vehicle ECUs, but it's still early, so few mass-produced cars use them yet.
As I see it - supercomputers will have to look at every avenue to get maximum performance for the lowest possible power consumption - and avoid solutions with high power consumption in standby situations.
If builders built buildings the way programmers wrote programs, then the first woodpecker would destroy civilization.
There is already one line of supercomputers built from embedded hardware: the IBM Blue Gene. Their CPUs are embedded PowerPC cores. That's the reason why those systems typically have an order of magnitude more cores than their x86-based competition.
Now, the problem with BG is that not all codes scale well with the number of cores. Especially when you're doing strong scaling (i.e. you fix the problem size but throw more and more cores at it), Amdahl's law tells you that it's beneficial to have fewer, faster cores.
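Amdahl's point can be made concrete with a toy strong-scaling model. This sketch assumes a hypothetical `runtime` helper, a 1% serial fraction, and two invented machines with the same aggregate throughput (1,000 cores at full speed versus 4,000 cores at a quarter of that speed); none of these numbers come from Blue Gene.

```python
def runtime(serial_fraction, cores, core_speed):
    """Amdahl's-law runtime for a fixed problem size (strong scaling).

    Normalised so that one core of speed 1.0 needs 1.0 time units.
    serial_fraction: share of the work that cannot be parallelised.
    core_speed: relative single-core speed (1.0 = reference core).
    """
    parallel_fraction = 1.0 - serial_fraction
    return (serial_fraction + parallel_fraction / cores) / core_speed


# Hypothetical machines with equal aggregate throughput (cores * speed).
fast = runtime(0.01, cores=1_000, core_speed=1.0)
slow = runtime(0.01, cores=4_000, core_speed=0.25)
print(f"1,000 fast cores: {fast:.5f}")
print(f"4,000 slow cores: {slow:.5f}")
print(f"fast cores finish {slow / fast:.2f}x sooner")
```

Because the serial part runs on a single slow core, the many-core machine ends up roughly 3.7x slower here despite identical peak throughput.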
Finally, I consider the study fundamentally flawed, as it compares the OEM prices of consumer-grade embedded chips with the retail prices of high-end server chips. This is wrong for so many reasons... By the same logic you might throw in the 947 GFLOPS, $500 AMD Radeon 7970, which beats even the ARM SoCs by a factor of about 2 (ARM: ~1 GFLOPS/$, AMD Radeon: ~2 GFLOPS/$).
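The ~2 GFLOPS/$ figure follows directly from the Radeon 7970 numbers quoted above. A quick check, taking the 947 GFLOPS and $500 as given and the ~1 GFLOPS/$ ARM figure as stated in the comment (none of them independently verified here):

```python
# Figures as quoted in the comment; not independently verified.
radeon_gflops = 947.0
radeon_price_usd = 500.0
arm_gflops_per_usd = 1.0   # the "~1 GFLOPS/$" quoted for the ARM SoCs

radeon_gflops_per_usd = radeon_gflops / radeon_price_usd
print(f"Radeon 7970: {radeon_gflops_per_usd:.2f} GFLOPS/$")
print(f"advantage over ARM: {radeon_gflops_per_usd / arm_gflops_per_usd:.1f}x")
```

947 / 500 works out to about 1.89 GFLOPS/$, which rounds to the "~2x" claimed.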
Computer simulation made easy -- LibGeoDecomp
I may be wrong here, but I get the impression that the MIPS architecture is much more power-efficient than the ARM architecture.
If they are going to talk about building big iron from highly power-efficient CPUs, I reckon MIPS CPUs might be more suitable for the task than ones from the ARM camp.
Many thanks, Mr. Edward Snowden!
Slashdot seems to have lots of ARM fanboys who look at ARM's low-power processors and assume that ARM could make processors on par with Intel's chips but much more efficient. They seem to think Intel does things poorly, as though Intel doesn't spend billions on R&D.
Of course, that raises the question of why ARM doesn't, and the answer is that they can't. The more features you bolt onto a chip, and the higher you push the clock speed, the more power it needs. So you want 64-bit? More power. A bigger memory controller? More power. A heavy-hitting vector unit? More power. And so on.
There's no magic juju in ARM designs. They are low-power designs, in both senses of the word. Now, that's wonderful; we need that for cellphones. You can't be lugging around a 100-watt chip in a phone or the like. However, don't take that to mean they can keep that low consumption and offer performance equal to the 100-watt chip.
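The standard first-order model of CMOS switching power, P = αCV²f (activity factor α, switched capacitance C, supply voltage V, clock frequency f), shows why both extra features (more capacitance) and higher clocks cost power. This is a sketch with invented values; in particular, the 20% voltage bump assumed to reach the higher clock is an illustration, not a measured figure for any ARM or Intel part.

```python
def dynamic_power_watts(activity, capacitance_farads, voltage, freq_hz):
    """First-order CMOS switching power: P = activity * C * V^2 * f."""
    return activity * capacitance_farads * voltage ** 2 * freq_hz


# Hypothetical core; all values are illustrative only.
base = dynamic_power_watts(0.2, 1e-8, 1.0, 2.0e9)
# A 50% higher clock usually needs more voltage too (assumed +20% here).
boosted = dynamic_power_watts(0.2, 1e-8, 1.2, 3.0e9)
print(f"base:    {base:.2f} W")
print(f"boosted: {boosted:.2f} W  ({boosted / base:.2f}x power for 1.5x clock)")
```

Under those assumptions a 1.5x clock increase costs about 2.16x the power, because voltage enters squared; adding features shows up as a larger C.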
...but also reliability (because supercomputers are really large and one failed node will generally crash the whole job, thereby wasting gazillions of core hours; that's one reason why SC centers buy expensive Nvidia Tesla hardware instead of the cheaper GeForce series) and IO and memory bandwidth and finally integration density. That one Intel chip can be more tightly integrated as it won't generate as much excess heat per GFLOPS (according to TFA...).
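The reliability argument gets worse with node count: if any single node failure kills the job and failures are independent, the chance that a run finishes is (1 - p)^N for N nodes. This is a toy sketch with a hypothetical per-run, per-node failure probability `P_FAIL`; real failure rates vary widely between machines.

```python
def job_survival_probability(nodes, p_node_failure):
    """Chance that no node fails during a run, assuming independent failures
    and that any single node failure kills the whole job."""
    return (1.0 - p_node_failure) ** nodes


# Hypothetical probability that one node fails during one run of the job.
P_FAIL = 1e-4
for nodes in (100, 1_000, 10_000, 50_000):
    print(f"{nodes:>6} nodes: {job_survival_probability(nodes, P_FAIL):.3f}")
```

Building the same FLOPS out of more, weaker nodes raises N, so per-node reliability has to improve just to keep the same odds of a job completing.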
I am a fanboy of the small ARM boards... I have built an MPI cluster out of Raspberry Pi boards, and it is not even close in performance, except as a teaching exercise, where it excels.
However, many site services for which corporate IT seems to dedicate virtual machines could instead be given to these little boards:
Department Web Servers... with mostly static content... via NFS or a revision control system like hg.
Department and internal caching name servers... NTP servers and managed central storage for each building or closet.
The impact of the little ARM boards has kicked Intel in its lethargy-loaded behind. Its next generation of sub-25-watt systems will take names and kick butt, as long as IT does not overload them with WindowZ.
IT departments will find the management advantages of Chromebox devices connected to quality screens compelling.
Users will find that flipping open the company ChromeOS laptop will put them on the same page as the big screen in the office...
It is true that this is not 100% ready for prime time for all of us, but the handwriting is on the wall.
Truth is stranger than fiction, but it is because Fiction is obliged to stick to possibilities; Truth isn't. Mark Twain.
A gimped version of Windows is not going to get the job done.
On the other hand, a Windows version of GIMP does get a lot of jobs done that don't quite need Adobe Photoshop.
But seriously, the reason Windows RT is "gimped" is that Microsoft has refused to endorse recompiling desktop applications for it. That's not so much a failing of ARM (ARM ran RISC OS on Acorn computers) as a power grab by Microsoft.
Some of the Samsung Slate tablets, however, come with an x86 CPU... and are actually fully functional! Can you point to an ARM tablet that can do everything they can?
Some ARM tablets run Ubuntu. Other Android tablets run Debian in a chroot, with video output through an X11 server app for Android. These can't run Windows applications in Wine the way x86 machines can, but they work for any GNU/Linux application that has been recompiled for ARM.