Slashdot Mirror


HPE Announces World's Largest ARM-based Supercomputer (zdnet.com)

The race to exascale speed is getting a little more interesting with the introduction of HPE's Astra -- what will be the world's largest ARM-based supercomputer. From a report: HPE is building Astra for Sandia National Laboratories and the US Department of Energy's National Nuclear Security Administration (NNSA). The NNSA will use the supercomputer to run advanced modeling and simulation workloads for things like national security, energy, science and health care.

HPE is involved in building other ARM-based supercomputing installations, but when Astra is delivered later this year, "it will hands down be the world's largest ARM-based supercomputer ever built," Mike Vildibill, VP of Advanced Technologies Group at HPE, told ZDNet. The HPC system is comprised of 5,184 ARM-based processors -- the ThunderX2 processor, built by Cavium. Each processor has 28 cores and runs at 2 GHz. Astra will deliver over 2.3 theoretical peak petaflops of performance, which should put it well within the top 100 supercomputers ever built -- a milestone for an ARM-based machine, Vildibill said.
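
The quoted peak figure can be sanity-checked from the numbers in the summary. The sketch below is a rough back-of-the-envelope calculation, not an official derivation: the socket count, core count and clock come from the article, while the value of 8 floating-point operations per core per cycle is an assumption chosen because it reproduces the quoted 2.3 petaflops.

```python
# Back-of-the-envelope check of the quoted theoretical peak.
SOCKETS = 5_184          # from the article
CORES_PER_SOCKET = 28    # from the article
CLOCK_HZ = 2.0e9         # from the article (2 GHz)
FLOPS_PER_CYCLE = 8      # ASSUMPTION: not stated in the article


def peak_flops(sockets, cores_per_socket, clock_hz, flops_per_cycle):
    """Theoretical peak = total cores x clock x flops per core per cycle."""
    return sockets * cores_per_socket * clock_hz * flops_per_cycle


total_cores = SOCKETS * CORES_PER_SOCKET
peak = peak_flops(SOCKETS, CORES_PER_SOCKET, CLOCK_HZ, FLOPS_PER_CYCLE)

print(f"total cores: {total_cores:,}")
print(f"peak: {peak / 1e15:.2f} petaflops = {peak / 1e18:.4f} exaflops")
```

Under that assumption the machine has 145,152 cores and a peak of roughly 2.32 petaflops, which is about 0.0023 exaflops.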

57 comments

  1. Deepfakes by Anonymous Coward · · Score: 0

    Will be used exclusively to produce more Deepfakes content. ARMs will get stronger, indeed.

    1. Re:Deepfakes by Anonymous Coward · · Score: 0

      Does it have a three and a half inch floppy dick drive?

    2. Re: Deepfakes by Anonymous Coward · · Score: 0

      I'm Soviet Russia deepfakes produce you!

  2. Awesome news! by Anonymous Coward · · Score: 0

    And it has a 5.5 inch screen! Kool beans. Candy Crunch and Bejeweled here I come!

  3. Quantity game? by Tablizer · · Score: 1, Flamebait

    Maybe I'm naive, but a typical "supercomputer" these days mostly just connects up bunches of servers (or "servlets") via a central cluster manager or cluster tree. The "size" of the super-computer is then roughly the total number of CPUs (or maybe total instructions per second for the entire shebang).

    Thus, if you want to make a "numeric" world's record, you just get ship-loads of servers and hook them up to the cluster manager tree. It's mostly a quantity pissing match roughly comparable to having the tallest building.

    1. Re:Quantity game? by Anonymous Coward · · Score: 0

      via a central cluster manager or cluster tree

      Imagine a Beowulf cluster of nested Beowulf clusters. Call it the "TurtleTron". God ordered 7 to emulate God ordering 7 more.

    2. Re:Quantity game? by Anonymous Coward · · Score: 0

      just connects up bunches of servers

      so by your definition the internet is the world's biggest (only?) supercomputer.

    3. Re: Quantity game? by Anonymous Coward · · Score: 0

      That number is meaningless.

      Scientists care a lot about shared memory performance and latency between chips on each board or backplane. Otherwise you can just spin it up on aws.

    4. Re:Quantity game? by ArchieBunker · · Score: 1

      If they were all working concurrently on the same problem, then yes.

      --
      Only the State obtains its revenue by coercion. - Murray Rothbard
    5. Re:Quantity game? by Anonymous Coward · · Score: 0

      No, you have to unify the filesystem and memory spaces, which is where most of the effort ends up.

    6. Re:Quantity game? by friedmud · · Score: 5, Informative

      You are naive. That's how you make a really crappy supercomputer.

      This machine will have more than 100,000 cores. At that scale there are many things that must be carefully thought out. Even just _launching_ a job at 100k procs presents challenges (enough so that people who do it well put out press releases about it: http://mvapich.cse.ohio-state.... ). Beyond choosing the processor (obvious) here are some of the things that must be thought about / balanced:

      1. Power - for machines this large you often have to make special deals with local utility companies to power it efficiently.
      2. Cooling - The heat load will be immense; deciding how to cool it is incredibly important.
      3. Interconnect - There are many options here (although fewer than in the past). Choosing e.g. Infiniband vs Ethernet, etc. comes with different tradeoffs and can depend on what your average application will be doing (many short messages vs large messages, etc.)
      4. Switching - How many switches are needed? What topology will you use (fat-tree, hypercube, etc.)? It depends somewhat on how much you want to spend on switches and somewhat on what your typical application workload looks like.
      5. RAM - RAM is currently incredibly expensive (thanks to cell-phones using so much of it!). How much RAM, what type, how fast can greatly tip the scales in price / performance
      6. OS - Most of these machines these days run Linux - but there are many different flavors. Things get optimized all the way down to exactly which Kernel version to use - and everything is hand-tuned
      7. Job Scheduler - Several options here from PBS to Slurm and proprietary vendor specific options. How good your job scheduler is can have a HUGE impact in the usability of the machine.
      8. Filesystem - Most of these machines have at least two types of filesystems: "home" and "scratch"... where "home" can be something reliable - maybe even using NFS and "scratch" is typically some high-speed filesystem (Lustre, Panasas, etc.). Choosing the balance between the two is critical. Note that 100,000 processes reading/writing simultaneously can bring even carefully crafted filesystems to their knees.
      9. Local disk - for a long time it was in vogue to run a "diskless" system - but now "disks" are making a comeback (in the way of NV-RAM). Depending on what your applications look like this can provide huge speedups.
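
      To make the topology tradeoff in item 4 concrete, here is a small sketch for one of the options named there, a binary hypercube. The dimension of 12 is purely hypothetical and is not Astra's actual layout; the function names are made up for illustration.

```python
def hypercube_neighbors(node: int, dim: int) -> list[int]:
    """Nodes reachable in one hop: flip each of the dim address bits."""
    return [node ^ (1 << bit) for bit in range(dim)]


def hypercube_hops(a: int, b: int) -> int:
    """Minimum hops between two nodes = number of differing address bits."""
    return bin(a ^ b).count("1")


DIM = 12                      # hypothetical: 2**12 = 4096 nodes
nodes = 2 ** DIM
links_per_node = len(hypercube_neighbors(0, DIM))
worst_case = max(hypercube_hops(0, n) for n in range(nodes))
total_links = nodes * DIM // 2   # each link is shared by two nodes

print(f"{nodes} nodes, {links_per_node} links per node, "
      f"{total_links} links total, worst case {worst_case} hops")
```

      The worst-case hop count grows only with the dimension, but every node needs one link per dimension, so the cabling bill grows faster than the node count. That is the kind of cost versus latency balance the list is describing.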

      (I'm sure I missed something - but these are the big ones)

      Anyway: It's not simple. Purchasing for these machines typically takes at least a year just in the phase where you're defining the requirements and then another 6 months or so to put out bids and go through the selection process.

      In case you're wondering - I do work in the national lab system, I use these machines daily and am part of procurement decisions for them...

    7. Re:Quantity game? by Graymalkin · · Score: 2

      The biggest difference between a simple compute cluster and a supercomputer is the speed of the interconnect. A compute cluster might have individually fast nodes, potentially decked out with RAM, but it's not going to be able to access the contents of any other node's memory effectively. So a big problem needs to be partitioned into slices that fit on a node.

      Supercomputers have fast enough interconnects that multiple nodes can act as a single machine image. Nodes can read and write to the shared memory so they can access the global state of the computation. So you can model a trillion particles in a system rather than millions or billions.

      --
      I'm a loner Dottie, a Rebel.
    8. Re: Quantity game? by Anonymous Coward · · Score: 0

      No idea what you're talking about. Things like GFlops/watt and GFlops/sq ft are important metrics today, not just GFLOPS alone, which basically leaves conventional architecture clusters on the sidelines.

    9. Re:Quantity game? by friedmud · · Score: 2

      These machines are still "distributed memory" supercomputers. It's rare to see a true "shared memory" cluster in HPC these days.

      Infiniband works off of a RDMA process (Remote Direct Memory Access) - but you wouldn't consider it to be "shared" memory (and programmers don't typically interact with the RDMA calls directly - most often still using MPI... but MPI then uses RDMA to achieve the transfer).

      That said: you are correct that interconnect is one of the things that makes a supercomputer "super". The price of the interconnect can be a significant percentage of the purchase price of the machine. The number of network cards, number of switches (and hence topology) and length of cabling all make a difference in the price... and the performance.

    10. Re:Quantity game? by DontBeAMoran · · Score: 0

      7 x 7 = 42

      I knew something was wrong with the Universe.

      --
      #DeleteFacebook
    11. Re: Quantity game? by Anonymous Coward · · Score: 0

      Exactly right. It all comes down to budget. You have a certain amount to build the machine then an annual operating budget. You try to maximize the amount of flops/dollar. That’s why Summit has 6 GPUs per node but only about 32 cores per node.

      But these systems are hitting an interesting bottleneck though. The speed you can write data back to the hard drive hasn’t kept up. You can run these massive simulations on them, but you can’t save the data. So you have to employ in situ data analysis to get anything useful out.

    12. Re:Quantity game? by Anonymous Coward · · Score: 0

      If they were all working concurrently on the same problem, then yes.

      they are all servicing humans

    13. Re: Quantity game? by Anonymous Coward · · Score: 0

      While you're technically correct, you might have a wrong impression. What distinguishes modern supercomputers from clusters is the interconnects used. They usually have very low latency, and often there is more than one, each specialized for a particular task.

    14. Re:Quantity game? by Anonymous Coward · · Score: 0

      While you may be a user of these systems, and I do appreciate that, I somehow think the teams that keep building these for the NSF, Dept of Energy, etc. kind of have their shit together on building out these systems.

      Even a simple server rack migration involves more details than many people think about, especially the trite "how hard can it be to unplug it, move it and plug it back in?" idiots.

    15. Re:Quantity game? by Tablizer · · Score: 1

      That's how you make a really crappy supercomputer.

      That's my point: setting a record, and being useful/good are not necessarily the same thing.

      Maybe the cheap-and-easy version can perform a very narrow set of calculation types faster than anything and set a record doing those narrow things.

      It's roughly comparable to a USA "muscle car" compared to European sports cars. On a straight road, the relatively cheap muscle car may out-perform the expensive European sports cars in certain categories. But throw in some curves and the muscle car slumps. You can set numeric or very specific records without being "generally" or "widely" good. (US happens to have more straight roads than Europe, I'd note.)

      I am not claiming that's what actually happened, I just floated the possibility that there is some "inflation" going on for bragging rights. Humans and egotistical governments do that sometimes.

    16. Re:Quantity game? by Tablizer · · Score: 1

      On review, I worded my original poorly, implying that such shenanigans were the rule instead of the exception. My apologies.

    17. Re: Quantity game? by Anonymous Coward · · Score: 0

      You misunderstood. They are not servicing humans. It is a cookbook!!!!!

    18. Re:Quantity game? by mikael · · Score: 1

      For a modern supercomputer, the connections between the nodes are just as important as, or even more important than, the GPU and CPU cores on each node, and they get a name: the interconnect fabric, or fabric computing. These systems are rack mounted with each node on a single motherboard. Racks can be added and removed according to funding. Each node needs to transfer data to any other node within a few nanoseconds as well as load startup data and save checkpoints at fixed intervals. Some computing problems depend on particular data topologies, like 3D grids for weather simulation.

      --
      Vintage computer adverts: http://www.vintageadbrowser.com/computers-and-software-ads
    19. Re:Quantity game? by Tablizer · · Score: 2

      It does beg the question: what exactly is a "super-computer"? The boundaries seem fuzzy to me, based on being tuned for an intent rather than physical characteristic.

    20. Re:Quantity game? by Anonymous Coward · · Score: 0

      When they talk about TOP500 lists, there are specific benchmarks (e.g. LINPACK, HPCG) used to get the numbers. You can (and many people do) argue about the resemblance of benchmark workloads to real-life problems, but it is a measurement. It also allows machines to game the system: a system like TaihuLight can get high Tflop/s in LINPACK but doesn't do nearly as well with HPCG-type data accesses.

      In my mind, a true supercomputer is more of a balanced architecture with processing, memory, interconnect, and I/O all scaled appropriately for a family of problems. But I am in the minority, because the majority of vendor systems are just clusters with COTS components these days. Which is probably fine, because the vast majority of DoE and HPCMP systems are being carved up to run thousands of small jobs, so they are more about centralizing compute so it is more elastic than about pure balls-to-the-wall performance.

    21. Re:Quantity game? by pnutjam · · Score: 1

      To be fair, High performance computer clusters can work on multiple problems simultaneously. I think you mean that all the computers on the internet need a centralized queue management system to be considered a "supercomputer".

    22. Re:Quantity game? by pnutjam · · Score: 1

      Managed some clusters that did engine modelling for Rolls-Royce. You're spot on. When you run dozens (or hundreds) of identical machines, there is a lot that goes into it. It's not simple by any means. You could have warmed soup in the back of our clusters, especially when under load.
      File system can make a huge difference, as can job scheduler.

    23. Re:Quantity game? by Tablizer · · Score: 1

      It seems they are geared toward physics simulations. In that sense, a "super-computer" is something that runs typical/common physics simulations well. They may be poor fits for other applications, such as running Google's index & search engine.

      I wonder, is hardware optimized for physics simulations also well-suited for cryptography, such as code cracking?

    24. Re: Quantity game? by Anonymous Coward · · Score: 0

      One more filesystem:

      - These machines all have a common shared filesystem adapted to bulk processing. Getting a filesystem able to support 100K cores attacking it simultaneously is not like designing your little NAS at home.

  4. Who is this "creamier"? by Anonymous Coward · · Score: 0

    Why is he so popular here? I must know more about him!

    1. Re: Who is this "creamier"? by Anonymous Coward · · Score: 0

      Are you talking about cremier?

    2. Re: Who is this "creamier"? by Anonymous Coward · · Score: 0

      shut yo mouth

  5. But maybe not for long... by Isvara · · Score: 1

    I wonder how long until we see a million ARM SpiNNaker? http://apt.cs.manchester.ac.uk...
     

  6. Cost and power? by larryjoe · · Score: 1

    This is an interesting development in the use of ARM processors in large computing systems. However, how much progress this represents depends on the dollars and watts needed to produce results. News articles frequently mention the 2.3 petaflops number, but the procurement cost of the system and the power needed to achieve the peak petaflops number are hard to find. If this ARM system doesn't present a compelling dollars or watts story, what is the advantage of this system over competing technologies?

    1. Re:Cost and power? by friedmud · · Score: 2

      Well - ease of programming for one thing.

      With the death of Intel Phi... the HPC community really only has GPUs to offer good flops/watt. The problem with that? Not all workloads map well to GPUs and you often can't rewrite millions of dollars of software that doesn't use GPUs.

      ARM offers another alternative: it can run anything an x86 processor can at better flops/watt.

      The rise of ARM in HPC is _definitely_ an interesting development!

    2. Re: Cost and power? by Anonymous Coward · · Score: 0

      And ironically the ARM ThunderX chips are also better GPU platforms than Intel CPUs currently because of much better memory bandwidth

  7. Exascale speed by Anonymous Coward · · Score: 0

    exascale speed

    0.0023 exaflops

    yes, quite the race, mr. editor

  8. Beowulf cluster by Leninix · · Score: 1

    I would run this super computer in a beowulf cluster

  9. Only an RPeak of 2.3 Petaflops? by CajunArson · · Score: 1

    For a 5,184 socket system a "Peak" performance of 2.3 Petaflops isn't that revolutionary.

    I'm assuming that when they say "peak" they mean a LINPACK "Rpeak" value, which is usually (with a few exceptions) *higher* than the "Rmax" value that's actually used to order the systems by performance. There is no indication in the story that these values are Rmax, and in fact the story literally says "theoretical peak petaflops", which definitely makes me think Rpeak.

    You can see the soon to be outdated list from last November here: https://www.top500.org/list/20...

    For perspective, if you go way down that list to #82 you'll see the rather pedestrian Riken Energy Hokusai BigWaterfall system from last year that hits a noticeably higher 2.58 Petaflops peak and only needs 1680 sockets with rather pedestrian 20 core CPUs to do it.

    Scale that system down by 10% as a rough estimate to 1512 sockets and you have a 3.4 to 1 socket advantage and a 4.8 to 1 core advantage for a rather generic commercial system from last year that isn't even using a single supercompute accelerator to get its performance.
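
    The ratios in this comparison can be reproduced with a few lines of arithmetic. This is only a sketch of the commenter's rough estimate: the Astra figures come from the article, the Hokusai figures from this comment, and the second ratio only works out to 4.8 if it is read as a core-count ratio, which is an interpretation.

```python
# Astra figures are from the article; Hokusai figures are from the comment.
astra_sockets, astra_cores_per_socket = 5_184, 28
hokusai_sockets, hokusai_cores_per_socket = 1_680, 20

# Scale the Xeon system down by 10% so its peak roughly matches 2.3 petaflops.
scaled_sockets = hokusai_sockets * 9 // 10

socket_ratio = astra_sockets / scaled_sockets
core_ratio = (astra_sockets * astra_cores_per_socket) / (
    scaled_sockets * hokusai_cores_per_socket
)

print(f"scaled Xeon system: {scaled_sockets} sockets")
print(f"{socket_ratio:.1f} to 1 sockets, {core_ratio:.1f} to 1 cores")
```

    In other words, for about the same theoretical peak the ARM machine needs roughly 3.4 times the sockets and 4.8 times the cores.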

    --
    AntiFA: An abbreviation for Anti First Amendment.
    1. Re:Only an RPeak of 2.3 Petaflops? by friedmud · · Score: 2

      The flops/socket is still better than BlueGene procs do - and I suspect the flops/watt will be a LOT better than the Xeon system you pointed out.

      An exascale computer can't simply use 10M Xeons... you would need to build a small nuclear reactor next to it to power it. And while GPUs are useful for generating flops... not all workloads map well to them. These cores are general purpose: they can run anything a Xeon can run... but should use a lot less power.

    2. Re:Only an RPeak of 2.3 Petaflops? by CajunArson · · Score: 1

      " I suspect the flops/watt will be a LOT better than the Xeon system you pointed out."

      Not according to real-world tests of these chips: https://www.servethehome.com/c...

      There's a long-held assumption that anything with the word "ARM" on it must be energy efficient because reasons. Well this isn't a smartphone SoC.

      --
      AntiFA: An abbreviation for Anti First Amendment.
  10. Holy shit, CPUs! by Graymalkin · · Score: 1

    Besides the interesting point of this system using ARM over x86-64, it looks like it's all CPU powered. The past few TOP500 rankings have been giant GPU clusters with fast interconnects. The CPUs have provided little of their actual number crunching.

    It's not like GPU heavy super computers are slacking or anything, I just think it's cool seeing a machine get high performance without them. I'm no expert but it seems an all CPU design would be easier to write code for since the problem set doesn't need to be sliced to fit into a GPU's often more limited RAM than the host.

    --
    I'm a loner Dottie, a Rebel.
    1. Re:Holy shit, CPUs! by friedmud · · Score: 1

      Yep! CPUs are definitely easier to program and sometimes GPUs are exactly wrong for certain workloads.

      BTW: The current #1 (which will surely be supplanted in the soon to be refreshed Top500) is an all "CPU" machine (somewhat close to what Intel Phi was): https://www.top500.org/system/...

      10M actual "CPU" cores. But - they are clocked lower and of quite a bit different architecture from your normal Xeon...

      ARM's rise is definitely interesting because it should give us another option for good flops/watt while remaining simple to program.

    2. Re:Holy shit, CPUs! by DontBeAMoran · · Score: 2

      The cool part for me is learning there are 28-core ARM processors out there, which gives me hope that Apple's delays with the MacBook Air and Mac mini mean they're close to releasing their first-ever ARM-powered Macs running macOS.

      --
      #DeleteFacebook
    3. Re:Holy shit, CPUs! by friedmud · · Score: 0

      Sounds good - except AMD Epyc is 32 cores / socket today... and 64 cores / socket next year.

      Personally: I would love a 128 core, dual-socket AMD Epyc based Mac Pro next year...

      That said: I do still think that an ARM based Mac Pro would be fun... but hopefully we'll see chips with more than 28 cores in them...

    4. Re:Holy shit, CPUs! by Anonymous Coward · · Score: 0

      One of the server ARM chips I've seen is supposedly 32-core and quad-threaded to provide 128 threads. I don't know what workloads will favor that, though.

    5. Re:Holy shit, CPUs! by DontBeAMoran · · Score: 1

      Sounds good, but have you seen the prices on those AMD Epyc CPUs? And the power requirements?

      I'm thinking about an Apple ARM processor, based on the A12 or whatever found in the iPad Pro or iPhone X, but with eight or sixteen times as many cores, running macOS and emulating x86 for legacy applications.

      --
      #DeleteFacebook
  11. Quantity X by mbkennel · · Score: 2

    There's lots of naivete in the "connect up bunches" part.

    The supercomputer has far higher interconnect bandwidth and better latency than typically networked commercial servers.

    There needs to be high-performance (meaning assembly-level drivers in some cases) support for the APIs used by the heavily multiprocessed workloads. Think about massive partial differential equation solvers with one gridpoint talking to others and updating at every timestep.

    Conventional networked servers and their bad latency: http://www.scs.stanford.edu/~rumble/papers/latency_hotos11.pdf
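
    The "one gridpoint talking to others at every timestep" pattern can be sketched in plain Python. This is a single-process illustration with hypothetical function names: a 1D diffusion step is computed once on the whole grid and once on a grid split into chunks that only see one copied "halo" value from each neighbour. On a real machine each chunk would live on a different node, and the halo values are the data that would cross the interconnect every timestep.

```python
def step_serial(u, alpha=0.25):
    """One explicit diffusion step on a 1D grid with fixed end values."""
    new = u[:]
    for i in range(1, len(u) - 1):
        new[i] = u[i] + alpha * (u[i - 1] - 2 * u[i] + u[i + 1])
    return new


def step_decomposed(u, parts, alpha=0.25):
    """Same step, but each chunk sees only its own points plus one halo
    value copied from each neighbouring chunk."""
    n = len(u)
    bounds = [n * p // parts for p in range(parts + 1)]
    chunks = [u[bounds[p]:bounds[p + 1]] for p in range(parts)]
    # Halo exchange: the only data that would travel between nodes.
    left_halo = [None] + [chunks[p - 1][-1] for p in range(1, parts)]
    right_halo = [chunks[p + 1][0] for p in range(parts - 1)] + [None]
    result = []
    for p, chunk in enumerate(chunks):
        padded = [left_halo[p]] + chunk + [right_halo[p]]
        for i in range(1, len(padded) - 1):
            lo, hi = padded[i - 1], padded[i + 1]
            if lo is None or hi is None:  # global boundary point stays fixed
                result.append(padded[i])
            else:
                result.append(padded[i] + alpha * (lo - 2 * padded[i] + hi))
    return result


PARTS = 4
grid = [0.0] * 16
grid[8] = 1.0  # a single hot spot in the middle

u_serial, u_split = grid, grid
for _ in range(10):
    u_serial = step_serial(u_serial)
    u_split = step_decomposed(u_split, PARTS)

halo_values_per_step = 2 * (PARTS - 1)
print("results match:", u_serial == u_split)
print("halo values exchanged per step:", halo_values_per_step)
```

    Each exchange is tiny, but it happens at every timestep and every chunk has to wait for it, which is why latency matters so much more here than it does for ordinary networked servers.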

  12. But does it run Crysis? by Anonymous Coward · · Score: 0

    Or anything Windows?

      NO!

      Ergo Garbage!

  13. What part of 2.3 petaflops by Anonymous Coward · · Score: 0

    is 'exascale'?

  14. comprise instead of compose != smarter by Anonymous Coward · · Score: 0

    The United States is composed of 50 states.
    The United States comprises 50 states.
    It's not that hard...

  15. Now that Chinese government pwns ARM... by dicobalt · · Score: 2

    Game over for ARM designs from China. Super computers don't need that kind of risk. Adjust your investment portfolio as needed and lol at the crash that's due.

  16. But they can't run a website? by Anonymous Coward · · Score: 0

    If you go to https://news.hpe.com/ then scroll down and click "Newsroom" in the menu at the bottom of the page, you get "Sorry, Page Not Found"

  17. Worthless in two years by Anonymous Coward · · Score: 0

    Well now won't you feel silly when Google discontinues software updates for your Android based supercomputer.

  18. It's worse than that by Chrisq · · Score: 1

    The muzzies have got a hand in it

  19. In other news... by thomn8r · · Score: 1

    HPE has announced they're laying off their entire US-based ARM engineering team

  20. HPE by JThundley · · Score: 1

    There's no way High Pitch Eric is working on any type of CPU.