10-Petaflops Supercomputer Being Built For Open Science Community

Posted by Soulskill on Friday September 23, 2011 @05:42AM from the go-big-or-go-home dept.

An anonymous reader tips news that Dell, Intel, and the Texas Advanced Computing Center will be working together to build "Stampede," a supercomputer project aiming for peak performance of 10 petaflops. The National Science Foundation is providing $27.5 million in initial funding, and it's hoped that Stampede will be "a model for supporting petascale simulation-based science and data-driven science." From the announcement: "When completed, Stampede will comprise several thousand Dell 'Zeus' servers with each server having dual 8-core processors from the forthcoming Intel Xeon Processor E5 Family (formerly codenamed "Sandy Bridge-EP") and each server with 32 gigabytes of memory. ... [It also incorporates Intel 'Many Integrated Core' co-processors,] designed to process highly parallel workloads and provide the benefits of using the most popular x86 instruction set. This will greatly simplify the task of porting and optimizing applications on Stampede to utilize the performance of both the Intel Xeon processors and Intel MIC co-processors. ... Altogether, Stampede will have a peak performance of 10 petaflops, 272 terabytes of total memory, and 14 petabytes of disk storage."

Looks like a cluster by LordAzuzu · 2011-09-23 05:48 · Score: 3, Insightful

Not a supercomputer
1. Re:Looks like a cluster by fuzzyfuzzyfungus · 2011-09-23 06:10 · Score: 2
  
  Because the best available CPUs are only so fast, and logic boards only so large, both supercomputers and clusters end up being lots-and-lots-of-cards-connected-with-some-mixture-of-backplanes-and-cables at some point.
  
  There's a smooth-ish order of progression in terms of interconnect speed and latency (i.e., SETI@home is a cluster, but inter-node bandwidth is tiny and latency can be in the hundreds of milliseconds; a cheapo commodity cluster using the onboard GigE ports has better bandwidth and lower latency; Myrinet or InfiniBand is better again, but more expensive; certain proprietary fabrics are tighter still, if even more expensive).
  
  The sharp dividing line, though, is probably whether or not the system runs (or at least is capable of running; some may be carved up for sharing purposes) a single system image.
  
  In this cluster, it sounds like each 2-socket node boots up, like a standard computer, and then starts chatting over the network. In a single system image setup, all the CPUs and RAM are visible as a unified address space and collection of cores. Under the hood, there may be a lot of chatter going over cables rather than within a single logic board, but so far as the software is concerned, it is all one computer.
What is a Dell 'Zeus' server? by hawguy · 2011-09-23 05:54 · Score: 3, Informative

The article mentions that it's using Dell 'Zeus' servers, but the only information I can find about those servers online is that they are being used to build this cluster.
What is a Dell 'Zeus' server?
Re:Would sound more impressive... by Junta · 2011-09-23 06:07 · Score: 2

Don't bring technology concerns into a decision based on the neatest sounding name.

--
XML is like violence. If it doesn't solve the problem, use more.
Re:Obligatory by hawguy · 2011-09-23 06:28 · Score: 2

Assuming you want to keep all of your compute nodes busy all the time, EC2 is not a good value.
They say they'll have several thousand servers. I don't know what a Zeus server is, but let's assume it's a 1U, 2 socket server and that they'll have 2000 of them. That will give them 2000 * 2 * 8 = 32,000 cores of CPU.
That's equivalent to 32000 / 4 = 8000 Amazon EC2 Quadruple Extra Large instances. Spot pricing right now matches Reserved instance pricing, $0.56/hour, so for $27M, they can get $27M / 8000 / 0.56 = 6026 hours, or 251 days of equivalent compute power.
If each server (plus network + storage/backup) costs $10,000 (A dual CPU 6 Core Xeon X5675 Dell R410 costs $5K retail), you've spent $20M on hardware. You'll need 50 42U racks to house your servers. Budget $1000/month for each rack, or $50K/month on coloc fees. So in one year you're spending around $600K in coloc fees, leaving $6.4M leftover for salaries and other overhead. (you'll end up needing a few extra racks to hold storage and network gear plus miscellaneous non-compute node servers)
So, $27M on EC2 gets you around 8 months of compute time. $27M in hardware gets you a full year of compute time and next year "only" costs you $600K excluding salaries.
Amazon is only a great deal if you're small enough to not want to manage your own servers, or your demand is variable and you can avoid paying for unused computing capacity that is only there to handle peak loads.
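
The arithmetic above can be reproduced with a short Python sketch. Every figure is the commenter's own guess (2000 servers, 4 cores per EC2 instance, $0.56/hour, $10,000 per server, $1000/month per rack); the 40-servers-per-rack value is an assumption inferred from "2000 servers in 50 racks", and the variable names are purely illustrative.

```python
import math

# All inputs are the commenter's back-of-envelope assumptions, not measured data.
BUDGET = 27_000_000          # dollars, rounded down from the $27.5M NSF figure
SERVERS = 2000               # "several thousand" servers, assumed to be 2000
SOCKETS_PER_SERVER = 2
CORES_PER_SOCKET = 8
CORES_PER_INSTANCE = 4       # commenter's divisor for an EC2 instance
EC2_HOURLY = 0.56            # dollars per instance-hour quoted in the comment
COST_PER_SERVER = 10_000     # dollars, including network and storage share
SERVERS_PER_RACK = 40        # assumption: 2000 servers spread over 50 racks
RACK_FEE_MONTHLY = 1000      # dollars per rack per month

# EC2 side: how long does the budget last at equivalent core count?
cores = SERVERS * SOCKETS_PER_SERVER * CORES_PER_SOCKET
instances = cores // CORES_PER_INSTANCE
ec2_hours = BUDGET / (instances * EC2_HOURLY)
ec2_days = ec2_hours / 24

# Owned-hardware side: purchase cost plus one year of colocation fees.
hardware_cost = SERVERS * COST_PER_SERVER
racks = math.ceil(SERVERS / SERVERS_PER_RACK)
coloc_per_year = racks * RACK_FEE_MONTHLY * 12
leftover = BUDGET - hardware_cost - coloc_per_year

print(f"cores={cores}, instances={instances}")
print(f"EC2: {ec2_hours:.0f} hours, about {ec2_days:.0f} days")
print(f"Owned: hardware ${hardware_cost:,}, coloc ${coloc_per_year:,}/yr, leftover ${leftover:,}")
```

Changing `SERVERS` or `EC2_HOURLY` shows how sensitive the comparison is to those guesses; the sketch ignores power, staffing, and hardware depreciation, just as the estimate above does.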
Impressive if it were built today. by flaming-opus · 2011-09-23 07:44 · Score: 3, Informative

By 2013, 10 petaflops will be a competent, but not astonishing system. Probably top 10-ish on the TOP500 list.
The interesting part here will be the MIC parts from Intel, to see if they perform better than the graphics cards everyone is putting into supercomputers in 2011 and 2012. The thought is that the MIC (Many Integrated Core) design of Knights Corner is easier to program. Part of this is because the cores are x86-based, though you get little performance out of them without using vector extensions. The more likely advantage is that the cores are more similar to CPU cores than what one finds on GPUs. Their ability to deal with branching code and scalar operations is likely to be better than that of GPUs, though far worse than that of contemporary CPU cores. (The MIC cores are derived from the Pentium P54C pipeline.)
In the 2013 generation, I don't think the distinction between MIC and GPU solutions will be very large. The MIC will still be a coprocessor attached to a fairly small pool of GDDR5 memory, and connected to the CPU across a fairly high-latency PCIe bus. Thus, it will face most of the same issues GPGPUs face now; I fear that this will only work on codes with huge regions of branchless parallel data, which is not many of them. I think the subsequent generation of MIC processors may be much more interesting. If they can base the MIC core on Atom, then you have a core that might be plausible as a self-hosting processor. Even better would be placing a large pool of MIC cores on the same die as a couple of proper Xeon cores. If the CPU cores and coprocessor cores could share the memory controllers, or even the last cache level, one could reasonably work on more complex applications. I've seen some slides floating around the HPC world that hint at Intel heading in this direction, but it's hard to tell what will really happen, and when.
Re:I know you specifically looked for this by ae1294 · 2011-09-23 07:59 · Score: 2

Obligatory bitcoin comment.
...Fuck bitcoins
Yes, a new meme needs to be born....
Bitcoin? HOW DOES IT FUCKING WORK!
But by jirikivaari · 2011-09-23 09:47 · Score: 2

Can we play NetHack on it?