Inside Tsubame, Japan's GPU-Based Supercomputer

Posted by timothy on Thursday December 11, 2008 @12:27PM from the please-don't-christen-the-supercomputer dept.

Startled Hippo writes "Japan's Tsubame supercomputer was ranked 29th-fastest in the world in the latest Top 500 ranking with a speed of 77.48 TFlops (trillion floating point operations per second) on the industry-standard Linpack benchmark. Why is it so special? It uses NVIDIA GPUs. Tsubame includes hundreds of graphics processors of the same type used in consumer PCs, working alongside CPUs in a mixed environment that some say is a model for future supercomputers serving disciplines like material chemistry." Unlike the GPU-based Tesla, Tsubame definitely won't be mistaken for a personal computer.

Wow! [Obligatory] by cashman73 · 2008-12-11 12:32 · Score: 4, Funny

Imagine what a Beowulf cluster of these could do! Oh, wait! ;-)
Clever name by subStance · 2008-12-11 12:48 · Score: 4, Funny

Ironic name: tsubame means sparrow in Japanese, and also has the slang usage of toy-boy (as in a cougar's toy-boy).
Not sure what to read into that ...

--
Servlet v2.4 container in a single 161KB jar file ? Try Winstone
1. Re:Clever name by Anonymous Coward · 2008-12-11 13:56 · Score: 2, Informative
  
  Tsubame is actually 'swallow', not 'sparrow', which is suzume.
2. Re:Clever name by TeknoHog · 2008-12-11 19:44 · Score: 2, Funny
  
  Tsubame is actually 'swallow',
  Is that an African, a European, or an Asian swallow?
  
  --
  Escher was the first MC and Giger invented the HR department.
What is a GPU? by hurfy · 2008-12-11 13:03 · Score: 2, Interesting

When it has no graphics out? It is still a GRAPHICS Processing Unit when it doesn't calculate any graphics and doesn't display any graphics. HUH? ;)
They have a whole lot of these boosting a whole lot of quad-cores.
Re:Hold the hyperbole by timeOday · 2008-12-11 13:14 · Score: 3, Insightful

No mention of how/in what you'd program this to actually put the GPUs to good use.
That's why the supercomputer rankings are based on reasonably complex benchmarks instead of synthetic "cores * flops/core" types of numbers. Scoring well on the benchmark is supposed to be solid evidence that the computer can in fact do something useful. My question though is whether the GPUs contributed to the benchmark score, or were just along for the ride.
Re:Ofcourse by dgatwood · 2008-12-11 13:27 · Score: 4, Informative

Indeed, that's the whole idea behind the recently ratified OpenCL specification. Design a C-like language that provides a standard abstraction layer for the ability to perform complex computations on a CPU, GPU, or conceivably on any number of other devices lying around (e.g. idle I/O Processors, the DSP core in your WinModem, your printer's raster engine...).

--
Check out my sci-fi/humor trilogy at PatriotsBooks.
Re:Hold the hyperbole - Read again by raftpeople · 2008-12-11 13:49 · Score: 5, Informative

On reading the article, the box has 30 thousand cores, of which the vast majority are AMD Opterons in Sun boxes. No mention of how/in what you'd program this to actually put the GPUs to good use.
You may want to read the article again, if not here's a recap:
655 Sun boxes, each with 16 AMD cores = 10,480 CPU cores
680 Tesla cards, each with 240 processors = 163,200 GPU processors
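
As a quick check of those totals, here is a minimal Python sketch. The variable names are made up for illustration; the counts are the ones quoted in the recap.

```python
# Tally of the processor counts quoted in the recap above.
sun_boxes = 655
cores_per_box = 16
tesla_cards = 680
processors_per_card = 240

cpu_cores = sun_boxes * cores_per_box
gpu_processors = tesla_cards * processors_per_card

print(f"CPU cores: {cpu_cores:,}")
print(f"GPU processors: {gpu_processors:,}")
print(f"GPU processors per CPU core: {gpu_processors / cpu_cores:.1f}")
```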

As for how to use the GPUs, I use my GTX280 (almost the same thing as a Tesla) to crunch through lots of numeric calculations in parallel. I'm sure these guys are doing the same thing, as that is the strength of the GPU. NVIDIA has made it easier to access the processing power of the GPU with CUDA. You create a program in C that gets loaded on the GPU, and when you launch it you can tell it how many copies to run at one time; each one typically operates on a different portion of the data. Because you can launch more threads than there are processors, the GPU can be reading data in from global vid mem while other threads are performing calculations.
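
The launch model described above can be sketched on the CPU in plain Python. This is only an emulation of the indexing idea, not CUDA itself: `launch` and `scale_kernel` are hypothetical names, and the copies run one after another here instead of in parallel on a GPU.

```python
# CPU-side emulation of "run many copies, each on its own part of the data".
# Not CUDA: launch() and scale_kernel() are hypothetical illustration names.

def scale_kernel(thread_id, num_threads, data, out, factor):
    # Each copy starts at its own index and strides by the total number
    # of copies, so every element is handled by exactly one copy.
    for i in range(thread_id, len(data), num_threads):
        out[i] = data[i] * factor

def launch(kernel, num_threads, *args):
    # A real GPU would schedule these copies concurrently in hardware;
    # here they simply run in sequence.
    for thread_id in range(num_threads):
        kernel(thread_id, num_threads, *args)

data = list(range(10))
out = [0] * len(data)
launch(scale_kernel, 4, data, out, 2)
print(out)
```

Changing the second argument to `launch` changes how many copies share the work without changing the result, which is roughly the knob the comment refers to.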
The missing numbers by Anonymous Coward · 2008-12-11 14:16 · Score: 3, Informative

Just to get some perspective, the GPUs provide about 10 of the 77 TFLOPS benchmarked in LINPACK (HPC article).
Re:Could do it for cheaper by Jeff+DeMaagd · 2008-12-11 14:47 · Score: 3, Informative

ATI's latest cards give more punch for the cost apiece. and they are designed specifically for being clustered/linked/xfired and whatnot.
I thought the nV Teslas were designed for HPC.
Performance going up, cost going down happens so quickly something like that can easily happen between the time it's ordered and the time it's installed.
Supercomputer or many not-so-super computers? by marciot · 2008-12-11 14:47 · Score: 4, Interesting

What makes a supercomputer *a* supercomputer, as opposed to a network of not-necessarily-super computers which all happen to be in the same building and connected to the same high-speed network? By the way this is described, it certainly seems to be a network of many computers working together, rather than one single almighty computer.
1. Re:Supercomputer or many not-so-super computers? by ruiner13 · 2008-12-11 14:56 · Score: 2
  
  It's just super, thanks for asking!
  
  --
  today is spelling optional day.
2. Re:Supercomputer or many not-so-super computers? by dlapine · 2008-12-12 04:20 · Score: 2, Informative
  
  Wikipedia claims that a supercomputer "is a computer at the forefront of current processing capability" http://en.wikipedia.org/wiki/Supercomputer/. The top500 list implies that a supercomputer is a system that can run Linpack really fast, while noting that the system must also be able to run other applications. http://www.top500.org/project/introduction
  Given that NCSA has run many supercomputers over the years, and that I've personally run three while working there, I'd say that a good rule of thumb is that a supercomputer is a system designed to achieve high amounts of calculation throughput (as opposed to instant response) and that the system is at least 100x as powerful as a high-end PC of that time frame. In fact, you could simplify the rule down to: a system designed as a single unit to achieve high computing performance.
  In order to accomplish all these things, supercomputers tend to have 2 things that a "normal" network of PCs doesn't: a high-speed, low-latency network or interconnect (and possibly several networks, each serving a different purpose) and a high-speed, shared filesystem. Also, a supercomputer tends to be designed and installed as a single unit, whereas a network of PCs happens over time.
  Supercomputers tend to fall into one of 2 categories: a large collection of server-class machines (cluster) or a small set of mainframe-style systems (SMP). If you have the cash, you buy a large set of mainframe-style systems, but who has the cash? Folks tend to purchase clusters as they tend to be less expensive, but you'd have to determine whether your application can work correctly on a large number of systems. Not all computing tasks can.
  Tsubame, the system described above, is basically a cluster of inexpensive nodes with a high-speed network. Applications on the cluster run on many of the individual nodes at the same time, and use the high-speed network to pass messages to each other during the program, so that the application appears to be working on a single system. Tsubame is a variant of a supercomputer cluster, where each inexpensive node is beefed up with co-processors and accelerators to increase the overall performance. Harder to program correctly, but potentially more powerful and still not as expensive as the large set of mainframes. Hope that helps.
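
  The message-passing idea can be illustrated with a small Python sketch. It is a toy stand-in, assuming threads in place of cluster nodes and a `queue.Queue` in place of the interconnect; `node` and `cluster_sum_of_squares` are hypothetical names, and real cluster codes would typically use a message-passing library instead.

```python
import queue
import threading

def node(rank, chunk, outbox):
    # Each "node" works only on its own chunk, then sends a message
    # carrying its partial result back to the coordinator.
    outbox.put((rank, sum(x * x for x in chunk)))

def cluster_sum_of_squares(values, num_nodes):
    outbox = queue.Queue()  # stands in for the network between nodes
    chunks = [values[r::num_nodes] for r in range(num_nodes)]
    workers = [
        threading.Thread(target=node, args=(r, chunks[r], outbox))
        for r in range(num_nodes)
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    # Combine the partial results so the caller sees a single answer,
    # as if the work had run on one system.
    return sum(outbox.get()[1] for _ in range(num_nodes))

total = cluster_sum_of_squares(list(range(1, 101)), num_nodes=4)
print(total)
```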
  
  --
  The Internet has no garbage collection
Re:Could do it for cheaper by Molochi · 2008-12-11 16:18 · Score: 2, Insightful

They could do it cheaper with anything at the current price. However, this wasn't just slopped together last month with the latest hardware off newegg.
No doubt, there's a SC being built up right now around all the latest AMD parts. By the time it gets benchmarked, we'll be able to complain that something else is a better deal.

--
"The Adobe Updater must update itself before it can check for updates. Would you like to update the Adobe Updater now?"
Re:Hold the hyperbole by lysergic.acid · 2008-12-11 16:47 · Score: 3, Insightful

how would data parallelism negatively affect a test that is designed to measure a system's performance in supercomputing applications--a field which is dominated by problems which involve processing extremely large data sets?
if vector processors do in fact perform poorly on LINPACK benchmarks then that would mean LINPACK performance is not a good indicator of real-world performance, but that clearly isn't the case as vector processors consistently perform quite well in LINPACK suite measurements.
vector processing began in the field of supercomputing, which during the 1980s and 1990s was essentially the exclusive realm of vector processors. it wasn't until companies, to save money, started designing & building supercomputers using commodity processors (P4s, Opterons, etc.) that general-purpose scalar CPUs began to replace specialized vector processors in high-performance computing. but now companies like Cray and IBM are starting to realize that this change was a mistake.
even in commodity computing the momentum is shifting away from general-purpose scalar CPUs towards specialized vector coprocessors like GPUs, DSPs, array processors, stream processors, etc. when you're dealing with things like scientific modeling, economic modeling, engineering calculations, etc. you need to crunch large data sets using the same operation; this is best done in parallel using SIMD. using specialized vector processors (and instruction sets) you can run these applications far more efficiently than you could using a scalar processor running at much higher clock speeds. the only downside is that you lose the advantage of using commodity hardware that's cheap because of its high-volume production. but if companies like Adobe start developing their applications to employ vector/stream coprocessors, then that will boost the adoption of these vector processors in the commodity computing market, which will increase production volume and lower manufacturing costs.
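
The scalar versus SIMD contrast can be modelled with a toy step count. This is a sketch in plain Python, not real vector hardware: `scalar_add`, `simd_add` and the lane width of 4 are assumptions chosen for illustration, and a "step" here just stands for one issued operation.

```python
def scalar_add(a, b):
    # Scalar model: one element is processed per step.
    out, steps = [], 0
    for x, y in zip(a, b):
        out.append(x + y)
        steps += 1
    return out, steps

def simd_add(a, b, lanes=4):
    # SIMD model: one step applies the same operation to up to
    # `lanes` elements at once.
    out, steps = [], 0
    for i in range(0, len(a), lanes):
        out.extend(x + y for x, y in zip(a[i:i + lanes], b[i:i + lanes]))
        steps += 1
    return out, steps

a = list(range(10))
b = [10 * x for x in a]
scalar_out, scalar_steps = scalar_add(a, b)
simd_out, simd_steps = simd_add(a, b, lanes=4)
print(scalar_steps, simd_steps)
```

Both versions produce the same result; only the number of steps differs, which is the point the comment makes about applying one operation across a large data set.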