China Bumps US Out of First Place For Fastest Supercomputer
An anonymous reader writes "China's Tianhe-2 is the world's fastest supercomputer, according to the latest semiannual Top 500 list of the 500 most powerful computer systems in the world. Developed by China's National University of Defense Technology, the system appeared two years ahead of schedule and will be deployed at the National Supercomputer Center in Guangzhou, China, before the end of the year."
So we know it runs Microsoft Windows. U.S.A.! U.S.A.!
Quickly before they sap and impurify all of our precious bodily fluids!
In all, Tianhe-2, which translates as Milky Way-2, operates at 33.86 petaflop per second
First of all, it's PetaFLOPS. It's not a plural, so there is no PetaFLOP. FLOPS = FLoating-point Operations Per Second, so saying "PetaFLOP per second" is saying "Peta-FLoating-point Operations Per per second"
Your information is out of date. Most supercomputers in the last decade have been distributed memory machines, so 'distributed computing' is what this is already. Also, as someone that's using a machine somewhat further down the list (in the 30s), if you have a big supercomputer that you feel is a waste, can you give me an account? Because my job (in fluid dynamics simulations) is basically dependent on their existence, and I've got applications for the biggest machine I can get my hands on.
I'll bite. You seem to think that distributed computing, however you are defining that, is a better solution. I am going to assume your primary objection then is using InfiniBand (or some other low-latency interconnect such as NUMAlink or Gemini). What, then, would you propose to do with the class of problems that rely on extremely low-latency transmission of data between nodes?
That is, hands down, the best thing I've seen all day.
It's interesting to browse this website:
http://www.top500.org/
And look at the Statistics section, such as Operating System Family
http://www.top500.org/statistics/list/
Operating System Family   Count   System Share (%)   Rmax (GFlops)   Rpeak (GFlops)   Cores
Linux                     476     95.2               217,913,963     318,748,391      18,700,112
Unix                      16      3.2                3,949,373       4,923,380        181,120
Mixed                     4       0.8                1,184,521       1,420,492        417,792
Windows                   3       0.6                465,600         628,129          46,092
BSD Based                 1       0.2                122,400         131,072          1,280
People don't build supercomputers for no reason, especially when HPC eats up a large part of their budget.
The main application of supercomputers is numerically solving partial differential equations on large meshes. If you try that with a distributed setup, the latency will kill you: the processors have to talk constantly to exchange information across the domain.
As someone pointed out, modern supercomputers are like distributed computing, often with commodity processors. They look like (and are) giant racks of processors. But they have very fast, low-latency interconnects.
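To make the "processors have to talk constantly" point concrete, here is a toy sketch in Python, not anyone's production solver. It runs one Jacobi smoothing step on a 1D grid, first on the whole grid and then with the grid split into chunks, where each chunk needs one boundary ("halo") value from each neighbouring chunk. The function names and the 1D setup are made up for illustration; a real code would do this over MPI on a 3D mesh.

```python
def jacobi_step(u):
    """One Jacobi smoothing step on a 1D grid; the two end values stay fixed."""
    interior = [(u[i - 1] + u[i + 1]) / 2 for i in range(1, len(u) - 1)]
    return [u[0]] + interior + [u[-1]]


def jacobi_step_decomposed(u, parts):
    """Same step with the grid split into `parts` chunks (hypothetical helper).

    Returns the updated grid and the number of halo values that had to be
    fetched from neighbouring chunks. Assumes every chunk has at least two cells.
    """
    n = len(u)
    bounds = [n * p // parts for p in range(parts + 1)]
    result = []
    messages = 0
    for p in range(parts):
        lo, hi = bounds[p], bounds[p + 1]
        has_left = lo > 0
        has_right = hi < n
        # "Receive" one halo value from each neighbouring chunk.
        padded = u[lo - 1 if has_left else lo : hi + 1 if has_right else hi]
        messages += int(has_left) + int(has_right)
        stepped = jacobi_step(padded)
        # Drop the halo cells again; they belong to the neighbours.
        start = 1 if has_left else 0
        end = len(stepped) - 1 if has_right else len(stepped)
        result.extend(stepped[start:end])
    return result, messages


grid = [float(i * i) for i in range(12)]
split_result, halo_messages = jacobi_step_decomposed(grid, 4)
print(split_result == jacobi_step(grid), halo_messages)
```

Every chunk needs fresh halo values on every single step, so the number of exchanges per step grows with the number of chunks while the work per chunk shrinks. That is why interconnect latency matters so much for this class of problem.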
While I completely agree that being in 1st place doesn't mean much, taking a look at the entire top 500 does give a good measure of which countries are spending the most on R&D. Although I do think it is a little shameful that a country with half of our GDP has the fastest supercomputer, it is still commendable that the USA has about half of the top 500 supercomputers with only 20% of the world's GDP.
-- All that is necessary for the triumph of evil is that good men do nothing. -- Edmund Burke
That's what the author gets for mistyping at ten writing flops per minute.
Ezekiel 23:20
China Bumps US Out of First Place For Fastest Supercomputer. Posted by samzenpus on Monday June 17, 2013 @02:42PM, 20 minutes after: Book Review: The Chinese Information War. Posted by samzenpus on Monday June 17, 2013 @02:22PM. You do the math....
The second place computer was from Redmond, Washington and almost clinched the title but it had to be connected to the internet at least once every 24 hours, had a camera connected 24/7, and was not able to share its programs with other computers.
Logically, supercomputers are inherently distributed in a torus configuration.
Why a torus? Why not a hypercube or a fat tree?
"China Bumps US Out of First Place For Fastest Supercomputer"
Fastest supercomputer, that 1) Runs Linpack and 2) is publicly-acknowledged. There are plenty of similar supercomputers that don't meet one or both of those criteria, and are therefore omitted. The Top500 is FAR from a comprehensive list of supercomputers, but twice a year we see a flurry of stories presuming that it is.
That's almost enough to run Vista
You seem to think
Aha, I've identified the error in your logic!
rest of their lives breaking codes for the national spy agencies. Several of the top computers, like Kraken, Jaguar, and Titan, were/are NSA cryptography machines.
The NSA has their own computers, why would they need to use the rather publicly known ones, and compete with other users for time? Do you assume those computers only do one piece of science because you only read about it in the news/PR, or did you actually bother to look at the research papers and groups using these computers on a daily basis? I know people in research groups that use those computers. What they sometimes have to compete with is not the NSA, but nuclear stewardship programs. Other than that, it is other science groups getting time and/or slices of the machine.
You get stuck in a five hour meeting and see if you can visualize anything other than a doughnut afterward.
If you use a hyper-cube, then the processors on the outside edges have no one to talk to. For a single-dimension example, imagine a series of processors where every processor in a line has two communication links, one to talk to its neighbour on the left, and one to talk to its neighbour on the right. This is great for all the processors in the middle of the arrangement. However, in a one-dimensional straight-line arrangement, each processor on the end is missing either a left or a right neighbour. The solution to this problem is to connect the processors on the ends to each other, making the line a circle or ring.
A one-dimensional hypercube is a line. In supercomputing, it is often desirable to avoid any topology where there is a flat (non-connected) surface on the side of the cube. Connecting the opposite edges of the cube to each other results in the torus topology in higher dimensions, and the ring topology in 1-D. For a picture of this effect, see the torus interconnect article on Wikipedia.
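The wrap-around described above is just modular arithmetic on the node coordinates. A minimal Python sketch, with made-up function names and no claim that any real machine numbers its nodes this way:

```python
def mesh_neighbours(coord, dims):
    """Neighbours on a plain grid: nodes on an edge are missing some links."""
    neighbours = []
    for axis, size in enumerate(dims):
        for step in (-1, 1):
            value = coord[axis] + step
            if 0 <= value < size:
                neighbours.append(coord[:axis] + (value,) + coord[axis + 1:])
    return neighbours


def torus_neighbours(coord, dims):
    """Neighbours on a torus: opposite edges are joined, so indices wrap."""
    neighbours = []
    for axis, size in enumerate(dims):
        for step in (-1, 1):
            value = (coord[axis] + step) % size  # wrap around at the edges
            neighbours.append(coord[:axis] + (value,) + coord[axis + 1:])
    return neighbours


# 1-D case from the post: the end of a line versus the same node on a ring.
print(mesh_neighbours((0,), (8,)))
print(torus_neighbours((0,), (8,)))
# A corner node of a 4x4x4 arrangement, with and without wrap-around.
print(len(mesh_neighbours((0, 0, 0), (4, 4, 4))))
print(len(torus_neighbours((0, 0, 0), (4, 4, 4))))
```

On the torus every node has the same number of links (two per dimension), which is the property the wrap-around is meant to buy.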
While it is theoretically preferable to have really high-order interconnects, in practice wiring considerations limit the maximum number of interconnects. As such, most practical torus architectures are limited in the number of neighbours they can support.
FYI: The tree architecture is avoided in supercomputing for a different reason. Typically, each node has the fastest interconnect that can be provided, as interconnect speed affects system speed for many algorithms. Imagine that each leaf at the bottom of the tree needs 1X bandwidth. Then the parent node one level up needs 2X bandwidth. The next parent node up requires 4X bandwidth, and so on. With tens of thousands of nodes in the supercomputer, it quickly becomes impossible to fabricate interconnects fast enough for the parent nodes of the tree.
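The doubling argument is easy to put numbers on. A rough Python sketch, assuming a simple binary tree and the worst case where every leaf's traffic may have to cross the root; the function name and the 16,384-node figure are only illustrative:

```python
def tree_bandwidths(num_leaves, leaf_bw=1, fanout=2):
    """Bandwidth needed at each tree level, from the leaves up to the root.

    Assumes a plain tree with `fanout` children per node and the worst case
    in which all traffic from below a node must pass through it.
    """
    levels = [leaf_bw]
    covered = 1  # number of leaves underneath a node at the current level
    while covered < num_leaves:
        covered *= fanout
        levels.append(leaf_bw * covered)
    return levels


print(tree_bandwidths(8))            # 1X, 2X, 4X, 8X
print(tree_bandwidths(16384)[-1])    # what the root would need, in units of X
```

With 16,384 leaves the root would need 16,384 times the bandwidth of a single leaf link under these assumptions, which is the scaling problem the post describes.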
A practical application of the tree problem occurs on small Ethernet clusters. It is easy to make a 16-node 10Gb Ethernet cluster, because standard switches are readily available. As the system approaches hundreds of nodes, it becomes difficult to find fast enough switches. Even if the data communication speed to each node is reduced to 1Gb, for sufficiently large numbers of nodes, the backplane switches will be overwhelmed.
Here is a list of the top 5 supercomputers run by the NSA (partially redacted):
1- XXXXX_XXXXXXX_XXXXXX_XXXX
2- XXXXXXXXXXXXXinator
3- XXXXXXXXOfTheXXXXX
4- PinkiePie15
5- XXX_XXXXXX_XXXXXXX
Is that better?
Wait. I thought PETAFlops was a measure of how many times PETA have launched an idiotic campaign. As in, "I read that PETA is campaigning that we should call fishes 'sea kittens'. That's 7 PETAFlops so far this year".
As a computational physicist:
"flop" is sometimes used to mean "floating point operation", when you're talking about the compute cost of an algorithm. For instance:
"The Wilson dslash operation requires 1,320 flops per site" or "The comm/compute balance of this operation is 3.2 bytes per flop".
So saying "ten flops per second" is fine -- "flops" is the plural of "flop".
Yes, "flops" is also acronymized as "... per second", and while that's the most common use it's not exclusive.
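For what it's worth, the two meanings fit together as plain arithmetic: a flop count divided by a FLOPS rate gives a time. A small Python sketch using the 1,320 flops per site and 3.2 bytes per flop quoted above and the 33.86 PFLOPS figure from the summary; the lattice size is a made-up example, and the rate is a Linpack number, so the time is an optimistic lower bound at best.

```python
FLOPS_PER_SITE = 1320      # flop count per site, as quoted in the post
BYTES_PER_FLOP = 3.2       # comm/compute balance, as quoted in the post
MACHINE_RATE = 33.86e15    # flops per second (33.86 PFLOPS, from the summary)


def total_flops(sites):
    """Flop count (a quantity, not a rate) for one operation over all sites."""
    return FLOPS_PER_SITE * sites


def seconds_at_rate(flop_count, rate):
    """Ideal run time: flop count divided by a flops-per-second rate."""
    return flop_count / rate


sites = 32 ** 3 * 64       # hypothetical lattice, chosen only for illustration
count = total_flops(sites)
print(count, "flops")
print(count * BYTES_PER_FLOP, "bytes moved")
print(seconds_at_rate(count, MACHINE_RATE), "seconds at the quoted rate")
```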
As a computer scientist:
We rarely refer to the cost of an algorithm in terms of flops, since it is bound to change with 1) software implementation details, 2) hardware implementation details, and 3) input data dependencies (for algorithms with dynamical properties). Instead, we describe algorithms in "Big O" notation, which is a convention for describing the theoretical worst-case performance of an algorithm in terms of n, the size of the input. Constant factors are ignored. This theoretical performance figure allows apples-to-apples comparisons between algorithms. Of course, in practice, constant factors need to be considered for many specific scenarios.
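As a quick illustration of counting operations versus Big O, here is a Python sketch that counts the floating-point operations in a naive n-by-n matrix multiply. The exact count carries constant factors, while Big O keeps only the n^3 growth. Naive matrix multiplication is my choice of example, not something from the thread.

```python
def naive_matmul_flops(n):
    """Count multiplies and adds in a naive n x n matrix multiplication."""
    flops = 0
    for _ in range(n):           # rows of the result
        for _ in range(n):       # columns of the result
            flops += n           # n multiplications per output entry
            flops += n - 1       # n - 1 additions to sum the products
    return flops


for n in (10, 20, 40):
    print(n, naive_matmul_flops(n))
# Doubling n multiplies the count by roughly 8, i.e. O(n^3) growth.
print(naive_matmul_flops(40) / naive_matmul_flops(20))
```

The exact count works out to n^2 * (2n - 1); Big O drops the factor of 2 and the lower-order term and just says n^3.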
"flops" are more commonly used when talking about machine performance, and that's why they're expressed as a rate. You care about the rate of the machine, since that often directly translates into performance. Computer architects also measure integer operations per second, which is in many ways more important for general-purpose computing. Flops are really only of interest nowadays for people doing scientific computing now that graphics-related floating point things have been offloaded to GPUs.
If you want to be pedantic, computers are, of course, hardware implementations of a Turing machine. But it's silly to talk about them using Big O notation, since the "algorithm" for (sequential) machines is mostly the same regardless of what machine you're talking about. The constant factors here are the most important thing, since these things correspond to gate delay, propagation delay, clock speed, DRAM speed, etc.
Not the poster upthread, but as someone else who runs fluids codes on big machines, I will chime in:
A lot of the guys on the big NICS machines aren't using ANSYS. They're using their own research codes that are tailored for parallel performance and/or to solve specific and difficult problems that commercial codes don't do well, like fluid-structure interaction. I know there are guys that depend on licensing somehow or another and this is artificially limiting. But I never really understood it. If all you want is a basic, parallel fluids solver, there are some open-source options. Probably won't scale well, but it sure beats spending half your lab budget to get only 8 processors.
Even if you have your own in-house solver, you will of course run into problems with latency as you scale up. I usually run on around 100-200 processors, depending on the problem. I would love to use more, but the communication costs start to take over. Some guys can run on 10-100,000 processors. Not sure what they are doing, but I am guessing whatever they are computing requires very little communication between nodes, or has been optimized to an extreme degree. Hard to imagine those guys are running a normal fluids solver with an unstructured grid. That'd be a huge waste.
And I agree with whoever said that if someone knows of a big wasted supercomputer with idle time on it, please advertise it here! All the ones I've ever seen are more-or-less utilized to their full extent.