NEC SX-9 to be World's Fastest Vector Computer
An anonymous reader writes "NEC has announced the NEC SX-9, claiming it to be the fastest vector computer, with single-core speeds of up to 102.4 GFLOPS and up to 1.6 TFLOPS on a single node incorporating multiple CPUs. The machines can be used for complex large-scale computation, such as climate modeling, aeronautics and space, environmental simulation, and fluid dynamics, by processing whole arrays with a single vector instruction. Yes, it runs a UNIX System V-compatible OS."
Of course, but the true question is...
Does it run Linux?
Cue the redundant replies and grouchy mods.
So, aside from having all of this power in one centralized spot, how does this compare to the combined power used for distributed computing projects like ClimatePrediction.net, Folding@home, and any other project on BOINC?
(This would waste some of the compute power, but if the total time saved from not changing the application exceeds the time that could be saved using more of the cycles available, you win. It is this problem of creating illusions of whatever architecture happens to be application-friendly at a given time that has made much of my work in parallel architectures - such as the one produced by Lightfleet - so interesting... and so subject to office politics.)
It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
I haven't looked closely but I would guess (based on having worked at a manufacturer of vector supercomputers many years ago) that all of the machines represented on the Top 500 list are hybrid machines. All of the vector architectures I'm familiar with had a scalar processor to handle most of the housekeeping, run the OS, compilers and things like that. Vector processors aren't very good at doing things like that.
Vector processors excel at running through what are essentially loop operations. There are two components to their speed. One is the number of functional units they have. Conceptually, vector operations are applied across an entire array at once (in math speak, arrays are known as "vectors"). Hence they are automatically parallelizable, and the more functional units you have, the more of the operations can actually be applied in parallel. The other component is their ability to run through data quickly. Since the vector unit knows that it will be running through a contiguous block of memory, it can really get the memory system moving. Scalar processors and their caches are not designed for running in straight lines through data. It's pretty rare to see a cache that will go into a full streaming mode, so they are continually starting and stopping the memory subsystem. A vector unit can issue prefetches for all of its data, so you can build an interleaved memory system that will really move the data. (We used to have 8-way interleave on our memory subsystem. The scalar didn't do all that well with that, but the vector could max out the memory bus in a sustained manner.)
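For anyone who hasn't seen one, here is roughly what a vector-friendly loop looks like. This is only a sketch in plain C, not code for the SX-9 or for the machine I worked on: the name saxpy is borrowed from the BLAS routine of the same purpose, and the signature is my own choice for illustration. No iteration depends on any other, and both arrays are walked with stride 1, which is exactly the pattern described above.

```c
#include <assert.h>
#include <stddef.h>

/* Computes y[i] = a * x[i] + y[i] for i in [0, n).
 * Each iteration is independent of the others and the arrays are read
 * contiguously, so a vectorizing compiler (or a vector unit) is free to
 * process many elements per instruction.
 * "restrict" promises that x and y do not overlap. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    assert(n == 0 || (x != NULL && y != NULL));

    for (size_t i = 0; i < n; i++) {
        y[i] = a * x[i] + y[i];
    }
}
```

Whether a given compiler actually vectorizes this depends on the target and the optimization flags, so treat it as the shape of the problem rather than a guarantee.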
Parallel programming is hard. Vectorized code is kind of like "parallel lite" in that it parallelizes very narrow operations without all that messy locking and message passing.
Oh, there was one thing that the vector excelled at that OSes do a lot of: memory copying. When we instrumented our kernel (4.3 BSD derived) we found that it spent an awful lot of time in bcopy. One of the guys spent a fair amount of time implementing a "vcopy" which would use the vector to copy large blocks of memory. On our smoking fast 237 MB/s bus with 8-way interleaved memory, the scalar CPU would top out at around 25 to 30 MB/s due to the interaction of the cache and the memory subsystem. The vector, though, could move at bus speed. Unfortunately, I don't think it ever worked as well in practice as in theory because there was a lot of overhead in getting the vector started, checking to make sure that it wasn't busy doing other work, etc. A dedicated DMA unit would give you the same effect.
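To make the overhead point concrete, here is a rough sketch of the kind of dispatch a copy routine ends up doing. None of this is the original vcopy: the names block_copy, vector_unit_busy and VCOPY_MIN_BYTES are made up for illustration, the 4096-byte break-even size is an arbitrary assumption, and memcpy merely stands in for whatever would drive the vector hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical break-even size: below this, the cost of starting the
 * vector unit is assumed to outweigh the faster transfer. */
#define VCOPY_MIN_BYTES 4096

typedef enum { COPY_SCALAR, COPY_VECTOR } copy_path;

/* Stand-in for asking the hardware whether the vector unit is already
 * doing other work. Always reports "idle" in this sketch. */
static bool vector_unit_busy(void)
{
    return false;
}

/* Copies n bytes from src to dst (regions must not overlap) and reports
 * which path was taken. */
copy_path block_copy(void *dst, const void *src, size_t n)
{
    assert(n == 0 || (dst != NULL && src != NULL));

    if (n >= VCOPY_MIN_BYTES && !vector_unit_busy()) {
        /* Large block and the unit is free: pay the startup cost once.
         * memcpy is only a placeholder for the vector copy here. */
        memcpy(dst, src, n);
        return COPY_VECTOR;
    }

    /* Small block, or vector unit in use: plain scalar byte loop. */
    unsigned char *d = dst;
    const unsigned char *s = src;
    for (size_t i = 0; i < n; i++) {
        d[i] = s[i];
    }
    return COPY_SCALAR;
}
```

The size check and the busy check are both paid on every call, which is the sort of fixed cost that eats into the theoretical win on anything but large copies.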
Whenever I hear "supercomputer" and Unix I think of using a Cray and Unicos, which was the version of Unix that ran on them. Unicos, at least the version I used, was the ultimate in bare-bones Unix. I think when people think of Unix today they think of something like Linux or the BSDs or OS X or whatever, where the environment is very rich with tools. Unix on a supercomputer is not much more than an interface between your C (or Fortran) program and the bare metal; they don't (again, in my experience) make it the kind of environment you *use*...you get your code on the machine, compile it, submit it, and log off and wait for an email.
Maybe this NEC machine is different, but Unix on a supercomputer is like the cockpit of a Formula 1 race car: just there to provide a way to steer, comforts be damned.