NEC SX-9 to be World's Fastest Vector Computer

Oh? by SnoopJeDi · 2007-10-25 16:53 · Score: 2, Interesting

Yes, it runs a UNIX System V-compatible OS.

Of course, but the true question is...

Does it run Linux.

Cue the redundant replies and grouchy mods.

Re:Oh? by Anonymous Coward · 2007-10-25 17:01 · Score: 5, Funny

so awesome when girls post...
Re:Oh? by colourmyeyes · 2007-10-25 20:11 · Score: 5, Funny

user@host:~ $ ls
You are about to list the files in this directory. Are you sure you want to do this? [y/n] y
Enter Administrator password:
We're sorry, using MS Bash 4.00 Basic you do not have the proper privilege level to view system files. Please purchase MS Bash 4.00 Mega, Ultra, or Extreme. Would you like to purchase one of these products now? [y/n] y
We're sorry, this product is not upgradeable. Please reinstall your operating system, choosing "clean install" during the upgrade process.
Thank you for choosing the rich user experience provided by MS Bash 4.00.
MS Bash must now restart your computer.

--
My grandmother used anecdotal evidence all the time, and she lived to be 120 years old.

Logical question: by r_jensen11 · 2007-10-25 17:09 · Score: 2, Interesting

So, aside from having all of this power in one centralized spot, how does this compare to the combined power used for distributed computing projects like ClimatePrediction.net, fold@home, and any other project on Boinc?

Re:Logical question: by sophanes · 2007-10-25 17:38 · Score: 2, Informative

Put simply, the problems that vector processors are geared towards (those involving large matrix ops) are the type that clusters perform horribly at.

Re:Logical question: by deniable · 2007-10-25 17:42 · Score: 4, Informative

Well, distributed is often seen as poor man's parallel, but in this case they don't compare. Vector units have large arrays of data and perform the same operation on all of them at once. Think array or matrix operations being done in one step rather than needing loops. This is where a SIMD architecture takes off.

The only unit I ever got to play with had a 64x32 grid of processors. You could add a row of numbers in log2(n) steps instead of n. It was cool because you could tell each processor to grab a value from the guy next to him (or n steps in a given direction from him) and so on. You could calculate dot products of matrices very quickly.
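
A rough sketch of the log2(n) trick described above, simulated in plain Python. The name tree_sum and the sequential inner loop are my own stand-ins, not anything from the machine in question; on real hardware every iteration of the inner loop would run on a separate processor within the same step. It assumes a non-empty row.

```python
def tree_sum(values):
    """Sum a row by pairwise combining; returns (total, parallel_steps)."""
    cells = list(values)   # one value per simulated processor
    n = len(cells)
    steps = 0
    stride = 1
    while stride < n:
        # One parallel step: every processor whose index is a multiple of
        # 2*stride adds the value held by the neighbour `stride` places away.
        for i in range(0, n - stride, 2 * stride):
            cells[i] += cells[i + stride]
        stride *= 2
        steps += 1
    return cells[0], steps


total, steps = tree_sum(range(64))
print(total, steps)  # 2016 6
```

A row of 64 numbers is summed in 6 combining steps rather than 63 sequential additions, which is the whole point of letting each processor reach over to a neighbour.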

The distributed stuff you mentioned is mostly farming. Take a big loop of independent steps, break them up and pass them out to a (possibly) heterogeneous collection of processing nodes. Collect the answers when they finish. Render farms work the same way. It's a good way to break up some problems, but it's not what a vector unit does.
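
For contrast, farming looks roughly like this. This is only an illustration: work_unit is a made-up workload, and local threads stand in for the remote, possibly heterogeneous nodes a real project would use.

```python
from concurrent.futures import ThreadPoolExecutor


def work_unit(seed):
    """Stand-in for one independent chunk of a big loop (hypothetical workload)."""
    return sum(i * i for i in range(seed))


def farm(seeds, workers=4):
    # Hand the independent units to whichever worker is free, then collect
    # the answers in their original order once they finish.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(work_unit, seeds))


results = farm([10, 100, 1000])
print(results)
```

No unit ever needs a value from another unit, which is why this style tolerates slow links between nodes and why it is a different thing from what a vector unit does.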

Now, I haven't touched this stuff for eleven years so my facts are possibly wrong. I'm sure someone will be along to correct me.
Re:Logical question: by ajs318 · 2007-10-25 23:31 · Score: 2, Insightful

Something people constantly misunderstand ..... is that they assume that all problems can be broken apart into manageable portions that can be split

Exactly! Having sex 39 times does not mean you will be able to get a baby in one week. Some operations are by nature sequential -- and while there is scope for some parallelisation, doing so in a highly-distributed fashion can end up increasing latency, because you end up spending more time splitting the data up and putting the results back together than actually doing any maths.
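
To put rough numbers on that, here is a toy model based on Amdahl's law (my choice of model, not something named in the post). The overhead_per_worker term is an invented stand-in for the cost of splitting data up and merging results; its linear form and the values used are assumptions for illustration only.

```python
def speedup(parallel_fraction, workers, overhead_per_worker=0.0):
    """Amdahl-style speedup relative to a run time of 1.0 on one worker."""
    serial = 1.0 - parallel_fraction
    # Run time = sequential part + evenly divided parallel part
    #            + a made-up coordination cost that grows with worker count.
    time = serial + parallel_fraction / workers + overhead_per_worker * workers
    return 1.0 / time


for workers in (10, 100):
    print(workers, round(speedup(0.9, workers, overhead_per_worker=0.01), 2))
```

With 90% of the work parallelisable and that small per-worker cost, ten workers still help, while a hundred workers end up slower than running the job on one.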

A vector processor basically has several separate logic matrices that work in parallel, performing the same operation on different (sets of) operands simultaneously. When you want to solve simultaneous equations in many variables, some of which are themselves multi-dimensional vectors, that can be extremely useful.

--
Je fume. Tu fumes. Nous fûmes!

Quite possibly. by jd · 2007-10-25 17:09 · Score: 4, Interesting

The architecture (a vector processor) is not in the vanilla kernel, but the kernel is fairly parallel, thread-safe and SMP-safe, so I really can't see any reason why you couldn't put Linux on such a platform. Because a lot of standard parallel software these days assumes a cluster of discrete nodes with shared resources, they'd be best borrowing code from Xen and possibly MOSIX to simulate a common structure.

(This would waste some of the compute power, but if the total time saved from not changing the application exceeds the time that could be saved using more of the cycles available, you win. It is this problem of creating illusions of whatever architecture happens to be application-friendly at a given time that has made much of my work in parallel architectures - such as the one produced by Lightfleet - so interesting... and so subject to office politics.)

--
It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)

Re:Quite possibly. by Kristoph · 2007-10-25 17:29 · Score: 2, Informative

A user would pay the extremely high cost of a supercomputer - with its proprietary memory architecture and interconnects - precisely because it can scale up parallel processes much more effectively than a cluster. If the benefit of that did not outweigh the cost of tailoring software to fit the device then these devices would never be made.

]{
Re:Quite possibly. by SamP2 · 2007-10-25 17:52 · Score: 4, Insightful

CAN run Linux and RUNS Linux are not quite the same thing.

To put things in perspective, 99% of PCs in the world CAN run Linux. :-)
Re:Quite possibly. by deniable · 2007-10-25 17:56 · Score: 2, Insightful

The front-end OS for these things is pretty meaningless. Being a Unix-like will keep the programmers and admins happy. The front-end is only a shell for the code running on the back-end processing units. These do all of the work and rely on specific hardware, instructions, and libraries to do things in *actual* parallel. These things basically exist to run big number crunching tasks for mathematicians and mathematicians in disguise like physicists. :) These people will generally be running their own code with tweaks for the hardware. They see Intel's SIMD instructions and think it's 'cute' and wonder what it will be when it grows up.

Re:Quite possibly. by Calinous · 2007-10-25 18:25 · Score: 4, Informative

The cost of supercomputers is so high that several man-months spent tailoring the software to run as efficiently as possible on the hardware can sometimes be recovered in a couple of days of processing.
For the kind of computation the supercomputer market requests, a 5% improvement in running speed on a supercomputer can be worth millions.

Re:Quite possibly. by Chrisq · 2007-10-25 21:42 · Score: 3, Insightful

This was certainly the case when I used vector processors. It is possible that the vector processor does not run an OS at all. It has been many years since I have worked on such a beast but when I did we ran a loader system with a standard OS which would cross compile code for the processor and load it almost onto the hardware (there was actually a small program we called a monitor to deal with I/O, etc, but no multi-tasking, security or anything). It would then run and the results were read back into the front-end computer.
Re:Quite possibly. by bockelboy · 2007-10-26 00:48 · Score: 4, Informative

That's the current, popular, Blue Gene/L architecture. The Blue Genes are composed of densely packed boards, each of which has a PowerPC chip and many vector processors. The PowerPC chips run a Linux-like OS and do some normal-looking I/O (filesystems, networking, etc), while the vector processors churn lots of data and have simplistic I/O.

That GP who suggests that Xen is used to distribute tasks obviously isn't familiar with the needs of big iron.

I can see the ads now by UnixUnix · 2007-10-25 17:10 · Score: 3, Funny

"Easter Island's Weather Forecasting Service believes operation of the NEC SX-9 would realize a 53% savings under Windows Server 2008 compared to under UNIX"

I can see tomorrow's headline... by maciarc · 2007-10-25 17:11 · Score: 2, Funny

"SCO files umpteen bazzillion dollar lawsuit against NEC"

Re:GFLOPS? TFLOPS? by thedarknite · 2007-10-25 17:13 · Score: 5, Funny

What's with these newfangled measurements?

I'd like to know what it is in Libraries of Congress per Jiffy

--
A game has objectives and is competitive, anything else is just play

Does it play vector games? by filesiteguy · 2007-10-25 17:13 · Score: 4, Funny

I wonder how well it will do with the really cool vector games like Asteroids or BattleZone or Tempest or...

...Star Wars! Yeah, we could take this little baby, set up a sit-down booth, add some speakers and we'd be set!

"what's your vector, Victor?"

--
The Kai's Semi-Updated Website Thingy

1.6 Teraflops? by JK_the_Slacker · 2007-10-25 17:13 · Score: 2, Funny

Why, that's more powerful than a cluster of 60 PS3s! I'll take three!

--
I'm waiting for a "-1 somepeoplejustshouldn'tgetmodprivileges" meta-moderation.

Re:1.6 Teraflops? by RuBLed · 2007-10-25 17:25 · Score: 5, Funny

Ahh... so you're planning to turn Aero on.

Re:GFLOPS? TFLOPS? by PresidentEnder · 2007-10-25 17:14 · Score: 3, Funny

Your units don't cancel properly. Flops = floating point operations / second, PS3s / foot-second = physical object / (viscosity / weight). You could stretch PS3s to be units of processing power / time, which gives you processing power / time / viscosity, which we'll fudge to be about flops / viscosity.

I dunno: maybe this thing could run faster at higher temperatures in lower gravity?

(/pretending to know what I'm talking about)

--
I used to carry a bottle of whiskey for snake bite. And two snakes. -Nefarious Wheel

Impressive ... by ackthpt · 2007-10-25 17:32 · Score: 2, Funny

with single core speeds of up to 102.4 GFLOPS and up to 1.6TFLOPS on a single node incorporating multiple CPUs.

Don't be too proud of this technological marvel you have created for it is nothing compared to the power of the slashdot effect.

--

A feeling of having made the same mistake before: Deja Foobar

SCO!? by flyingfsck · 2007-10-25 17:40 · Score: 2, Funny

Did they buy a license from SCO?

--
Excuse me, but please get off my Pennisetum Clandestinum, eh!

vector machines in the top500 list refuse to die by paleshadows · 2007-10-25 17:41 · Score: 2, Informative

There's an interesting paper that analyzes the data accumulated in the top500 list site, which ranks the 500 most powerful supercomputers twice a year: it shows that, over time, the share of vector machines within the list is sharply declining, both in aggregated power and in number: from around 60% in 1993 to around 10% in 2003 (see Figure 3, page 6, in said paper). Still, vector machines refuse to die and always seem to maintain a presence in the top500, as is evident from the above slashdot post. Will vector machines live forever?

"up to" by Duncan3 · 2007-10-25 17:46 · Score: 4, Insightful

The only text that can ever follow the words "up to" in computing is "0.1 *". As in "speeds of up to 0.1 * 102.4 GFLOPS". Every time a marketing droid publishes a press release, a kitten dies.

--
- Adam L. Beberg - The Cosm Project - http://www.mithral.com/

Grid Computing vs. Supercomputers? by SamP2 · 2007-10-25 17:47 · Score: 2, Insightful

Hate to burst your bubble, but while grid computing can certainly achieve strong speeds, it is not quite AS fast as you might think.

The entire SETI@HOME project (biggest grid computing project on the net) pumps out 274 teraflops. By comparison, Blue Gene L (first in series) pumps out 360 teraflops, and newer versions will achieve petaflop range, much faster than similar anticipation for grid computing projects.

Sure, you might say that just like supercomputers evolve, so does grid computing. The problem is that a supercomputer is built for a particular purpose, while grid computing is saturated by all the stuff you can do with it (SETI, protein folding, cancer research, or whatever). Now I'm not saying any of these projects is not totally awesome, nor trying to put down the spirit of the community, but as more and more projects compete with each other for users' CPUs, the individual share per project will drop. If you combine all grid computing projects put together with all supercomputers put together, the supercomputers win by a huge margin, and even if every single PC in the world were hooked to a grid project when not used for its primary purpose, it would still be unlikely to beat the sum total of dedicated supercomputer power as far as raw computational capability goes.

And it gets even worse if you account for centralized management of all the computing tasks, coordination, error and fake result checking, and simply the lag of transmitting all the grid packets across the net.

Grid computing projects are a very interesting and useful concept, but they won't ever replace supercomputers. Nor should they. They each are good for their own purpose.

Re:vector machines in the top500 list refuse to di by cerberusss · 2007-10-25 18:01 · Score: 2, Insightful

Will vector machines live forever?

Well, I actually doubt it. You could say 'those vector processors are used for matrix calculations and are wildly different from general purpose CPUs' and you'd be right.

However, I could see a point in time where hybrids like the Cell (one scalar processor and eight vector processors) will become so cheap that the number of vector machines will decline even more.

The idea will never die, of course. I mean, hardware is so flexible nowadays that a good student could make a vector processor at home, if he had a development board with a fast Xilinx FPGA on it. But I think the decline will continue if hybrids are used more often.

--
8 of 13 people found this answer helpful. Did you?

Re:GFLOPS? TFLOPS? by sqrt(2) · 2007-10-25 18:26 · Score: 3, Funny

Your units don't cancel properly...

Oh no, physics class flashback! No! NOOOOOOOO! I don't want to do this whole equation again!

--
If you build it, nerds will come. Soylentnews.org

careful now: by Nirvelli · 2007-10-25 19:06 · Score: 2, Funny

Don't forget the Storm project!

Re:vector machines in the top500 list refuse to di by putaro · 2007-10-25 20:34 · Score: 2, Interesting

I haven't looked closely but I would guess (based on having worked at a manufacturer of vector supercomputers many years ago) that all of the machines represented on the Top 500 list are hybrid machines. All of the vector architectures I'm familiar with had a scalar processor to handle most of the housekeeping, run the OS, compilers and things like that. Vector processors aren't very good at doing things like that.

Vector processors excel at running through what are essentially loop operations. There are two components to their speed. One is the number of functional units that they have. Conceptually, vector operations are applied across an entire array at once (in math speak, arrays are known as "vectors"). Hence they are automatically parallelizable, and the more functional units you have, the more of the operations can actually be applied in parallel. The other component, though, is their ability to run through data quickly. Since the vector unit knows that it will be running through a contiguous block of memory, it can really get the memory system moving. Scalar processors and their caches are not designed for running in straight lines through data. It's pretty rare to see a cache that will go into a full streaming mode, so they are continually starting and stopping the memory subsystem. A vector unit can issue prefetches for all of its data, so you can build an interleaved memory system that will really move the data (we used to have 8-way interleave on our memory subsystem. The scalar didn't do all that well with that but the vector could max out the memory bus in a sustained manner).
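
A small sketch of why a contiguous run suits interleaved memory. It assumes the simplest possible scheme, where the bank is just the word address modulo the number of banks; the post does not say how the real machine mapped addresses, and bank_load and the stride-8 comparison are my own additions.

```python
from collections import Counter

BANKS = 8  # 8-way interleave, as in the post above


def bank_load(start, stride, count, banks=BANKS):
    """Count how many of `count` word accesses land on each memory bank."""
    # Assumed mapping: bank = word address modulo number of banks.
    return Counter((start + i * stride) % banks for i in range(count))


contiguous = bank_load(start=0, stride=1, count=64)
strided = bank_load(start=0, stride=8, count=64)
print(sorted(contiguous.items()))  # each of the 8 banks gets 8 accesses
print(sorted(strided.items()))     # all 64 accesses hit bank 0
```

A unit that walks memory in a straight line keeps every bank equally busy, so the banks can overlap their work; an access pattern that keeps landing on one bank has to wait for that bank each time.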

No, the real question, vi or emacs? by SmallFurryCreature · 2007-10-25 21:19 · Score: 2, Funny

Come on, we need to know, what is the default editor, vi or emacs? We need to know.

--

MMO Quests are like orgasms:

You may solo them, I prefer them in a group.

Link to more information by Tom+Womack · 2007-10-25 22:11 · Score: 4, Informative

http://www.nec.de/hpc/hardware/sx-series/index.html

There are four PDFs there; the brochure is a four-colour glossy, but there is some real information. Sadly, the interesting-looking white papers are for the SX6, two generations earlier.

SX9 summary: 65nm technology, 3.2GHz clock speed, eight vector elements handled per cycle with two multiply and two add units, which is where the 102.4Gflop/CPU figure comes from. 16 CPUs in a box about the size of a standard 42U rack.
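
The arithmetic behind that figure, spelled out. This assumes each of the four units completes one floating-point operation per vector element per cycle, which is my reading of the brochure numbers rather than anything NEC states in those words.

```python
CLOCK_HZ = 3.2e9          # 3.2 GHz clock
ELEMENTS_PER_CYCLE = 8    # vector elements handled per cycle
UNITS = 2 + 2             # two multiply units + two add units
CPUS_PER_NODE = 16


def peak_gflops_per_cpu():
    # Assumed: one flop per unit per element per cycle.
    return CLOCK_HZ * ELEMENTS_PER_CYCLE * UNITS / 1e9


def peak_tflops_per_node():
    return peak_gflops_per_cpu() * CPUS_PER_NODE / 1e3


print(round(peak_gflops_per_cpu(), 1))   # 102.4
print(round(peak_tflops_per_node(), 2))  # 1.64
```

Sixteen such CPUs give roughly 1.6 TFLOPS, which lines up with the per-node number quoted elsewhere in this thread. These are peak figures, not sustained ones.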

Totally absurdly fast (ten 64-bit words per cycle per CPU) access to a large (options are 512GB or 1TB) shared main memory; absurdly fast (128GB/second) inter-node bandwidth.
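
Converting the words-per-cycle figure into bytes per second. This assumes "cycle" means the 3.2GHz clock from the summary above and that all 16 CPUs in a box can stream at full rate simultaneously; both are my assumptions, not claims taken from the brochure.

```python
CLOCK_HZ = 3.2e9        # assumed to be the clock the "per cycle" figure refers to
WORDS_PER_CYCLE = 10    # 64-bit words per cycle per CPU
WORD_BYTES = 8          # 64 bits
CPUS_PER_NODE = 16


def memory_gb_per_s_per_cpu():
    return WORDS_PER_CYCLE * WORD_BYTES * CLOCK_HZ / 1e9


def memory_tb_per_s_per_node():
    # Assumed: every CPU can pull its full rate at the same time.
    return memory_gb_per_s_per_cpu() * CPUS_PER_NODE / 1e3


print(round(memory_gb_per_s_per_cpu()))       # 256
print(round(memory_tb_per_s_per_node(), 1))   # 4.1
```

Under those assumptions that is about 256GB/second per CPU and about 4TB/second for a full 16-CPU box.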

Re:GFLOPS? TFLOPS? by Fred_A · 2007-10-25 22:45 · Score: 3, Funny

Funnily enough - this isn't totally irrelevant.

In 2000, IBM, Toshiba and Sony collaborated to create a Cell processor, consisting of one scalar processor and eight vector processors, for the Sony PlayStation 3. - Wikipedia.org

So there should be a Tera-PlayStation (TPS) measurement? And if so, who has the TPS report on that machine?

--

May contain traces of nut.
Made from the freshest electrons.

Video and Interview with NEC Project Manager by dk3nn3dy · 2007-10-25 23:04 · Score: 2, Informative

There is a video news release and interview with the project manager here: http://movie.diginfo.tv/2007/10/26/07-0502-r.php

Re:vector machines in the top500 list refuse to di by putaro · 2007-10-26 00:06 · Score: 2, Interesting

Realize that most scientific code probably still has lots of code in it written for the original CRAY system it ran on in the 80's, and you see why vector systems will live on for a while: code that was written for one will have to be used on a vector system. One has to have the luck to find a PhD student willing and able to rewrite the code for a new machine.

Worse than that even. I was doing this back in the late 80's/early 90's and we spent a large amount of energy getting the FORTRAN compiler to automatically vectorize "dusty deck" (that would be code that was originally written on PUNCHCARDS) scientific code.

Parallel programming is hard. Vectorized code is kind of like parallel light in that it parallelizes very narrow operations without all that messy locking and message passing.

Oh, there was one thing that the vector excelled at that OS's do a lot of - memory copying. When we instrumented our kernel (4.3 BSD derived) we found that it spent an awful lot of time in bcopy. One of the guys spent a fair amount of time implementing a "vcopy" which would use the vector to copy large blocks of memory. On our smoking fast 237 MB/s bus with 8-way interleaved memory the scalar CPU would top out at around 25/30 MB/s due to the interaction of the cache and the memory subsystem. The vector, though, could move at bus speed. Unfortunately, I don't think it ever worked as well in practice as in theory because there was a lot of overhead in getting the vector started, checking to make sure that it wasn't busy doing other work, etc. A dedicated DMA unit would give you the same effect.

FLOPS are useless by Anonymous Coward · 2007-10-26 01:54 · Score: 2, Funny

How many BogoMips does it have? =)

Unicos and Cray by wandazulu · 2007-10-26 01:57 · Score: 2, Interesting

Whenever I hear "supercomputer" and Unix I think of using a Cray and Unicos, which was the version of Unix that ran on them. Unicos was, at least the version I used, the ultimate in bare-bones Unix. I think when people think of Unix today they think of something like Linux or the BSDs or OS X, or whatever where the environment is very rich with tools. Unix on a supercomputer is not much more than an interface between your C (or Fortran) program and the bare metal; they don't (again, in my experience) make it the kind of environment you *use*...you get your code on the machine, compile it, submit it, and log off and wait for an email.

Maybe this NEC machine is different, but Unix on a supercomputer is like the cockpit of a Formula 1 race car; just there to provide a way to steer, comforts be damned.

Re:vector machines in the top500 list refuse to di by flaming-opus · 2007-10-26 03:29 · Score: 2, Informative

So here's what you're missing: vector processors aren't about doing a lot of math. True, they do that very well, but that's not where they excel. Where vector processors really shine is in memory bandwidth. Vector operations let you actually use that 4 terabytes/second of memory bandwidth rather than spend it all flushing out cache lines. On this machine, a single load instruction can fetch 2KB of data.

Cell (and many GPUs or future whatever) have the ability to do a LOT of math, but they do it on a very tiny amount of data. These vector CPUs have dozens or hundreds of memory controllers. That's a lot of RAM chips, and a lot of copper wires between memory and the CPU. I'm sure the motherboard is dozens of layers thick for all the traces. In short, you can't get all that capability on a commodity processor, because the commodity market won't pay for all the memory bandwidth, which is expensive to engineer, and expensive per-unit.

Unless/until there is a major change in the cost of memory, and memory bandwidth, there will still be a need for special-purpose supercomputing processors. This is not to say that Cray and NEC will continue to be the people to make such a thing. I'm sure IBM could come up with a Cell-derived processor with a TON of real memory bandwidth, or maybe Nvidia. The question is: will they want to? I figure there's a lot more money to be made selling videogame consoles than there is at the high end of the supercomputer market.
