NEC SX-9 to be World's Fastest Vector Computer
An anonymous reader writes "NEC has announced the NEC SX-9 claiming it to be the fastest vector computer, with single core speeds of up to 102.4 GFLOPS and up to 1.6TFLOPS on a single node incorporating multiple CPUs. The machines can be used in complex large-scale computation, such as climates, aeronautics and space, environmental simulations, fluid dynamics, through the processing of array-handling with a single vector instruction. Yes, it runs a UNIX System V-compatible OS."
cluster!
The Admin and the Engineer
Of course, but the true question is...
Does it run Linux?
Cue the redundant replies and grouchy mods.
Forget esoteric units, how fast is it in Playstation3s per foot-second?
How can I believe you when you tell me what I don't want to hear?
So, aside from having all of this power in one centralized spot, how does this compare to the combined power used for distributed computing projects like ClimatePrediction.net, Folding@home, and any other project on BOINC?
(This would waste some of the compute power, but if the total time saved from not changing the application exceeds the time that could be saved using more of the cycles available, you win. It is this problem of creating illusions of whatever architecture happens to be application-friendly at a given time that has made much of my work in parallel architectures - such as the one produced by Lightfleet - so interesting... and so subject to office politics.)
It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
"Easter Island's Weather Forecasting Service believes operation of the NEC SX-9 would realize a 53% savings under Windows Server 2008 compared to under UNIX"
"SCO files umpteen bazzillion dollar lawsuit against NEC"
I think you're the fastest, but my dad says you don't work hard enough on integer arithmetic. And he says that lots of times, you don't even parallelize instructions. And that you don't really try...
What's with these newfangled measurements?
I'd like to know what it is in Libraries of Congress per Jiffy
A game has objectives and is competitive, anything else is just play
I wonder how well it will do with the really cool vector games like Asteroids or BattleZone or Tempest or...
"what's your vector, Victor?"
The Kai's Semi-Updated Website Thingy
Why, that's more powerful than a cluster of 60 PS3s! I'll take three!
I'm waiting for a "-1 somepeoplejustshouldn'tgetmodprivileges" meta-moderation.
I dunno: maybe this thing could run faster at higher temperatures in lower gravity?
(/pretending to know what I'm talking about)
I used to carry a bottle of whiskey for snake bite. And two snakes. -Nefarious Wheel
No, but it just might be in charge of Gundam.
"Operating systems suck: you're better off using only the BIOS" --trainsaw.com
Don't be too proud of this technological marvel you have created for it is nothing compared to the power of the slashdot effect.
A feeling of having made the same mistake before: Deja Foobar
Did they buy a license from SCO?
Excuse me, but please get off my Pennisetum Clandestinum, eh!
There's an interesting paper that analyzes the data accumulated in the top500 list site, which ranks the 500 most powerful supercomputers twice a year: it shows that, over time, the share of vector machines within the list is sharply declining, both in aggregated power and in number: from around 60% in 1993 to around 10% in 2003 (see Figure 3, page 6, in said paper). Still, vector machines refuse to die and always seem to maintain a presence in the top500, as is evident from the above slashdot post. Will vector machines live forever?
The only text that can ever follow the words "up to" in computing is "0.1 *". As in "speeds of up to 0.1 * 102.4 GFLOPS". Every time a marketing droid publishes a press release, a kitten dies.
- Adam L. Beberg - The Cosm Project - http://www.mithral.com/
Hate to burst your bubble, but while grid computing can certainly achieve strong speeds, it is not quite AS fast as you might think.
The entire SETI@HOME project (biggest grid computing project on the net) pumps out 274 teraflops. By comparison, Blue Gene L (first in series) pumps out 360 teraflops, and newer versions will achieve petaflop range, much faster than similar anticipation for grid computing projects.
Sure, you might say that, just like supercomputers evolve, so does grid computing. The problem is that a supercomputer is built for a particular purpose, while grid computing is saturated by all the stuff you can do with it (SETI, protein folding, cancer research, or whatever). Now I'm not saying any of these projects is not totally awesome, nor trying to put down the spirit of the community, but as more and more projects compete with each other for users' CPUs, the individual share per project will drop. If you combine all grid computing projects put together with all supercomputers put together, the supercomputers win by a huge margin, and even if every single PC in the world were hooked to a grid project when not used for its primary purpose, it would still be unlikely to beat the sum total of dedicated supercomputer power as far as raw computational capability goes.
And it gets even worse if you account for centralized management of all the computing tasks, coordination, error and fake result checking, and simply the lag of transmitting all the grid packets across the net.
Grid computing projects are a very interesting and useful concept, but they won't ever replace supercomputers. Nor should they. They each are good for their own purpose.
If a single node can go over 1 TFLOPS, then a 1000-node cluster could actually break the PetaFLOP barrier. How feasible is a cluster of that size?
What the heck is LoCs/J measuring...librarian/archiver/etc. efficiency?
However, I could see a point in time where hybrids like the Cell (one scalar processor and eight vector processors) will become so cheap that the number of vector machines will decline even more.
The idea will never die of course, I mean, hardware is so flexible nowadays that a good student could make a vector processor at home, if he had a development board with a fast Xilinx FPGA on it. But I think the decline will continue if hybrids will be used more often.
8 of 13 people found this answer helpful. Did you?
one can only imagine what a game of 'tail gunner' or tempest would look like on this machine.
If you build it, nerds will come. Soylentnews.org
LoCs is a measure of data, jiffy is a measure of time. Therefore LoCs/J is a unit of work cycles much like hertz.
Perhaps you mean a different thing than I do when you say "science."
I guess a more accurate question would be, how high a stack of PS3s running in parallel would you need to equal this thing's processing power? I so can't be bothered...
Don't forget the Storm project!
In hindsight, I think I should have used P(arallel)-ICBMs as a measurement.
Vector graphics are so 1980's... oh wait
There is simply too much glass..
I was reading the description of the system, and thinking I would never be able to operate it as I am such a dinosaur until I saw the above line. My response is: "This is Unix, I know this!"
Try to hack my 31337 firewall!
... Imagine what you could do with a beowulf cluster of these!
Gravity Sucks
Hmm, did I forget any ...
"Ahh! I see you're in that indeterminate Schrodinger state where - oh, uh
I haven't looked closely but I would guess (based on having worked at a manufacturer of vector supercomputers many years ago) that all of the machines represented on the Top 500 list are hybrid machines. All of the vector architectures I'm familiar with had a scalar processor to handle most of the housekeeping, run the OS, compilers and things like that. Vector processors aren't very good at doing things like that.
Vector processors excel at running through what are essentially loop operations. There are two components to their speed. One is the number of functional units that they have. Conceptually, vector operations are applied across an entire array at once (in math speak, arrays are known as "vectors"). Hence they are automatically parallelizable, and the more functional units you have, the more of the operations can actually be applied in parallel. The other component, though, is their ability to run through data quickly. Since the vector unit knows that it will be running through a contiguous block of memory, it can really get the memory system moving. Scalar processors and their caches are not designed for running in straight lines through data. It's pretty rare to see a cache that will go into a full streaming mode, so they are continually starting and stopping the memory subsystem. A vector unit can issue prefetches for all of its data, so you can build an interleaved memory system that will really move the data (we used to have 8-way interleave on our memory subsystem. The scalar didn't do all that well with that, but the vector could max out the memory bus in a sustained manner).
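For anyone who wants to see the interleaving point in numbers, here is a toy model. It is only a sketch, not a description of any real machine. The function name, the one-access-per-cycle issue rate and the 8-cycle bank busy time are all assumptions made for illustration; only the 8-way interleave comes from the comment above.

```python
def cycles_to_issue(addresses, n_banks=8, bank_busy=8):
    """Toy model of an interleaved memory (all parameters are assumptions).

    One access can be issued per cycle. Word address `a` lives in bank
    `a % n_banks`, and a bank that has just been accessed cannot accept
    another access for `bank_busy` cycles.
    Returns the cycle count after the last access has been issued.
    """
    free_at = [0] * n_banks          # cycle at which each bank is free again
    cycle = 0
    for addr in addresses:
        bank = addr % n_banks
        start = max(cycle, free_at[bank])   # stall if the bank is still busy
        free_at[bank] = start + bank_busy
        cycle = start + 1
    return cycle


# Contiguous walk: consecutive words rotate through all 8 banks.
contiguous = cycles_to_issue(range(64))
# Stride of 8 words: every access lands on the same bank.
same_bank = cycles_to_issue(range(0, 64 * 8, 8))
print(contiguous, same_bank)
```

In this model the contiguous walk never stalls, because by the time it comes back to a bank that bank is free again, while the same-bank pattern waits out the full busy time on every access.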
Who thought this was Insightful? We need -10 predictable as an option.
You'd better hurry, the raptors are breaking through the doors....
Karma: Non-Heinous
Come on, we need to know, what is the default editor, vi or emacs? We need to know.
MMO Quests are like orgasms:
You may solo them, I prefer them in a group.
Realize that most scientific code probably still has lots of code in it written for the original CRAY system it ran on in the 80's, and you see why vector systems will live on for a while: code that was written for one will have to be used on a vector system. One has to have the luck to find a PhD student willing and able to rewrite the code for a new machine.
I really have the experience that the hardware possibilities are growing faster than the (operating) software using it. Just from a practical point of view: Say a research center gets a new cluster. It takes at least half a year to a year to get it configured correctly, find all the broken hardware and replace it, get network problems out of the way. By then it has already dropped 50 places in the top500 and it will still not be used to even half of its full power. Even though supercomputing is as old as computing itself, there is still no 'plug-n-play' solution to get a supercomputer running. And I am not talking about home-made beowulf clusters here, try with Dell, HP, Sun, or IBM, it just won't work out of the package.
And besides that you end up with the next problem, how to get your program to optimally use the shared memory-core nodes AND the interconnected nodes. These are all non-standard solutions. At the moment you can buy a quad core home machine for about 700 euro, but is there any software written for it? It is time for a smart message-passing/shared memory programming style that will automatically optimize for the hardware it is compiled on.
Until these problems are solved, multi-core computing will be suboptimal, and by just looking at the increase in TFLOPS peak performance of a system we will just be fooling ourselves.
molmod.com - computing tips from a molecular modeling
http://www.nec.de/hpc/hardware/sx-series/index.html
There are four PDFs there; the brochure is a four-colour glossy, but there is some real information. Sadly, the interesting-looking white papers are for the SX6, two generations earlier.
SX9 summary: 65nm technology, 3.2GHz clock speed, eight vector elements handled per cycle with two multiply and two add units, which is where the 102.4Gflop/CPU figure comes from. 16 CPUs in a box about the size of a standard 42U rack.
Totally absurdly fast (ten 64-bit words per cycle per CPU) access to a large (options are 512GB or 1TB) shared main memory; absurdly fast (128GB/second) inter-node bandwidth.
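A quick check of where the 102.4 Gflop/CPU figure comes from, reading the summary above as 8 vector elements per cycle through each of 4 floating-point units (2 multiply + 2 add). That reading and the function names are my assumptions; the numbers themselves are the ones quoted above.

```python
# Figures quoted in the SX9 summary above.
CLOCK_GHZ = 3.2           # clock speed
ELEMENTS_PER_CYCLE = 8    # vector elements handled per cycle, per unit
FP_UNITS = 2 + 2          # two multiply units plus two add units
CPUS_PER_NODE = 16


def peak_gflops_per_cpu():
    # One floating-point result per element, per unit, per cycle (assumed).
    return CLOCK_GHZ * ELEMENTS_PER_CYCLE * FP_UNITS


def peak_tflops_per_node():
    return peak_gflops_per_cpu() * CPUS_PER_NODE / 1000.0


print(peak_gflops_per_cpu())   # GFLOPS per CPU
print(peak_tflops_per_node())  # TFLOPS per 16-CPU node
```

This gives 102.4 GFLOPS per CPU and about 1.64 TFLOPS per node, which lines up with the "up to 1.6TFLOPS on a single node" in the announcement. These are peak numbers, not sustained ones.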
have obligatory cluster yay retarded replies. Firstly ... I highly doubt this is running Linux on the vector part of the core itself; more likely than anything it has a von Neumann machine on the core somewhere, or even separately. And I will say the obvious here, which I am sure everyone here knows (especially those people yelling retarded cluster noises): parallelization is only as good as the algorithm (http://en.wikipedia.org/wiki/Amdahl's_law) ... a vector computer is like a bigger version of a GPU with a more generalized pipeline; the clusterability of this would of course depend not only on the algorithm but also on the way data is marshalled in and out of the pipeline.
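For reference, Amdahl's law as linked above fits in a few lines. This is just the textbook formula, speedup = 1 / ((1 - p) + p / n); the function name and the example fractions are made up for illustration and say nothing about any real SX-9 workload.

```python
def amdahl_speedup(parallel_fraction, n_units):
    """Amdahl's law: overall speedup when a fraction `parallel_fraction`
    of the runtime is spread perfectly over `n_units` units and the rest
    stays serial."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_units)


# Hypothetical workloads on a 16-CPU node.
for p in (0.50, 0.95, 0.99):
    print(p, round(amdahl_speedup(p, 16), 2))
```

Even with 95% of the work parallelized, 16 CPUs give only a bit over 9x, which is the point about the algorithm mattering more than the hardware.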
May contain traces of nut.
Made from the freshest electrons.
The SX series does not run Linux. Yes, the OS is POSIX-like, but it does not have as many frills as Linux. The OS does the bare minimum; mind you, you just want to crunch numbers. Now, if you write a FORTRAN or C program you won't have major issues porting it to the SX.
As for porting Linux to a vector system, well, it is not that easy. A totally different memory system is one of the major issues; here it is fully SMP. Most kernel code is proprietary and I don't see NEC opening it to the public.
There is a video news release and interview with the project manager here: http://movie.diginfo.tv/2007/10/26/07-0502-r.php
Parallel programming is hard. Vectorized code is kind of like parallel lite, in that it parallelizes very narrow operations without all that messy locking and message passing.
Oh, there was one thing that the vector excelled at that OS's do a lot of - memory copying. When we instrumented our kernel (4.3 BSD derived) we found that it spent an awful lot of time in bcopy. One of the guys spent a fair amount of time implementing a "vcopy" which would use the vector to copy large blocks of memory. On our smoking fast 237 MB/s bus with 8-way interleaved memory the scalar CPU would top out at around 25/30 MB/s due to the interaction of the cache and the memory subsystem. The vector, though, could move at bus speed. Unfortunately, I don't think it ever worked as well in practice as in theory because there was a lot of overhead in getting the vector started, checking to make sure that it wasn't busy doing other work, etc. A dedicated DMA unit would give you the same effect.
wow... "imagine a Beowulf cluster of those"...
If you're spending this much on a machine you're not going to settle for 80% of your application's performance just so you can save a bit of programming time...
Japan's HPC industry has gone to shit. The only reason it still exists is because they somehow manage to con the Japanese taxpaying public out of hundreds of millions of dollars each year to fund "research" into "next-generation" supercomputers.
Next-generation my ass. sx-9, sx-10, sx-11... *splud*
fuck you, NEC.
Do you know about the new cover sheets for them? I'll send another copy of that memo your way.
now suck my cock!
http://it.slashdot.org/article.pl?sid=07/10/25/1521258
That's what is thought of VIRTUALIZATION (VMWare) & security Bert64. This is in regards to our discussion here earlier this year:
http://linux.slashdot.org/comments.pl?sid=320419&cid=20953889
APK
I remember that in the Engineering computer department we had an old analog computer. One programmed the amplifiers and feedback loops with patch cables and then sent it to A/D converters to get the output, which was then retrieved through a data bus to the digital computer side and manipulated/graphed from there. Mostly you'd set up state-vector and (partial) differential equations on it. I never saw a book on how to program the thing (a PDP Analog computer I think). So I guess these old computers get lost to the dust bins of computer history!
Anybody recall the analog computer days? Present-day textbooks don't even mention that analog computers existed, so today's computer/engineering students don't know about them either......
How many BogoMips does it have? =)
I'm not sure about that, but I hear it can make the Kessel run in 12 parsecs.
Whenever I hear "supercomputer" and Unix I think of using a Cray and Unicos, which was the version of Unix that ran on them. Unicos was, at least the version I used, the ultimate in bare-bones Unix. I think when people think of Unix today they think of something like Linux or the BSDs or OS X, or whatever where the environment is very rich with tools. Unix on a supercomputer is not much more than an interface between your C (or Fortran) program and the bare metal; they don't (again, in my experience) make it the kind of environment you *use*...you get your code on the machine, compile it, submit it, and log off and wait for an email.
Maybe this NEC machine is different but Unix on a supercomputer is like the cockpit of a Formula 1 race car; just there to provide a way to steer, comforts be damned.
100% of the PCs CAN run linux. More than 99% of the computers in the world CAN run linux.
~10000 would be a good guess.
Quote: "Mueller, an associate professor of computer science, has built a supercomputing cluster capable of both high-performance computing and running the latest in computer gaming. His cluster of eight PS3 machines - the first such academic cluster in the world - packs the power of a small supercomputer, but at a total cost of about $5,000, it costs less than some desktop computers that have only a fraction of the computing power.
...
Mueller estimates that with approximately 10,000 PS3 machines anyone could create the fastest computer in the world - albeit with limited single-precision capabilities and networking constraints."
CC.
TaijiQuan (Huang, 5 loosenings)
What do you think your Pentium's MMX instructions are? They're vector operations. Every machine on the list is already a hybrid between the two. They aren't dedicated individual vector processors under the command of a master GP-CPU, but a different version of hybridization.
I'd actually suggest that you'll probably see vector processors marginalized or pushed out eventually by stream processors: aka nvidia/ati graphics boards.
Slashdot Patriotism: We Support our Dupes!
So here's what you're missing: Vector processors aren't about doing a lot of math. True, they do that very well, but that's not where they excel. Where vector processors really shine is in memory bandwidth. Vector operations let you use that 4Terabyte/second of memory bandwidth, and actually use it, not spend it all flushing out cache lines. On this machine, a single load instruction can fetch 2KB of data.
Cell (and many GPUs or future whatever) have the ability to do a LOT of math, but they do it on a very tiny amount of data. These vector CPUs have dozens or hundreds of memory controllers. That's a lot of RAM chips, and a lot of copper wires between memory and the CPU. I'm sure the motherboard is dozens of layers thick for all the traces. In short, you can't get all that capability on a commodity processor, because the commodity market won't pay for all the memory bandwidth, which is expensive to engineer, and expensive per-unit.
Unless/until there is a major change in the cost of memory, and memory bandwidth, there will still be a need for special-purpose supercomputing processors. This is not to say that Cray and NEC will continue to be the people to make such a thing. I'm sure IBM could come up with a cell-derived processor with a TON of real memory bandwidth, or maybe Nvidia. The question is: will they want to? I figure there's a lot more money to be made selling videogame consoles than there is at the high-end of the supercomputer market.
Pretty fast, but IBM will release its Roadrunner at Los Alamos NL next year with 1.6PFLOPS:
16000 CellBE chips have a capacity of 725GFLOPS each for 11.6PFLOPS. So it's expected to operate at only about 14% of HW capacity. There's a lot of room for that Roadrunner, or a machine like it, to stay at the top of the heap.
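The arithmetic behind that 14%, spelled out. The chip count, per-chip figure and 1.6 PFLOPS target are taken as quoted in the comment and not checked against anything else; the function names are only for illustration.

```python
# Numbers exactly as quoted in the comment above (not independently verified).
N_CHIPS = 16000
GFLOPS_PER_CHIP = 725
EXPECTED_PFLOPS = 1.6


def hardware_capacity_pflops():
    # 1 PFLOPS = 1,000,000 GFLOPS
    return N_CHIPS * GFLOPS_PER_CHIP / 1_000_000


def expected_utilization():
    return EXPECTED_PFLOPS / hardware_capacity_pflops()


print(hardware_capacity_pflops())            # PFLOPS of raw capacity
print(round(expected_utilization() * 100))   # percent of capacity
```

So on the quoted figures the raw capacity is 11.6 PFLOPS and 1.6 PFLOPS is roughly 14% of it.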
--
make install -not war
Only if it shoots first.
Yes, vector machines will "forever" be the most suitable tool for vector computing.
Mind you, while I love Top 500 and the funky gear on the list, it's a very limited benchmark. Linpack, I don't remember what else is there (and can't be arsed to check now). None of the stuff these babies with their jaw-dropping interconnect architectures excel in. It doesn't really penalize clusters (with their flimsy Infiniband or Myrinet or 10GbE) like some real-world simulation jobs would. Admittedly, Blue Gene deserves to be there, if for the said wrong reasons.
Well put. I've never really had that top down view before.
IBM and Los Alamos National Laboratory announced something faster months ago. Roadrunner is 1.3 petaFlops peak vs 0.839 petaFlops peak for the NEC. See http://www.lanl.gov/roadrunner/ for many details on the Roadrunner system design.