NEC SX-9 to be World's Fastest Vector Computer

beowulf by catmistake · 2007-10-25 16:52 · Score: 0, Redundant

cluster!

--
The Admin and the Engineer

Re:beowulf by Anonymous Coward · 2007-10-26 02:29 · Score: 0

The first comment on an article is considered "redundant"? Very odd indeed...

I've read the article, and "Beowulf cluster" is not mentioned at all. I'm not seeing how this comment could be considered redundant.

To the moderator:
I do not think that word means what you think it means...
Re:beowulf by Poltras · 2007-10-26 02:54 · Score: 0, Offtopic

Yeah, because we haven't seen that troll ever before... BTW, I'm posting this unanonymously to prove I haven't moderated, and that I have karma to burn

--
Of Code And Men
Re:beowulf by catmistake · 2007-10-26 03:39 · Score: 0, Flamebait

That was my 2nd 1st post ever... and it should be modded funny, because, well, it is the 1st post, and with 2 words and a punctuation mark I clearly capture at least 3 levels characteristic of the typical /.-user: 1) what comes to mind whenever a single powerful computer is considered, 2) uncontrollable nerdy enthusiasm, and 3) the inability to effectively communicate anything meaningful with human language.

--
The Admin and the Engineer
Re:beowulf by armareum · 2007-10-26 06:23 · Score: 1

No, it should be modded funny if it's funny. And the Beowulf cluster meme has never made me laugh..

--
Is this a rhetorical question?
Re:beowulf by catmistake · 2007-10-26 12:45 · Score: 1

well, crud, sorry I couldn't get the first post on a story about possible alien life or newly discovered invading insects or robots, and post an "I, for one, welcome our new _________ _________ overlords!" salute. It's a fast computer story. I didn't have much to work with, but... we just do what we can. If I am fortunate enough again to ever be graced with a first post, I hope I have the fortitude, the intelligence, the wisdom, the insight, the... the wit, in those dwindling milliseconds to actually post something that might make you crack a smile... otherwise... why... bother... with anything? Then again... I got a first post. Suck it.

--
The Admin and the Engineer
Re:beowulf by armareum · 2007-10-26 21:49 · Score: 1

No-one who counts gives two shits you got first post.

--
Is this a rhetorical question?
Re:beowulf by catmistake · 2007-10-27 02:17 · Score: 1

Blasphemy! Mod the heretic!

--
The Admin and the Engineer

Oh? by SnoopJeDi · 2007-10-25 16:53 · Score: 2, Interesting

Yes, it runs a UNIX System V-compatible OS.

Of course, but the true question is...

Does it run Linux.

Cue the redundant replies and grouchy mods.

Re:Oh? by Anonymous Coward · 2007-10-25 16:55 · Score: 0

> Does it run Linux?

Umm, who cares?
Re:Oh? by azulza · 2007-10-25 16:58 · Score: 1

Of course, but the true question is...
Does it run Vista, without being a slow mofo.
Cue redundant linux rants / MS bash
Re:Oh? by Anonymous Coward · 2007-10-25 17:01 · Score: 5, Funny

so awesome when girls post...
Re:Oh? by Anonymous Coward · 2007-10-25 19:11 · Score: 1, Funny

...MS made Bash?
Re:Oh? by colourmyeyes · 2007-10-25 20:11 · Score: 5, Funny

user@host:~ $ ls
You are about to list the files in this directory. Are you sure you want to do this? [y/n] y
Enter Administrator password:
We're sorry, using MS Bash 4.00 Basic you do not have the proper privilege level to view system files. Please purchase MS Bash 4.00 Mega, Ultra, or Extreme. Would you like to purchase one of these products now? [y/n] y
We're sorry, this product is not upgradeable. Please reinstall your operating system, choosing "clean install" during the upgrade process.
Thank you for choosing the rich user experience provided by MS Bash 4.00.
MS Bash must now restart your computer.

--
My grandmother used anecdotal evidence all the time, and she lived to be 120 years old.
Re:Oh? by Anonymous Coward · 2007-10-25 20:50 · Score: 0

No, it doesn't run Linux (and can't). For what it's worth, the processor isn't even supported by GCC.
Re:Oh? by Anonymous Coward · 2007-10-26 00:47 · Score: 0

girls?? where??
Re:Oh? by Anonymous Coward · 2007-10-26 02:03 · Score: 0

According to NEC:
The SX Series SUPER-UX operating system is a System V port with additional features from 4.3 BSD plus enhancements to support supercomputing requirements. It has been in widespread production use since 1990.
Re:Oh? by Anonymous Coward · 2007-10-26 03:24 · Score: 0

only a girl wouldn't know who cares if it runs Linux

GFLOPS? TFLOPS? by pushing-robot · 2007-10-25 16:58 · Score: 1, Offtopic

Forget esoteric units, how fast is it in Playstation3s per foot-second?

--
How can I believe you when you tell me what I don't want to hear?

Logical question: by r_jensen11 · 2007-10-25 17:09 · Score: 2, Interesting

So, aside from having all of this power in one centralized spot, how does this compare to the combined power used for distributed computing projects like ClimatePrediction.net, fold@home, and any other project on Boinc?

Re:Logical question: by Silverlancer · 2007-10-25 17:20 · Score: 1

Something people constantly misunderstand about supercomputers is that they assume that all problems can be broken apart into manageable portions that can be split among thousands of computers. Many problems exist that have vast memory requirements and/or require interaction among all parts of the problem, and so have to be run on a single supercomputer.
Re:Logical question: by sophanes · 2007-10-25 17:38 · Score: 2, Informative

Put simply, the problem set that vector processors are geared towards (problems involving large matrix ops) is the type that clusters perform horribly at.
Re:Logical question: by deniable · 2007-10-25 17:42 · Score: 4, Informative

Well, distributed is often seen as poor man's parallel, but in this case they don't compare. Vector units have large arrays of data and perform the same operation on all of them at once. Think array or matrix operations being done in one step rather than needing loops. This is where a SIMD architecture takes off.

The only unit I ever got to play with had a 64x32 grid of processors; you could add a row of numbers in log2(n) steps instead of n. It was cool because you could tell each processor to grab a value from the guy next to him (or n steps in a given direction from him) and so on. You could calculate dot products of matrices very quickly.

The distributed stuff you mentioned is mostly farming. Take a big loop of independent steps, break them up and pass them out to a (possibly) heterogeneous collection of processing nodes. Collect the answers when they finish. Render farms work the same way. It's a good way to break up some problems, but it's not what a vector unit does.

Now, I haven't touched this stuff for eleven years so my facts are possibly wrong. I'm sure someone will be along to correct me.
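For anyone who wants to see the log2(n) row sum from the post above in action, here is a minimal sketch. It is not the instruction set of any real machine: each list slot just stands in for one processor, and `tree_sum` is a made-up name. The only idea it borrows from the post is "every processor grabs a value from a neighbour some distance away, all at the same time".

```python
def tree_sum(values):
    """Sum a non-empty row in ceil(log2(n)) lockstep steps.

    Each list slot plays the role of one processor. In every step,
    slot i adds the value held by the slot `stride` places to its
    right, and all slots are treated as doing so at the same moment.
    """
    row = list(values)
    n = len(row)
    steps = 0
    stride = 1
    while stride < n:
        # One parallel step: the new row is built from the old one, so
        # every "processor" reads its neighbour's pre-step value.
        row = [
            row[i] + (row[i + stride] if i + stride < n else 0)
            for i in range(n)
        ]
        stride *= 2
        steps += 1
    # Slot 0 now holds the total for the whole row.
    return row[0], steps


total, steps = tree_sum(range(1, 65))  # a 64-wide row, as in the post
print(total, steps)
```

On real hardware each pass of the `while` loop would be a single instruction across the whole row; the Python loop inside it is only there because a list comprehension is the closest plain Python gets to "all at once".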
Re:Logical question: by ajs318 · 2007-10-25 23:31 · Score: 2, Insightful

Something people constantly misunderstand ..... is that they assume that all problems can be broken apart into manageable portions that can be split
Exactly! Having sex 39 times does not mean you will be able to get a baby in one week. Some operations are by nature sequential -- and while there is scope for some parallelisation, doing so in a highly-distributed fashion can end up increasing latency, because you end up spending more time splitting the data up and putting the results back together than actually doing any maths.

A vector processor basically has several separate logic matrices that work in parallel, performing the same operation on different (sets of) operands simultaneously. When you want to solve simultaneous equations in many variables, some of which are themselves multi-dimensional vectors, that can be extremely useful.

--
Je fume. Tu fumes. Nous fûmes!
Re:Logical question: by bockelboy · 2007-10-26 00:56 · Score: 1

Eventually, the combined power of the Boinc architecture will be much larger than any supercomputer in terms of CPU, yet be totally insufficient for any of the supercomputer's tasks.

Here's the experiment I've used to teach the concepts: Take a deck of cards, shuffle it, and time yourself sorting it. Now, have 1 other person help you sort it - it should be about 2 times as fast, maybe a little slower.

Repeat again with increasing number of people until you have 1 card per person. You now have a room full of bright, capable people, yet it will take you much longer to finish the task. This is because you are bogged down by the algorithm - which is not parallel - and the fact you need lots of communication between people.

The kind of tasks grids are useful for are only those that are highly parallel. Think about counting a jar full of marbles. You can give all 52 people a big handful of marbles and add the results at the end; you should be able to achieve about a 52-times speedup, given enough starting marbles.
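The "given enough starting marbles" caveat can be put into a rough back-of-the-envelope model. This is only a sketch under made-up assumptions: counting one marble costs one time unit, everyone gets an equal share, and folding each extra person's subtotal into the running total costs `combine_cost` units. The function name and the cost figures are hypothetical.

```python
def speedup(marbles, people, combine_cost=1.0):
    """Idealised speedup for counting `marbles` with `people` counters.

    Assumes one time unit per marble, equal shares, and `combine_cost`
    time units to add each additional person's subtotal at the end.
    """
    serial_time = marbles
    parallel_time = marbles / people + combine_cost * (people - 1)
    return serial_time / parallel_time


# Same 52 people, bigger and bigger jars.
for jar in (520, 5_200, 520_000):
    print(jar, round(speedup(jar, 52), 1))
```

With a small jar the time spent adding up 52 subtotals eats most of the gain; only with a very large jar does the speedup get close to 52. The card-sorting case is worse because the communication is not a one-off at the end.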
Re:Logical question: by LWATCDR · 2007-10-26 03:06 · Score: 1

It doesn't compare at all.

They are not used for the same type of problems. Some problems are ideal for cluster systems like the ones you have described. Others are ideal for vector systems like the SX. They don't compare well at all because they are not used for the same type of problem.

--
See my blog http://ilovecookes.blogspot.com/ for light hearted technical information.
Re:Logical question: by flaming-opus · 2007-10-26 04:14 · Score: 1

Well, the answer depends on your problem. These are sort of at the opposite end of the spectrum from distributed. There are a lot of solutions in between. In order from cheapest to most expensive per flop.

Distributed computing needs to do a lot of computation on very tiny bits of data. You can pack up the problem set and send it over the internet, then do an hour or two of work on a CPU, and send another internet-sized transfer back. It's very economical, only cares about raw cpu performance, and can't be used for very many tasks. [you can use any hardware for these]

Low-end Commodity clusters are built up of cheap servers connected with a cheap interconnect like ethernet. They can pass data between nodes at several tens of MB/second, and with latencies under a tenth of a second (realized). Again, most of the concern is the raw processor performance, though a little more coordination amongst processors is needed, and the amount of data being crunched can exceed a node's memory size, though it's best if it doesn't. [everybody and their mother makes these, HP and IBM make a lot of money on these]

High-end commodity clusters add a much higher performance interconnect like Quadrics, Myrinet, or InfiniBand. They also add software features like parallel filesystems, and virtualized system management. They also use higher-reliability nodes, and maybe more processors per node. [IBM, HP, SUN, SGI, fujitsu, Bull, and a lot of smaller shops sell these]

scalar-supercomputers are integrated machines using commodity processors, but with custom interconnects, packaging, operating systems, and software. They can often use more sophisticated memory sharing programming models, though are often still programmed with mpi. They pass even more data around the nodes, more often. They have parallel filesystems and have highly integrated system management tools. They tend to deploy a lot of redundancy features to insulate the user from node failures. [IBM, Cray, SGI]

vector-supercomputers use custom vector processors to amortize the cost of memory loads/stores across the computation, and do away with a lot of the branches that exist in scalar code. This allows VERY high performance on some codes (often many tens of times the performance of scalar processors), but poor performance on code that doesn't vectorize well. They are very expensive, but quite useful for a select set of usage scenarios. [NEC and Cray]

A rough estimate of the cost per peak-flop is that it approximately doubles for each of these, as you go up the list. It's important to realize, however, that peak-flops isn't the same as realized-flops.
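Taking the "approximately doubles" rule of thumb above at face value, the spread across the five tiers works out as follows. The factor of 2 and the tier names come from the post; the function name is made up, and none of this is pricing data.

```python
TIERS = [
    "distributed computing",
    "low-end commodity cluster",
    "high-end commodity cluster",
    "scalar supercomputer",
    "vector supercomputer",
]


def relative_cost_per_peak_flop(tiers=TIERS, factor=2.0):
    """Cost per peak flop relative to the cheapest tier.

    Assumes each step up the list multiplies the cost by `factor`
    (the rough doubling from the post, not a measured figure).
    """
    return {name: factor ** i for i, name in enumerate(tiers)}


for name, cost in relative_cost_per_peak_flop().items():
    print(f"{name}: {cost:g}x")
```

So by this rough rule a vector machine costs about 16 times as much per peak flop as donated desktop cycles, which only pays off if the realized flops on your code are better by more than that.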
Re:Logical question: by kjs3 · 2007-10-26 07:08 · Score: 1

Not all problems lend themselves to a distributed solution.
Re:Logical question: by Anonymous Coward · 2007-10-26 10:49 · Score: 0

Some problems, climate and weather modelling are good examples, are difficult to parallelise efficiently. Integrating these models at increasingly high resolutions soon involves many gigabytes of RAM and is limited by the bandwidth between the cpu and memory. A vector machine can really help in this regard, especially one that can run processes on 16 (say) cpus all sharing the same memory bus, a bus that can feed all 16 processors at the maximum rate. Multi-cpu Intel and AMD boxes are great (and much less expensive to purchase) but don't really come with enough memory and don't have a bus that can truly handle several processes running at 100% at the same time. It is simple to set up a benchmark on a dual cpu "pizza-box" cluster type "linux" computer with a small climate model that takes 1.5 times as long (on the wall clock) to run when two processes are running as when one process is running. In other words, these cheap machines don't have the memory bus bandwidth to feed models that are this computationally intensive (in my anonymous experience).

Climate models also benefit from vectorisation just by way of the kinds of equations that are being integrated. Many operations are identical but performed on all the elements of an array representing some variable in the model. Vectorisation can perform these "at once" when the code is written the right way. Splitting into subdomains and sending these to different machines would work as well except for the fact that the equations all depend on fluxes of various quantities across boundaries at all scales in the model. Thus dividing a three dimensional ocean into sub-domains requires communication of these fluxes at the boundaries of the domains. This communication soon overwhelms any benefits of subdividing the model domain. Often, using more than about 32 cpus doesn't add substantial benefits. Therefore, faster cpus help in the right circumstances.
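To make the boundary-flux point concrete, here is a toy sketch: a 1D diffusion step on a row of cells, done once as a single domain and once split into two halves that must swap their edge values before every step. It is nothing like a real ocean model; the function names, the `alpha` value and the zero outer boundaries are all assumptions for illustration.

```python
def diffuse(u, left, right, alpha=0.25):
    """One explicit diffusion step on cells `u`, given the values
    just outside its left and right edges."""
    padded = [left] + list(u) + [right]
    return [
        padded[i] + alpha * (padded[i - 1] - 2 * padded[i] + padded[i + 1])
        for i in range(1, len(padded) - 1)
    ]


def step_whole(u):
    # Single domain: the outer edges are held at 0.
    return diffuse(u, 0.0, 0.0)


def step_split(u):
    # Two subdomains, as if they lived on two machines (needs len(u) >= 2).
    mid = len(u) // 2
    a, b = u[:mid], u[mid:]
    # The "communication": each half needs the other's edge cell
    # before it can update its own boundary cell.
    ghost_for_a = b[0]
    ghost_for_b = a[-1]
    return diffuse(a, 0.0, ghost_for_a) + diffuse(b, ghost_for_b, 0.0)


u_whole = u_split = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
for _ in range(10):
    u_whole = step_whole(u_whole)
    u_split = step_split(u_split)
print(u_whole == u_split)
```

The split version gives the same answer, but only because the halves exchange edge values on every single time step. In three dimensions each subdomain has whole faces to exchange, and the more subdomains you cut, the larger the share of time spent on that exchange rather than on the arithmetic.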

Quite possibly. by jd · 2007-10-25 17:09 · Score: 4, Interesting

The architecture (a vector processor) is not in the vanilla kernel, but the kernel is fairly parallel, thread-safe and SMP-safe, so I really can't see any reason why you couldn't put Linux on such a platform. Because a lot of standard parallel software these days assumes a cluster of discrete nodes with shared resources, they'd be best borrowing code from Xen and possibly MOSIX to simulate a common structure.

(This would waste some of the compute power, but if the total time saved from not changing the application exceeds the time that could be saved using more of the cycles available, you win. It is this problem of creating illusions of whatever architecture happens to be application-friendly at a given time that has made much of my work in parallel architectures - such as the one produced by Lightfleet - so interesting... and so subject to office politics.)

--
It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)

Re:Quite possibly. by Kristoph · 2007-10-25 17:29 · Score: 2, Informative

A user would pay the extremely high cost of a supercomputer - with its proprietary memory architecture and interconnects - precisely because it can much more effectively scale up parallel processes then a cluster. If the benefit of that did not outweigh the cost of tailoring software to fit the device then these devices would never be made.

]{
Re:Quite possibly. by SamP2 · 2007-10-25 17:52 · Score: 4, Insightful

CAN run Linux and RUNS Linux are not quite the same thing.

To put things in perspective, 99% of PCs in the world CAN run Linux. :-)
Re:Quite possibly. by deniable · 2007-10-25 17:56 · Score: 2, Insightful

The front-end OS for these things is pretty meaningless. Being Unix-like will keep the programmers and admins happy. The front-end is only a shell for the code running on the back-end processing units. These do all of the work and rely on specific hardware, instructions, and libraries to do things in *actual* parallel. These things basically exist to run big number crunching tasks for mathematicians and mathematicians in disguise like physicists. :) These people will generally be running their own code with tweaks for the hardware. They see Intel's SIMD instructions and think it's 'cute' and wonder what it will be when it grows up.
Re:Quite possibly. by Calinous · 2007-10-25 18:25 · Score: 4, Informative

The cost of supercomputers is so high that sometimes several man-months of tailoring the software to run as efficiently as possible on the hardware can be recovered in a couple of days of processing.
For the kind of computation the supercomputer market requests, a 5% improvement in running speed on a supercomputer can be worth millions.
Re:Quite possibly. by Anonymous Coward · 2007-10-25 18:56 · Score: 0, Flamebait

99% of PCs in the world CAN run Linux. :-)

Unless you want working device drivers :-P
Re:Quite possibly. by Chrisq · 2007-10-25 21:42 · Score: 3, Insightful

This was certainly the case when I used vector processors. It is possible that the vector processor does not run an OS at all. It has been many years since I have worked on such a beast but when I did we ran a loader system with a standard OS which would cross compile code for the processor and load it almost onto the hardware (there was actually a small program we called a monitor to deal with I/O, etc, but no multi-tasking, security or anything). It would then run and the results were read back into the front-end computer.
Re:Quite possibly. by ccs.gott · 2007-10-25 22:05 · Score: 0

CAN run Linux and RUNS Linux are not quite the same thing.

To put things in perspective, 99% of PCs in the world CAN run Linux. :-)
And so can 99% of the toasters...
Re:Quite possibly. by anarxia · 2007-10-25 23:13 · Score: 1

Calling this a PC (Personal Computer) is a bit of a stretch :)
Re:Quite possibly. by bockelboy · 2007-10-26 00:48 · Score: 4, Informative

That's the current, popular, Blue Gene/L architecture. The Blue Genes are composed of densely packed boards, each of which has a PowerPC chip and many vector processors. The PowerPC chips run a Linux-like OS and do some normal-looking I/O (filesystems, networking, etc), while the vector processors churn lots of data and have simplistic I/O.

That GP who suggests that Xen is used to distribute tasks obviously isn't familiar with the needs of big iron.
Re:Quite possibly. by Anonymous Coward · 2007-10-26 01:40 · Score: 0

T H A N you fucking idiot
Re:Quite possibly. by Anonymous Coward · 2007-10-26 02:43 · Score: 0

Hmm, a BlueGene/L system does not have any vector processor in it: all its power comes from the densely packed PowerPC chips!

Compute nodes (the ones you described as "vector processors") are the ones churning data; I/O nodes, which are essentially the same PowerPC chips, are only used for communication with the "outside world" of the BG/L, i.e. file servers, via Gigabit ethernet.
Communication between compute nodes is done via one of the internal networks (torus-shaped, or tree-shaped).

It is true that the I/O nodes run a minimalist Linux distribution (you can even SSH to each I/O node, usually there is one I/O node for each 32 compute nodes), whereas on the compute nodes there's a Linux-like OS, but it is only mono-task and mono-user to reduce context-switching penalties.
Re:Quite possibly. by flaming-opus · 2007-10-26 03:36 · Score: 1

I think you are mistaken. There are no vector processors in blue gene/L. BG/l is composed entirely of IBM ppc/440 cores. Each node (out of 65,000) is composed of 2 ppc scalar cores. In most cases one runs the application, and one handles the message passing. The Blue Gene/P uses 4-core nodes, but is otherwise similar.

The Cell processor has many scalar cores, which can be programmed to behave a little bit like a vector processor, though they really aren't. Cell processors are not currently used in Blue Gene systems, though I suspect they might be someday. Modified Cells are used in some supercomputers, but they are still the third focus in IBM's supercomputing arsenal.
Re:Quite possibly. by flaming-opus · 2007-10-26 03:45 · Score: 1

The NEC processor is modern in its memory protection, so linux could easily be ported, however, there's a lot of time/money invested in super/ux so there's little incentive to do so. Even if linux were ported, it wouldn't be like the linux running on your desktop, it would be a stripped-down kernel, and some basic libraries.

I don't know what you're talking about with Xen and mosix. Neither seems at all applicable to the sort of software run on big-iron machines like this. NEC SX machines run code written for the SX, and highly tuned for the environment. They run OpenMP codes on a node, and MPI codes across nodes, or hybrid MPI/OpenMP codes if you've got really good programmers. Mosix is for thread-migration around a cluster. It's more of a parallel database or web server sort of solution. The SX is far too expensive a machine to use for those sorts of tasks.
Re:Quite possibly. by ChrisLeif · 2007-10-26 03:52 · Score: 1

When we were porting System V to the ETA 10 supercomputer (a short vector machine) one of the hardware engineers came running over one day with the devastating news that the vector square root instruction didn't work on all the test machines. We gravely told him that we would take all of them out of the Unix kernel.
Re:Quite possibly. by Anpheus · 2007-10-26 06:00 · Score: 1

Hey now, I was about to correct the grandparent and tell him that 99% of all appliances can run Linux (I swear I got my toaster to boot!) but are you saying my golf clubs can too?
Re:Quite possibly. by DerekLyons · 2007-10-26 19:24 · Score: 1

This was certainly the case when I used vector processors. It is possible that the vector processor does not run an OS at all. It has been many years since I have worked on such a beast but when I did we ran a loader system with a standard OS which would cross compile code for the processor and load it almost onto the hardware (there was actually a small program we called a monitor to deal with I/O, etc, but no multi-tasking, security or anything). It would then run and the results were read back into the front-end computer.

What you called a 'monitor' is in fact an OS - bare bones and stripped to the absolute minimum, but an OS none the less.
Re:Quite possibly. by Anonymous Coward · 2007-10-27 00:28 · Score: 0

Nah, that's NetBSD.

(BSD's not dead! It'll live on in toasters all around the world! Muahhahaha!)
Re:Quite possibly. by owndao · 2007-10-28 02:03 · Score: 1

Thank you for your informed reply. It's sad that we have to dig down through all of the "Funny 5" posts to find any facts. As for the PowerPCs that you mention are they similar to the G5 (a PowerPC 970 variant) processors formerly found in Apple's machines? I believe they each had one or two AltiVec SIMD units.

--
Be as you would have the world become.

I can see the ads now by UnixUnix · 2007-10-25 17:10 · Score: 3, Funny

"Easter Island's Weather Forecasting Service believes operation of the NEC SX-9 would realize a 53% savings under Windows Server 2008 compared to under UNIX"

Re:I can see the ads now by GotenXiao · 2007-10-25 17:50 · Score: 1

53% savings, 100% loss in function. I personally don't know of any version of Windows that can run on a vector CPU.

--
Goten Xiao
Re:I can see the ads now by UnixUnix · 2007-10-25 17:59 · Score: 1

Exactly! :-)
Re:I can see the ads now by Bert64 · 2007-10-25 21:51 · Score: 1

Exactly, you could save a lot of money by keeping this machine turned off!

--
http://spamdecoy.net - free throwaway anonymous email - avoid spam!

I can see tomorrow's headline... by maciarc · 2007-10-25 17:11 · Score: 2, Funny

"SCO files umpteen bazzillion dollar lawsuit against NEC"

The fastest? by BadAnalogyGuy · 2007-10-25 17:12 · Score: 0, Offtopic

I think you're the fastest, but my dad says you don't work hard enough on integer arithmetic. And he says that lots of times, you don't even parallelize instructions. And that you don't really try...

Re:GFLOPS? TFLOPS? by thedarknite · 2007-10-25 17:13 · Score: 5, Funny

What's with these new fangled measurements.

I'd like to know what it is in Libraries of Congress per Jiffy

--
A game has objectives and is competitive, anything else is just play

Does it play vector games? by filesiteguy · 2007-10-25 17:13 · Score: 4, Funny

I wonder how well it will do with the really cool vector games like Asteroids or BattleZone or Tempest or...

...Star Wars! Yeah, we could take this little baby, setup a sit-down booth, add some speakers and we'd be set!

"what's your vector, Victor?"

--
The Kai's Semi-Updated Website Thingy

1.6 Teraflops? by JK_the_Slacker · 2007-10-25 17:13 · Score: 2, Funny

Why, that's more powerful than a cluster of 60 PS3s! I'll take three!

--
I'm waiting for a "-1 somepeoplejustshouldn'tgetmodprivileges" meta-moderation.

Re:1.6 Teraflops? by RuBLed · 2007-10-25 17:25 · Score: 5, Funny

Ahh... so you're planning to turn Aero on.
Re:1.6 Teraflops? by W33B · 2007-10-26 00:01 · Score: 0

probably cheaper too!

Re:GFLOPS? TFLOPS? by PresidentEnder · 2007-10-25 17:14 · Score: 3, Funny

Your units don't cancel properly. Flops = floating point operations / second, PS3s / foot-second = physical object / (viscosity / weight). You could stretch PS3s to be units of processing power / time, which gives you processing power / time / viscosity, which we'll fudge to be about flops / viscosity.

I dunno: maybe this thing could run faster at higher temperatures in lower gravity?

(/pretending to know what I'm talking about)

--
I used to carry a bottle of whiskey for snake bite. And two snakes. -Nefarious Wheel

Re:Combination Trollmein by mauthbaux · 2007-10-25 17:18 · Score: 1

No, but it just might be in charge of Gundam.

--
"Operating systems suck: you're better off using only the BIOS" --trainsaw.com

Impressive ... by ackthpt · 2007-10-25 17:32 · Score: 2, Funny

with single core speeds of up to 102.4 GFLOPS and up to 1.6TFLOPS on a single node incorporating multiple CPUs.

Don't be too proud of this technological marvel you have created for it is nothing compared to the power of the slashdot effect.

--

A feeling of having made the same mistake before: Deja Foobar

SCO!? by flyingfsck · 2007-10-25 17:40 · Score: 2, Funny

Did they buy a license from SCO?

--
Excuse me, but please get off my Pennisetum Clandestinum, eh!

vector machines in the top500 list refuse to die by paleshadows · 2007-10-25 17:41 · Score: 2, Informative

There's an interesting paper that analyzes the data accumulated in the top500 list site, which ranks the 500 most powerful supercomputers twice a year: it shows that, over time, the share of vector machines within the list is sharply declining, both in aggregated power and in number: from around 60% in 1993 to around 10% in 2003 (see Figure 3, page 6, in said paper). Still, vector machines refuse to die and always seem to maintain a presence in the top500, as is evident from the above slashdot post. Will vector machines live forever?

"up to" by Duncan3 · 2007-10-25 17:46 · Score: 4, Insightful

The only text that can ever follow the words "up to" in computing is "0.1 *". As in "speeds of up to 0.1 * 102.4 GFLOPS". Every time a marketing droid publishes a press release, a kitten dies.

--
- Adam L. Beberg - The Cosm Project - http://www.mithral.com/

Re:"up to" by kramulous · 2007-10-25 19:19 · Score: 1

I particularly liked:
This new computer features the addition of an arithmetic unit and an increased number of vector pipelines. This has resulted in the development of the fastest single-chip vector processor with a computing performance of 102.4 GFLOPS per single core, and a wide memory bandwidth of 256GB/s. With a single node incorporating up to 16 CPU, computing performance in excess of 1.6TFLOPS is achieved.
So, these are 'core solos'?

--
.
Re:"up to" by darthflo · 2007-10-25 21:49 · Score: 1

With a single node incorporating up to 16 CPU

"up to" in computing is "0.1 *".
Heh, so there are nodes with 1.6 CPUs? What does three-fifths of a CPU look like? Will they call it Tertium?

Grid Computing vs. Supercomputers? by SamP2 · 2007-10-25 17:47 · Score: 2, Insightful

Hate to burst your bubble, but while grid computing can certainly achieve strong speeds, it is not quite AS fast as you might think.

The entire SETI@HOME project (biggest grid computing project on the net) pumps out 274 teraflops. By comparison, Blue Gene L (first in series) pumps out 360 teraflops, and newer versions will achieve petaflop range, much faster than similar anticipation for grid computing projects.

Sure, you might say that just like supercomputers evolve, so does grid computing. The problem is that a supercomputer is built for a particular purpose, while grid computing is saturated by all the stuff you can do with it (SETI, protein folding, cancer research, or whatever). Now I'm not saying any of these projects is not totally awesome, nor trying to put down the spirit of the community, but as more and more projects compete with each other for users' CPUs, the individual share per project will drop. If you combine all grid computing projects put together with all supercomputers put together, the supercomputers win by a huge margin, and even if every single PC in the world were hooked to a grid project when not used for its primary purpose, it would still be unlikely to beat the sum total of dedicated supercomputer power as far as raw computational capability goes.

And it gets even worse if you account for centralized management of all the computing tasks, coordination, error and fake result checking, and simply the lag of transmitting all the grid packets across the net.

Grid computing projects are a very interesting and useful concept, but they won't ever replace supercomputers. Nor should they. They each are good for their own purpose.

Re:Grid Computing vs. Supercomputers? by Anonymous Coward · 2007-10-25 18:38 · Score: 0

Very interesting readings:
http://en.wikipedia.org/wiki/Seti%40home#Statistics
But in the field of distributed computing the Folding@home project is much speedier:
http://en.wikipedia.org/wiki/Folding%40Home#Participation

The Seti@home is currently running at speed of 292.544 TeraFLOPS:
http://boincstats.com/stats/project_graph.php?pr=sah
The Folding@home is currently running at speed of 1.153 PetaFLOPS:
http://fah-web.stanford.edu/cgi-bin/main.py?qtype=osstats
Re:Grid Computing vs. Supercomputers? by argiedot · 2007-10-25 19:55 · Score: 1

I don't know much about this, but what does that mean? I mean, doesn't Folding@Home do funky stuff with the Cell processor? Is a single floating point operation on a PS3 the same as a single operation on a general purpose processor?
Re:Grid Computing vs. Supercomputers? by renoX · 2007-10-26 06:36 · Score: 1

Are the figures for the Blue Gene real or just the maximum possible?

Quite often what you can achieve on a particular problem is much less than what the computer is theoretically capable of (say 10%).

The PetaFLOP horizon by Anonymous Coward · 2007-10-25 17:50 · Score: 0

If a single node can go over 1 TFLOP, then a 1000-node cluster could actually break the PetaFLOP barrier. How feasible is a cluster of that size?
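For the arithmetic only (not the feasibility), here is a quick sketch using the figures quoted elsewhere in this thread: 102.4 GFLOPS per CPU and up to 16 CPUs per node. It treats those as peak numbers and assumes perfect scaling across nodes, which no real interconnect gives you.

```python
import math

CORE_GFLOPS = 102.4   # per-CPU figure quoted in the thread
CPUS_PER_NODE = 16    # maximum CPUs per node quoted in the thread

# Peak per fully populated node, in TFLOPS.
node_tflops = CORE_GFLOPS * CPUS_PER_NODE / 1000.0

# Nodes needed to reach 1 PFLOPS (1000 TFLOPS) of peak, rounded up.
nodes_for_petaflop = math.ceil(1000.0 / node_tflops)

print(f"{node_tflops:.4f} TFLOPS per node, "
      f"{nodes_for_petaflop} nodes for 1 PFLOPS peak")
```

On paper that is well under 1000 nodes, but it says nothing about what a real application would sustain across that many nodes.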

Re:GFLOPS? TFLOPS? by The+Orange+Mage · 2007-10-25 17:55 · Score: 1

What the heck is LoCs/J measuring...librarian/archiver/etc. efficiency?

Re:vector machines in the top500 list refuse to di by cerberusss · 2007-10-25 18:01 · Score: 2, Insightful

Will vector machines live forever?

Well, I actually doubt it. You could say 'those vector processors are used for matrix calculations and are wildly different from general purpose CPUs' and you'd be right.

However, I could see a point in time where hybrids like the Cell (one scalar processor and eight vector processors) will become so cheap that the number of vector machines will decline even more.

The idea will never die of course, I mean, hardware is so flexible nowadays that a good student could make a vector processor at home, if he had a development board with a fast Xilinx FPGA on it. But I think the decline will continue if hybrids are used more often.

--
8 of 13 people found this answer helpful. Did you?

one can only imagine... by admactanium · 2007-10-25 18:08 · Score: 0

one can only imagine what a game of 'tail gunner' or tempest would look like on this machine.

Re:GFLOPS? TFLOPS? by sqrt(2) · 2007-10-25 18:26 · Score: 3, Funny

Your units don't cancel properly... Oh no, physics class flashback! No! NOOOOOOOO! I don't want to do this whole equation again!

--
If you build it, nerds will come. Soylentnews.org

Re:GFLOPS? TFLOPS? by thedarknite · 2007-10-25 18:51 · Score: 1

LoCs is a measure of data, jiffy is a measure of time. Therefore LoCs/J is a unit of work cycles much like hertz.

--
A game has objectives and is competitive, anything else is just play

Re:GFLOPS? TFLOPS? by Grendel70 · 2007-10-25 18:52 · Score: 1

Funnily enough - this isn't totally irrelevant.

In 2000, IBM, Toshiba and Sony collaborated to create a Cell processor, consisting of one scalar processor and eight vector processors, for the Sony PlayStation 3. - Wikipedia.org

--
Perhaps you mean a different thing than I do when you say "science."

Re:GFLOPS? TFLOPS? by glittalogik · 2007-10-25 18:56 · Score: 1

I guess a more accurate question would be, how high a stack of PS3s running in parallel would you need to equal this thing's processing power? I so can't be bothered...

careful now: by Nirvelli · 2007-10-25 19:06 · Score: 2, Funny

Don't forget the Storm project!

Re:GFLOPS? TFLOPS? by thedarknite · 2007-10-25 19:11 · Score: 1

In hindsight, I think I should have used P(arallel)-ICBMs as a measurement.

--
A game has objectives and is competitive, anything else is just play

Vector graphics by xx01dk · 2007-10-25 19:11 · Score: 1

Vector graphics are so 1980's... oh wait

--
There is simply too much glass..

This is unix by Mr.+Flibble · 2007-10-25 19:47 · Score: 1

Yes, it runs a UNIX System V-compatible OS

I was reading the description of the system, and thinking I would never be able to operate it as I am such a dinosaur until I saw the above line. My response is: "This is Unix, I know this!"

--
Try to hack my 31337 firewall!

Re:This is unix by Anonymous Coward · 2007-10-25 20:34 · Score: 0

Just hope there aren't any doors.
Re:This is unix by Anonymous Coward · 2007-10-26 01:56 · Score: 0

"This is a UNIX system, I know this!

Yeah but... by fbartho · 2007-10-25 20:16 · Score: 0, Redundant

... Imagine what you could do with a beowulf cluster of these!

--
Gravity Sucks

Re:Yeah but... by KudyardRipling · 2007-10-26 01:55 · Score: 1

Crack NSA encryption for breakfast, design and 'test' micronukes for lunch, reboot republics for dinner. Control and/or ruin every person on the planet for dessert.

[off camera voice] But that would make you the Antichri[silencedgunshot-THuD!bagtagdragdragdrag]

Damn those abrahamic monotheists!

--
Submission as evidence constitutes plaintiff and/or prosecutorial misconduct.

Obligatory karma-whoring first post by daboochmeister · 2007-10-25 20:27 · Score: 1

To keep this to the shortest set of discussion threads ever for ./, let's get right out there with:

Yeah, but does it run Linux?
I, for one, welcome our new complex large-scale computation climate-adjusting, environment-simulating overlord
Imagine a beowulf cluster of these
Useless, they didn't open source the hardware
Finally, a machine that can run Vista
Bet it infringes hundreds of chair-throwing M$ patents
Hmm, did I forget any ...

--
"Ahh! I see you're in that indeterminate Schrodinger state where - oh, uh ... never mind." Dave Bucci

Re:Obligatory karma-whoring first post by daboochmeister · 2007-10-25 20:33 · Score: 1

Dang, took too long to type. Oh well, have to revert back to other less trustworthy and stable things than the ./ karma system as the foundation for my self-esteem.

--
"Ahh! I see you're in that indeterminate Schrodinger state where - oh, uh ... never mind." Dave Bucci
Re:Obligatory karma-whoring first post by darthflo · 2007-10-25 21:53 · Score: 1
Hmm, did I forget any ...
- Does it have hot grits and nekkid Amidalas inside?
- Oog the caveman will beat it up
- In soviet russia, vectors are faster than you (wtf?)
- idk my bff jill
Re:Obligatory karma-whoring first post by rickb928 · 2007-10-26 03:12 · Score: 1

Just don't try to patent the multi-meme-karma-whoring method. I'll claim prior art and embarrass your lawyers. No, wait, you really can't embarrass a patent lawyer. No, that's not right, you can't embarrass any lawyer. Anyways, I'll do something.

--
deleting the extra space after periods so i can stay relevant, yeah.
Re:Obligatory karma-whoring first post by notnAP · 2007-10-26 04:29 · Score: 1

Well, I for one welcome our new multi-meme-karma whores.

Re:vector machines in the top500 list refuse to di by putaro · 2007-10-25 20:34 · Score: 2, Interesting

I haven't looked closely but I would guess (based on having worked at a manufacturer of vector supercomputers many years ago) that all of the machines represented on the Top 500 list are hybrid machines. All of the vector architectures I'm familiar with had a scalar processor to handle most of the housekeeping, run the OS, compilers and things like that. Vector processors aren't very good at doing things like that.

Vector processors excel at running through what are essentially loop operations. There are two components to their speed. One is the number of functional units that they have. Conceptually, vector operations are applied across an entire array at once (in math speak, arrays are known as "vectors"). Hence they are automatically parallelizable, and the more functional units you have, the more of the operations can actually be applied in parallel. The other component, though, is their ability to run through data quickly. Since the vector unit knows that it will be running through a contiguous block of memory, it can really get the memory system moving. Scalar processors and their caches are not designed for running in straight lines through data. It's pretty rare to see a cache that will go into a full streaming mode, so they are continually starting and stopping the memory subsystem. A vector unit can issue prefetches for all of its data, so you can build an interleaved memory system that will really move the data (we used to have 8-way interleave on our memory subsystem; the scalar didn't do all that well with that, but the vector could max out the memory bus in a sustained manner).
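
To make the interleaving point concrete, here is a minimal sketch of a toy memory model. Only the 8-way interleave comes from the description above; the modulo address-to-bank mapping, the one-request-per-cycle issue rate, and the 8-cycle bank recovery time are assumptions picked purely for illustration.

```python
# Toy model of an interleaved memory system (all parameters are illustrative).
# Assumptions: word address -> bank is a plain modulo mapping, and a bank stays
# busy for BANK_BUSY cycles after each access.

N_BANKS = 8      # 8-way interleave, as mentioned in the comment above
BANK_BUSY = 8    # hypothetical bank recovery time, in cycles


def cycles_to_issue(addresses, n_banks=N_BANKS, bank_busy=BANK_BUSY):
    """Return the number of cycles needed to issue every request in order.

    At most one request is issued per cycle, and a request stalls until
    the bank it targets is free again.
    """
    free_at = [0] * n_banks            # first cycle at which each bank is free
    cycle = 0
    for addr in addresses:
        bank = addr % n_banks
        cycle = max(cycle, free_at[bank])   # stall while the bank is busy
        free_at[bank] = cycle + bank_busy
        cycle += 1                          # move on to the next issue slot
    return cycle


n_words = 64
contiguous = range(n_words)                        # stride 1: rotates through all banks
same_bank = range(0, n_words * N_BANKS, N_BANKS)   # stride 8: always hits bank 0

print("stride 1:", cycles_to_issue(contiguous), "cycles")
print("stride 8:", cycles_to_issue(same_bank), "cycles")
```

In this model the contiguous walk keeps every bank busy and issues one word per cycle, while the stride-8 walk hammers a single bank and spends most of its time stalled. That is roughly the gap between a unit that streams through memory in a straight line and one that keeps starting and stopping the memory subsystem.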

What is this, mad mods Friday? by Burb · 2007-10-25 20:49 · Score: 1

Who thought this was Insightful? We need -10 predictable as an option.

--

JP reference? by ebbomega · 2007-10-25 20:50 · Score: 1

You'd better hurry, the raptors are breaking through the doors....

--
Karma: Non-Heinous

No, the real question, vi or emacs? by SmallFurryCreature · 2007-10-25 21:19 · Score: 2, Funny

Come on, we need to know, what is the default editor, vi or emacs? We need to know.

--

MMO Quests are like orgasms:

You may solo them, I prefer them in a group.

Re:No, the real question, vi or emacs? by arfonrg · 2007-10-26 06:53 · Score: 1

What are you a troll??

VI OF COURSE!

--
Your thin skin doesn't make me a troll
Re:No, the real question, vi or emacs? by Anonymous Coward · 2007-10-26 10:10 · Score: 0

Yeah, right screw vi AND emacs. ed forever!
?

Re:vector machines in the top500 list refuse to di by pimpimpim · 2007-10-25 21:28 · Score: 1

The consequence of the parent post is that code can be run very fast on a vector machine IF it is written in such a way to take full advantage of the vector architecture, using smartly written loops. Now it is a good idea to have such loops in your high-performance-computing code anyway, but since not everyone is writing a whole scientific software package from scratch each time a new computer is available, most codes used now are optimized for either shared memory or cluster systems. (It would be nice if they were optimized for hybrid systems, but hybrid MPI isn't without problems, and hybrid systems are relatively new).

Realize that most scientific code probably still has lots of code in it written for the original CRAY system it ran on in the 80's, and you see why vector systems will live on for a while: code that was written for one will have to be used on a vector system. One has to have the luck to find a PhD student willing and able to rewrite the code for a new machine.

I really have the experience that the hardware possibilities are growing faster than the (operating) software using it. Just from a practical point of view: Say a research center gets a new cluster. It takes at least half a year to a year to get it configured correctly, find all the broken hardware and replace it, get network problems out of the way. By then it has already dropped 50 places in the top500 and it will still not be used to even half of its full power. Even though supercomputing is as old as computing itself, there is still no 'plug-n-play' solution to get a supercomputer running. And I am not talking about home-made beowulf clusters here, try with Dell, HP, Sun, or IBM, it just won't work out of the package.

And besides that you end up with the next problem, how to get your program to optimally use the shared memory-core nodes AND the interconnected nodes. These are all non-standard solutions. At the moment you can buy a quad core home machine for about 700 euro, but is there any software written for it? It is time for a smart message-passing/shared memory programming style that will automatically optimize for the hardware it is compiled on.

Until these problems are solved, multi-core computing will be suboptimal, and by just looking at the increase in TFLOPS peak performance of a system we will just be fooling ourselves.

--
molmod.com - computing tips from a molecular modeling

Link to more information by Tom+Womack · 2007-10-25 22:11 · Score: 4, Informative

http://www.nec.de/hpc/hardware/sx-series/index.html

There are four PDFs there; the brochure is a four-colour glossy, but there is some real information. Sadly, the interesting-looking white papers are for the SX6, two generations earlier.

SX9 summary: 65nm technology, 3.2GHz clock speed, eight vector elements handled per cycle with two multiply and two add units, which is where the 102.4Gflop/CPU figure comes from. 16 CPUs in a box about the size of a standard 42U rack.

Totally absurdly fast (ten 64-bit words per cycle per CPU) access to a large (options are 512GB or 1TB) shared main memory; absurdly fast (128GB/second) inter-node bandwidth.
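
For anyone who wants to check the arithmetic, here is a quick sketch. The clock, elements per cycle, unit counts and CPUs per box are the numbers from the summary above; the 839 TFLOPS system peak is the figure quoted elsewhere in this thread. Treating the per-CPU peak as clock times elements times units is my reading of the brochure summary, and the implied node count is only a derived estimate.

```python
# Back-of-the-envelope check of the SX-9 peak figures quoted in this thread.

CLOCK_GHZ = 3.2           # clock speed
ELEMENTS_PER_CYCLE = 8    # vector elements handled per cycle
FP_UNITS = 2 + 2          # two multiply units plus two add units
CPUS_PER_NODE = 16        # CPUs in one rack-sized box
SYSTEM_PEAK_TFLOPS = 839  # full-system peak quoted elsewhere in the thread


def peak_gflops_per_cpu(clock_ghz=CLOCK_GHZ,
                        elements=ELEMENTS_PER_CYCLE,
                        units=FP_UNITS):
    """Peak GFLOPS per CPU, assuming every unit retires a result each cycle."""
    return clock_ghz * elements * units


cpu_gflops = peak_gflops_per_cpu()
node_gflops = cpu_gflops * CPUS_PER_NODE
implied_nodes = SYSTEM_PEAK_TFLOPS * 1000 / node_gflops

print(f"per CPU : {cpu_gflops:.1f} GFLOPS")
print(f"per node: {node_gflops:.1f} GFLOPS")
print(f"nodes implied by {SYSTEM_PEAK_TFLOPS} TFLOPS: {implied_nodes:.1f}")
```

The per-CPU result reproduces the 102.4 Gflop figure, and a 16-CPU node lands a bit above 1.6 TFLOPS, so the 839 TFLOPS headline number works out to roughly 512 nodes.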

Re:Link to more information by flaming-opus · 2007-10-26 02:54 · Score: 1

NEC keeps plugging along at the CPUs. Those things are incredible. They are, however, VERY VERY vector dependent. They do not run scalar code very fast at all. This many pipe-sets and ALUs per pipe requires very long vector lengths to vectorize well. What I'd love to see is one of these vector CPUs tied very closely to a high-speed scalar CPU like an Opteron, Xeon, or POWER6. For a while it sounded like IBM was getting back into the vector game with a POWER6-derived processor, but that seems to have faded away.

The 16-way SMP also helps. On the older NEC machines, you only ever wanted a reservation on 7 out of 8 CPUs, so the last one could run the OS. Now you can probably grab 15 out of 16 without paying the penalty.

The really exciting thing about this system is the new interconnect. NEC had been limping along with the previous generation of IXS interconnect since the sx-5. 8GB/s is fast, but not when each node has so much horsepower.

The one thing missing from this literature, of course, is any talk of price. For the last few generations, NEC has had the highest per-node performance of any supercomputer, but each node cost so much that other solutions were equally attractive, if not more so. Anyone have some clues as to the pricing?

why must every super computing story... by gravisan · 2007-10-25 22:28 · Score: 1

have obligatory cluster yay retarded replies. Firstly ... I highly doubt this is running linux on the vector part of the core itself, more likely than anything it has a von Neumann machine on the core somewhere or even separately and I will say the obvious here...which I am sure everyone here knows (especially those people yelling retarded cluster noises) parallelization is only as good as the algorithm (http://en.wikipedia.org/wiki/Amdahl's_law) ... a vector computer is like a bigger version of a GPU with a more generalized pipeline, the clusterability of this would of course depend on not only the algorithm but also the way data is marshalled in and out of the pipeline.
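
Since Amdahl's law is linked above, here is a minimal sketch of what it says in practice. The parallel fractions and unit counts in the loop are arbitrary illustrative values, not measurements of any real code.

```python
# Amdahl's law: speedup = 1 / ((1 - p) + p / n), where p is the fraction of
# the run time that can be parallelized and n is the number of parallel units.


def amdahl_speedup(parallel_fraction, n_units):
    """Ideal speedup for a given parallel fraction on n_units units."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if n_units < 1:
        raise ValueError("n_units must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_units)


# Illustrative values only.
for p in (0.50, 0.90, 0.99):
    for n in (16, 512):
        print(f"p={p:.2f} n={n:4d} speedup={amdahl_speedup(p, n):7.2f}")
```

However many units you throw at it, the speedup never exceeds 1 / (1 - p), so a code that is 90% parallel tops out below 10x. That is the sense in which the machine is only as good as the algorithm you feed it.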

Re:why must every super computing story... by juanfgs · 2007-10-25 23:51 · Score: 1

you must be new here.
Re:why must every super computing story... by NevarMore · 2007-10-26 00:27 · Score: 1

"...a vector computer is like a bigger version of a GPU..."

So instead of using these in a Beowulf cluster we could use them to get a killer (no pun) framerate in Battlefield??

Re:GFLOPS? TFLOPS? by Fred_A · 2007-10-25 22:45 · Score: 3, Funny

Funnily enough - this isn't totally irrelevant.

In 2000, IBM, Toshiba and Sony collaborated to create a Cell processor, consisting of one scalar processor and eight vector processors, for the Sony PlayStation 3. - Wikipedia.org

So there should be a Tera-PlayStation (TPS) measurement? And if so, who has the TPS report on that machine?

--

May contain traces of nut.
Made from the freshest electrons.

SX Series uses SUPER-UX by Anonymous Coward · 2007-10-25 23:00 · Score: 0

The SX series does not run Linux. Yes, the OS is POSIX-like, but it does not have as many frills as Linux. The OS does the bare minimum; mind you, you just want to crunch numbers. Now, if you write a FORTRAN or C program, you won't have major issues porting it to the SX.

As for porting Linux to a vector system, well, it is not that easy. The totally different memory system is one of the major issues, here fully SMP. Most kernel code is proprietary and I don't see NEC opening it to the public.

Video and Interview with NEC Project Manager by dk3nn3dy · 2007-10-25 23:04 · Score: 2, Informative

There is a video news release and interview with the project manager here: http://movie.diginfo.tv/2007/10/26/07-0502-r.php

Re:vector machines in the top500 list refuse to di by putaro · 2007-10-26 00:06 · Score: 2, Interesting

Realize that most scientific code probably still has lots of code in it written for the original CRAY system it ran on in the 80's, and you see why vector systems will live on for a while: code that was written for one will have to be used on a vector system. One has to have the luck to find a PhD student willing and able to rewrite the code for a new machine.

Worse than that even. I was doing this back in the late 80's/early 90's and we spent a large amount of energy getting the FORTRAN compiler to automatically vectorize "dusty deck" (that would be code that was originally written on PUNCHCARDS) scientific code.

Parallel programming is hard. Vectorized code is kind of like "parallel lite" in that it parallelizes very narrow operations without all that messy locking and message passing.

Oh, there was one thing that the vector excelled at that OS's do a lot of - memory copying. When we instrumented our kernel (4.3 BSD derived) we found that it spent an awful lot of time in bcopy. One of the guys spent a fair amount of time implementing a "vcopy" which would use the vector to copy large blocks of memory. On our smoking fast 237 MB/s bus with 8-way interleaved memory the scalar CPU would top out at around 25/30 MB/s due to the interaction of the cache and the memory subsystem. The vector, though, could move at bus speed. Unfortunately, I don't think it ever worked as well in practice as in theory because there was a lot of overhead in getting the vector started, checking to make sure that it wasn't busy doing other work, etc. A dedicated DMA unit would give you the same effect.
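
The startup-overhead problem can be sketched as a simple break-even calculation. The 237 MB/s bus speed and the 25 MB/s scalar rate (the low end of the range) are the figures above; the 100 microsecond vector startup cost is a made-up placeholder, since no real overhead number is given.

```python
# Break-even block size for a vector-assisted memory copy.
# With 1 MB taken as 10**6 bytes, bytes / (MB/s) conveniently gives microseconds.

SCALAR_MB_S = 25.0    # scalar copy rate (low end of the 25/30 MB/s range)
VECTOR_MB_S = 237.0   # vector copy running at full bus speed
STARTUP_US = 100.0    # HYPOTHETICAL cost of claiming and starting the vector unit


def scalar_copy_us(n_bytes):
    """Time for the scalar CPU to copy n_bytes, in microseconds."""
    return n_bytes / SCALAR_MB_S


def vector_copy_us(n_bytes):
    """Time for the vector unit to copy n_bytes, including startup cost."""
    return STARTUP_US + n_bytes / VECTOR_MB_S


def break_even_bytes():
    """Block size at which both copy paths take the same time."""
    return STARTUP_US / (1.0 / SCALAR_MB_S - 1.0 / VECTOR_MB_S)


print(f"break-even block size: {break_even_bytes():.0f} bytes")
for size in (512, 4096, 65536):
    print(f"{size:6d} bytes: scalar {scalar_copy_us(size):8.1f} us, "
          f"vector {vector_copy_us(size):8.1f} us")
```

Under these assumed numbers the vector path only wins for blocks of a few kilobytes and up, so a kernel that mostly copies small buffers would see little benefit. That would be consistent with it working better in theory than in practice.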

Computer by hernyo · 2007-10-26 00:07 · Score: 1

wow... "imagine a Beowulf cluster of those"...

No, not quite... by Anonymous Coward · 2007-10-26 00:18 · Score: 0

If you're spending this much on a machine you're not going to settle for 80% of your application's performance just so you can save a bit of programming time...

NEC are gay by Anonymous Coward · 2007-10-26 00:28 · Score: 0

Japan's HPC industry has gone to shit. The only reason it still exists is because they somehow manage to con the Japanese taxpaying public out of hundreds of millions of dollars each year to fund "research" into "next-generation" supercomputers.

Next-generation my ass. sx-9, sx-10, sx-11... *splud*

fuck you, NEC.

Re:GFLOPS? TFLOPS? by Anonymous Coward · 2007-10-26 00:44 · Score: 0

Do you know about the new cover sheets for them? I'll send another copy of that memo your way.

Re:youn fAil it? by Anonymous Coward · 2007-10-26 01:02 · Score: 0

now suck my cock!

VIRTUALIZATION & SECURITY by Anonymous Coward · 2007-10-26 01:05 · Score: 0

http://it.slashdot.org/article.pl?sid=07/10/25/1521258

That's what is thought of VIRTUALIZATION (VMWare) & security Bert64. This is in regards to our discussion here earlier this year:

http://linux.slashdot.org/comments.pl?sid=320419&cid=20953889

APK

Re:VIRTUALIZATION & SECURITY by Bert64 · 2007-10-26 11:18 · Score: 1

Well, that's as may be...
The point was originally about the CIS test tho, which performs no virtualization-related security checks so it's irrelevant in that context. Maybe it should, but so long as it doesn't the results would not be affected by running in vmware or any other virtualization technology.

Theo is also arguing that the x86 architecture is flawed, and thus any virtualization technology will be flawed when running on x86. I can't say I disagree here, and it would be interesting to see how a more integrated virtualization platform such as z/VM stacks up, or perhaps the partitioning capabilities of the higher end sun servers.

--
http://spamdecoy.net - free throwaway anonymous email - avoid spam!

Analog Computers by Anonymous Coward · 2007-10-26 01:10 · Score: 0

I remember that in the Engineering computer department we had an old Analog Computer.
One programmed the amplifiers and feedback loops with patch cables and then sent it
to A/D converters to get the output which then was retrieved thru a data bus
to the digital computer side and manipulated/graphed from there. Mostly you'd set up state-vector and (partial) differential equations on it. I never saw a book on how to program the thing (a PDP Analog computer I think). So I guess these old computers get lost to the dust bins of computer history!

Anybody recall the Analog Computer days? Present day textbooks don't even make mention
that Analog computers existed. So today's computer/engineering students don't know about
them either......

FLOPS are useless by Anonymous Coward · 2007-10-26 01:54 · Score: 2, Funny

How many BogoMips does it have? =)

Re:GFLOPS? TFLOPS? by Sponge+Bath · 2007-10-26 01:55 · Score: 1

I'm not sure about that, but I hear it can make the Kessel run in 12 parsecs.

Unicos and Cray by wandazulu · 2007-10-26 01:57 · Score: 2, Interesting

Whenever I hear "supercomputer" and Unix I think of using a Cray and Unicos, which was the version of Unix that ran on them. Unicos was, at least the version I used, the ultimate in bare-bones Unix. I think when people think of Unix today they think of something like Linux or the BSDs or OS X, or whatever where the environment is very rich with tools. Unix on a supercomputer is not much more than an interface between your C (or Fortran) program and the bare metal; they don't (again, in my experience) make it the kind of environment you *use*...you get your code on the machine, compile it, submit it, and log off and wait for an email.

Maybe this NEC machine is different but Unix on a supercomputer is like the cockpit of a Formula 1 race car; just there to provide a way to steer, comforts be damned.

Re:Unicos and Cray by flaming-opus · 2007-10-26 03:16 · Score: 1

Supercomputer OSes, like all unix OSes, have gained functionality over the years. In the supercomputing world, data storage and I/O performance are almost as important as the computational job. Thus a lot of attention is paid to filesystems.
super/UX is pretty stripped down, but getting better. Cray Unicos is no longer based on that System V stuff and is either based on Irix on the X1, or is Linux-derived on the XT4. The compute nodes are pretty stripped down, but the login nodes are pretty much off-the-shelf Linux. I think it's a logical direction for supercomputer makers.
Re:Unicos and Cray by cthulhu11 · 2007-10-27 02:51 · Score: 1

The completeness of the *ix environment (real X11 port, vectorizing ANSI C compiler, etc) was one thing that drew customers to Convex vector machines back in the day.

Actually, by Anonymous Coward · 2007-10-26 02:36 · Score: 0

100% of the PCs CAN run linux. More than 99% of the computers in the world CAN run linux.

how high a stack of PS3s by foobsr · 2007-10-26 02:36 · Score: 1

~10000 would be a good guess.

Quote: "Mueller, an associate professor of computer science, has built a supercomputing cluster capable of both high-performance computing and running the latest in computer gaming. His cluster of eight PS3 machines - the first such academic cluster in the world - packs the power of a small supercomputer, but at a total cost of about $5,000, it costs less than some desktop computers that have only a fraction of the computing power.
...
Mueller estimates that with approximately 10,000 PS3 machines anyone could create the fastest computer in the world - albeit with limited single-precision capabilities and networking constraints."

CC.

--
TaijiQuan (Huang, 5 loosenings)

Re:vector machines in the top500 list refuse to di by Rhys · 2007-10-26 02:36 · Score: 1

What do you think your Pentium's MMX instructions are? They're vector operations. Every machine on the list is already a hybrid between the two. They aren't dedicated individual vector processors under the command of a master GP-CPU, but a different version of hybridization.

I'd actually suggest that you'll probably see vector processors marginalized or pushed out eventually by stream processors: aka nvidia/ati graphics boards.
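
As a rough picture of what "MMX instructions are vector operations" means, here is a toy model of a packed 16-bit add on a 64-bit word. It only mimics the lane-by-lane, wrap-around behaviour in plain Python; the helper names are made up for this sketch and nothing here touches real SIMD hardware.

```python
# Toy model of an MMX-style packed add: four 16-bit lanes in one 64-bit word.
# Real hardware does this in a single instruction; the loops here only mimic
# the per-lane, wrap-around semantics.

LANES = 4
LANE_BITS = 16
LANE_MASK = (1 << LANE_BITS) - 1


def pack(values):
    """Pack LANES integers into one 64-bit integer, lane 0 in the low bits."""
    if len(values) != LANES:
        raise ValueError(f"expected {LANES} values")
    word = 0
    for i, v in enumerate(values):
        word |= (v & LANE_MASK) << (i * LANE_BITS)
    return word


def unpack(word):
    """Split a 64-bit integer back into its LANES 16-bit lanes."""
    return [(word >> (i * LANE_BITS)) & LANE_MASK for i in range(LANES)]


def packed_add(a, b):
    """Add lane by lane; a carry out of one lane never reaches the next."""
    return pack([x + y for x, y in zip(unpack(a), unpack(b))])


a = pack([1, 2, 3, 0xFFFF])
b = pack([10, 20, 30, 1])
print(unpack(packed_add(a, b)))   # the last lane wraps around to 0
```

Four additions happen per "instruction" here, which is the same idea as a vector unit, just with far shorter vectors and without the memory system to feed them.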

--
Slashdot Patriotism: We Support our Dupes!

Re:vector machines in the top500 list refuse to di by flaming-opus · 2007-10-26 03:29 · Score: 2, Informative

So here's what you're missing: Vector processors aren't about doing a lot of math. True, they do that very well, but that's not where they excel. Where vector processors really shine is in memory bandwidth. Vector operations let you actually use that 4 terabytes/second of memory bandwidth, rather than spend it all flushing out cache lines. On this machine, a single load instruction can fetch 2KB of data.

Cell (and many GPUs or future whatever) have the ability to do a LOT of math, but they do it on a very tiny amount of data. These vector CPUs have dozens or hundreds of memory controllers. That's a lot of RAM chips, and a lot of copper wires between memory and the CPU. I'm sure the motherboard is dozens of layers thick for all the traces. In short, you can't get all that capability on a commodity processor, because the commodity market won't pay for all the memory bandwidth, which is expensive to engineer, and expensive per-unit.

Unless/until there is a major change in the cost of memory, and memory bandwidth, there will still be a need for special-purpose supercomputing processors. This is not to say that Cray and NEC will continue to be the people to make such a thing. I'm sure IBM could come up with a cell-derived processor with a TON of real memory bandwidth, or maybe Nvidia. The question is: will they want to? I figure there's a lot more money to be made selling videogame consoles than there is at the high-end of the supercomputer market.
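
The bandwidth figures hang together if you run the numbers. This sketch uses the "ten 64-bit words per cycle per CPU", 3.2GHz, 16-CPU and 102.4 Gflop figures from the brochure summary elsewhere in this thread; reading the 2KB load as one vector's worth of 64-bit words is my assumption, not something stated in the thread.

```python
# Cross-check of the memory bandwidth figures quoted in this thread.

CLOCK_GHZ = 3.2              # clock speed
WORDS_PER_CYCLE = 10         # 64-bit words per cycle per CPU
BYTES_PER_WORD = 8
CPUS_PER_NODE = 16
PEAK_GFLOPS_PER_CPU = 102.4  # per-CPU peak from the brochure summary

cpu_gb_s = CLOCK_GHZ * WORDS_PER_CYCLE * BYTES_PER_WORD   # GB/s per CPU
node_tb_s = cpu_gb_s * CPUS_PER_NODE / 1000               # TB/s per node
bytes_per_flop = cpu_gb_s / PEAK_GFLOPS_PER_CPU           # memory bytes per peak flop
words_per_2kb_load = 2 * 1024 // BYTES_PER_WORD           # assumes 64-bit elements

print(f"per CPU : {cpu_gb_s:.0f} GB/s")
print(f"per node: {node_tb_s:.3f} TB/s")
print(f"balance : {bytes_per_flop:.2f} bytes per peak flop")
print(f"a 2KB load holds {words_per_2kb_load} 64-bit words")
```

The per-node result comes out right around the 4 terabytes/second mentioned above, at about 2.5 bytes of memory traffic available per peak flop.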

Not For Long by Doc+Ruby · 2007-10-26 04:23 · Score: 1

The SX-9 closes in on the PFLOPS (one quadrillion floating point operations per second) range by achieving a processing performance of 839 TFLOPS.

Pretty fast, but IBM will release its Roadrunner at Los Alamos NL next year with 1.6PFLOPS:

The computer is designed for a performance level of 1.6 petaflops peak and to be the world's first TOP500 Linpack sustained 1.0 petaflops system. [...] It will be a hybrid design with more than 16,000 AMD Opteron cores (~2200 IBM x3755 4U servers, each holding four dual core Opterons, connected by Infiniband) to an equal number of Cell microprocessors resulting in a 1:1 ratio of Cell cores to Opteron cores.

16000 CellBE chips have a capacity of 725GFLOPS each for 11.6PFLOPS. So it's expected to operate at only about 14% of HW capacity. There's a lot of room for that Roadrunner, or a machine like it, to stay at the top of the heap.

--
make install -not war

Re:Not For Long by Anonymous Coward · 2007-10-26 06:19 · Score: 0

Pfft. Infiniband...

We are comparing a dedicated vector supercomputer (SX-9) to a supercluster (Roadrunner). They'll shine in different tasks. As always, the trick is in the interconnect. (Blue Gene is somewhere in the middle -- the interconnect is a really good multi-dimensional grid with dedicated I/O processors handling messages. But still it's not 128 GB/s node-to-node like the SX-9.)

Are there any HPC guys who really give a toss about Linpack?
Re:Not For Long by flaming-opus · 2007-10-26 07:43 · Score: 1

InfiniBand is not bad for a single-processor node. The trouble is that everyone wants to use a single port to power 16 cores.

Sadly, linpack does play a role in the computers that are purchased. Sometimes the guys making the decision are not the engineers and programmers, who then have to suffer with the consequences. Alas.
Re:Not For Long by flaming-opus · 2007-10-26 07:57 · Score: 1

There are several petaflop machines in their initial phases of roll-out right now, but peak performance isn't the only number worth paying attention to. The SX-9 is an amazing architecture, with orders of magnitude more bandwidth than the Roadrunner system, both in interconnect and in memory bandwidth. It's also a very expensive machine. The SX, however, has the advantage of being an update and refinement of a very established architecture. Codes written for the SX-3 are going to perform well on the SX-9.

Roadrunner is a cool machine, but hard to program. You have most of your compute capability on the cell coprocessors, which are connected by a non-coherent InfiniBand pipe to the host Opteron processor. On the cell, you have a PowerPC doing the program setup, and the 8 SPU engines chunking through the data in parallel. If you can parallelize your program enough, you can get those working really quickly, but you have to do all the memory management in software, and control flow has to happen on the PPC. It's not impossible to get good performance from such a machine, but don't expect to just drop your existing software onto it and get decent performance. Blue Gene is a less radical approach, and it took 3-4 years for a dozen codes to be rewritten to take advantage of it. Los Alamos runs a small number of applications, but needs a lot of performance. Thus they can spend the time to optimise their small list of apps for such a hierarchical design.

Also, you have your number off a little. The updated cell will provide 100Gflops of double precision performance, according to IBM. Still very fast. http://www.cs.utk.edu/~dongarra/cell2006/cell-slides/04-Ken-Koch.pdf
http://repositories.cdlib.org/cgi/viewcontent.cgi?article=4262&context=lbnl
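
To put the two per-chip figures side by side, here is a small sketch. The chip count and the 725 GFLOPS figure come from the parent comment, the 100 GFLOPS double-precision figure and the 1.6 PFLOPS design peak come from this sub-thread, and the Opteron contribution is ignored for simplicity.

```python
# Roadrunner peak arithmetic using only the figures quoted in this sub-thread.

CELL_CHIPS = 16_000           # chip count used in the parent comment
PARENT_GFLOPS_PER_CELL = 725.0   # per-chip figure from the parent comment
DP_GFLOPS_PER_CELL = 100.0       # double-precision figure cited in this comment
DESIGN_PEAK_PFLOPS = 1.6         # design peak quoted for the machine


def aggregate_pflops(chips, gflops_per_chip):
    """Total peak in PFLOPS for a number of chips at a per-chip GFLOPS rate."""
    return chips * gflops_per_chip / 1e6


parent_total = aggregate_pflops(CELL_CHIPS, PARENT_GFLOPS_PER_CELL)
dp_total = aggregate_pflops(CELL_CHIPS, DP_GFLOPS_PER_CELL)

print(f"at 725 GFLOPS per chip: {parent_total:.1f} PFLOPS")
print(f"at 100 GFLOPS per chip: {dp_total:.1f} PFLOPS")
print(f"design peak / 725-based total: {DESIGN_PEAK_PFLOPS / parent_total:.1%}")
```

With the 100 GFLOPS figure the Cell chips alone add up to the 1.6 PFLOPS design peak, so the "14% of HW capacity" gap in the parent comment comes from the choice of per-chip number rather than from idle hardware.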
Re:Not For Long by Anonymous Coward · 2007-10-27 00:57 · Score: 0

I'm the AC above with the Infiniband comment. Thank you for yet another and illuminating post on these mesmerizing machines so few have access to :-)

By the way, you'd be a very welcome visitor to Beyond3D.com's Cell Performance subforum -- hope you visit and find like company. Myself, I tend to just lurk there, not wanting to nudge down the excellent S/N ratio ;-P

Re:GFLOPS? TFLOPS? by Anonymous Coward · 2007-10-26 05:56 · Score: 0

Only if it shoots first.

Re:vector machines in the top500 list refuse to di by Anonymous Coward · 2007-10-26 06:05 · Score: 0

Yes, vector machines will "forever" be the most suitable tool for vector computing.

Mind you, while I love Top 500 and the funky gear on the list, it's a very limited benchmark. Linpack, I don't remember what else is there (and can't be arsed to check now). None of the stuff these babies with their jaw-dropping interconnect architectures excel in. It doesn't really penalize clusters (with their flimsy Infiniband or Myrinet or 10GbE) like some real-world simulation jobs would. Admitted, Blue Gene deserves to be there, if for the said wrong reasons.

Re:vector machines in the top500 list refuse to di by specific_pacific · 2007-10-26 06:15 · Score: 1

Well put. I've never really had that top down view before.

Nothing special, Roadrunner is faster by Anonymous Coward · 2007-10-28 07:01 · Score: 0

IBM and Los Alamos National Laboratory announced something faster months ago. Roadrunner is 1.3 petaFlops peak vs 0.839 petaFlops peak for the NEC. See http://www.lanl.gov/roadrunner/ for many details on the Roadrunner system design.

137 comments