Super Computing 2000
Stephen Adler of Brookhaven Laboratory has written a fine account of the Super Computing 2000 conference in Dallas, Texas. He covers super computing, venture capital, some fascinating info about SETI, open source software, and even has some geek porn.
Big Iron sure isn't as big as it was in the Good Olde Days(tm). Why, sonny, when I went to college, the CDC 3150, with 32k words of memory, took up a whole room. They had to kick out some walls to add another 32k. And it was actually made out of iron, not a bunch of silicon and plastic.
Supercomputing is the top 1-5% of performance and price at any given time. Right now that would be at least 100 gigaflops, given that there are a handful of single-digit teraflop systems out there. And it is whatever you can buy for several million dollars, all the way out to the top end of $100 million for an ASCI prototype. For a period of time supercomputing also meant compromised software (clunky compilers and limited operating systems), but that has greatly improved these days. Supercomputing hardware technology has been all over the map the past 30 years, from highly optimized single CPUs to vector machines and massively parallel machines. But the underlying criterion is performance.
I was just saying that the US is not doing vector computing in any big way anymore. And I think my statement is backed by the fact that none of the 500 fastest supercomputers in the world is an American-produced vector computer. The vector machines rank high on the Top500 and are well represented (considering that they are extremely non-commodity hardware), but not one single vector machine on the Top500 is produced by Cray or any other American company.
The fun part is that Cray has been moving in the direction of ``kludging together 4000 4-way SMP Linux boxes with 100BaseT and some duct tape'' as you so elegantly put it. I know, they're not building Beowulf clusters yet, but they are using Alpha processors (instead of Cray vector processors) in all of their machines that are actually worth mentioning (T3E etc). It is a big step in the commodity-hardware direction (more, cheaper CPUs in favour of fewer expensive ones).
Yes, it will be interesting to see what happens next. I do not understand why American companies have been moving away from the vector market. Any idea? At least IBM should have the resources to produce a vector CPU. I think it's Hitachi that's licensing the Power3 core, but they're using it in a heavily modified Power3-like CPU with vector registers. Kind of interesting...
Oh, and about the clusters: The ASCI project is funding the development of hardware and software for very high speed computing. It is an American initiative (so American-only vendors, meaning no vector machines because Cray cannot deliver anything remotely reasonable in the high end) with the goal of producing computing systems powerful enough to simulate nuclear weapon tests (in order to eliminate or reduce the need for actual live testing). Simulating that kind of physics is a very real-world problem (not your average distributed.net/seti@home embarrassingly parallel problem). However, the four fastest supercomputers in the world (ASCI White, ASCI Red, ASCI Blue Pacific and ASCI Blue Mountain) are clusters. Yes, it's not 100BaseT, but the machines are certainly not SMP/ccNUMA either. You will notice that the 4.9TFlops/sec for #1 is much less than the 12.3TFlops/sec which is the theoretical peak for that machine. The Hitachi and the Crays are much closer to reaching their theoretical peak, of course, because as you pointed out their architectures are so different from the clusters. But the clusters are real nonetheless.
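To put a number on the gap between sustained and theoretical speed for #1, here is a small sketch. The function name is made up for illustration; the two figures are the ones quoted above, and the result comes out at roughly 40% of peak.

```python
def fraction_of_peak(sustained_tflops, peak_tflops):
    """Return the share of theoretical peak that a machine actually sustains."""
    if peak_tflops <= 0:
        raise ValueError("peak must be positive")
    return sustained_tflops / peak_tflops


# Figures quoted in the post for the #1 machine (ASCI White).
asci_white = fraction_of_peak(4.9, 12.3)
print(f"ASCI White sustains about {asci_white:.0%} of its theoretical peak")
```

The same ratio for a vector machine would need its own sustained and peak numbers, which are not given here.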
The main thing I wanted to state in the very beginning in all of this was something like:
1) Vector computing gets very little attention in the US. (I couldn't find a single vector machine in the top100 which was located in the US)
2) I find it amusing that Japanese vector machines are not in use in the US, because of politics.
3) I'm wondering why no US company is building vector machines that are used for *high*end* computing (meaning, it will at least appear on the Top500, which no Cray vector machine does).
I don't want to bash Cray and I'm not on Hitachi's payroll.
Let me know what you think about 1, 2, and 3.
The final machine Seymour Cray worked on before his passing in 1996 debuts at this conference. See the SRC website (http://www.srccomp.com/) for a description of this machine and a biography of the late founder. The SRC-6 deviates from the Cray-1, 2, 3 & 4 philosophy of a small number of very high performance CPUs and uses commodity CPUs.
In a vector CPU, you have vector registers. For example, one single multiply-add instruction can take one register, multiply it by another, and add a third, but each register may hold perhaps 16 scalars instead of just one.
So on a 1GHz ``ordinary'' cpu you do
multiply a, b
and get 1GFlop/sec because you
can run ~1 instruction per cycle.
On a 300MHz vector machine you would do
multiply a, b
and get 4.8GFlops/sec, because each register
holds 16 scalars instead of one.
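The arithmetic behind those two numbers can be sketched like this. The function name and the one-instruction-per-cycle default are assumptions for illustration; real CPUs, scalar or vector, rarely sustain this peak.

```python
def peak_flops(clock_hz, scalars_per_register, instructions_per_cycle=1.0):
    """Peak rate = clock * instructions per cycle * scalars handled per instruction."""
    return clock_hz * instructions_per_cycle * scalars_per_register


# 1 GHz "ordinary" CPU: each multiply works on one scalar.
scalar_cpu = peak_flops(1e9, 1)

# 300 MHz vector CPU: each multiply works on a 16-scalar register.
vector_cpu = peak_flops(300e6, 16)

print(f"scalar: {scalar_cpu / 1e9:.1f} GFlops/sec")
print(f"vector: {vector_cpu / 1e9:.1f} GFlops/sec")
```

So the lower clock is more than made up for by the work done per instruction, as long as memory can keep the registers fed.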
The usual characteristics of vector machines are:
Clock speed is low
The work-per-cycle that can be done is high
Memory bandwidth and access time are extremely good
They're expensive, but for some problems the price/performance is better than what you can get from a non-vector machine
Some vector machines have no cache - or, put in another way, all their memory is the cache. Access times are constant no matter how you access your memory. You will pay for this with a very long pipeline which in turn means that the CPU is virtually unusable for some problems. On the other hand, the cache-based CPUs are virtually unusable for other problems. It's a compromise.
To sum up my aborted post, Cray has been evolving from the single processor Cray-1, to the Multi-processor Cray YMP, to the massively distributed T3E. Seymour and Cray Computer Corp. (spun off of CRI in the late 80's or so) failed because they couldn't push as much performance through a smaller number of processors. Eventually the physical laws of silicon (Seymour even tried GaAs to get more performance) take over, and you must expand the number of processing units to get greater and greater performance.
The T3E is a 3D toroidal-constructed system. SGI's Origin uses the Hypercube. Sun uses whatever Cray's Business Unit did back in the day before SGI sold the Starfire, renamed the Sun UltraEnterprise 10000, to Sun (SMP I guess). The model works. That's not disputed.
The high-end market that Cray and Hitachi serve is fairly stagnant (growing slightly more than inflation) at around 1 billion USD/year (IIRC). It doesn't grow 40% per year like standard PCs, handhelds, or the streaming video & porn market. The pie is only so big, so IBM and Sun choose the bigger market; they're exploiting the internet. ASCI projects don't make much money. They're done for the press they receive. I've heard of companies exploiting Cray's extreme I/O bandwidth for file archivers to tape robots, but that's about it for general purpose. You wouldn't buy an SV1 or a Hitachi to run Apache, that's for sure.
As Durinia pointed out elsewhere in this discussion, Ford and other auto companies still use Cray Vector machines, as well as other research labs, etc. Vector use isn't dead in the US. It's just not the centerpiece, I guess.
Ok, I was actually not aware that Cray (which was sold to SGI, and then sold to some other company that has now ditched its old name and is using the Cray name for its brand name value) was still doing vector machines.
;)
:)
They mainly build Alpha processor based machines these days. And if you look at their vector machine, you'll notice that they brag about 1.8GFlops/(sec*cpu). For a vector CPU, that would have been fast five years ago. Look at the other numbers as well... It scales to 1.8TFlops/sec versus 4TFlops/sec for Hitachi, which would do so on far fewer CPUs, which again would mean that you would usually be able to get much closer to the theoretical peak on the Hitachi. 40GBytes/sec memory bandwidth, versus 40TBytes/sec...
A 1 GHz Athlon can sustain a little more than one GFlops/sec on a matrix multiply - compare that to your 1.8GFlops/sec ``vector'' CPU... Oh, the Athlons are made in Dresden (Germany)
For real evidence, rather than marketing numbers, see the Top500:
1) ASCI White - 8192 Power3 CPUs (IBM)
2) ASCI Red - 9632 Intel CPUs (Intel)
3) ASCI Blue Pacific - 5808 604e CPUs (IBM)
4) ASCI Blue Mountain - 6144 MIPS CPUs (SGI)
5) ?? 1336 Power3 CPUs (IBM)
6) ?? 1104 Power3 CPUs (IBM)
7) SR8000-F1/112 112 Hitachi CPUs
...
10) T3E1200 1084 Alpha CPUs (Cray)
Notice that the Hitachi which is somewhat faster than the Cray has one order of magnitude *fewer* CPUs than the Cray. And the fastest Cray in use today is *NOT* a vector.
See www.top500.org for more info.
I'll grant you that vector computing is still alive in the US. I am not going to agree that it is ``well'' too.
But thanks for the information
2) I find it amusing as well, actually. They (the gov't) do this for two reasons. First, they don't really "trust" overseas SC vendors to put SC in government sites (i.e. they don't want to call a Japanese service guy to fix a SC at CIA HQ or Army labs, etc). Second, they want to have some say in the design of the machines: because they help fund the R&D for Cray, they can be real up front about what kind of machine they need, even before it's designed.
3) I think I pointed this out before - the top500 list is not an accurate measure of performance, by any means. I don't know how the application performance of the Hitachi and the Cray might line up, but the reason the Vector machines don't show up very well is that all of these other shared mem/superclustered-type machines are overrated. Companies are still buying Crays. Why? They're not stupid. (well, not all of them!) I'm guessing Ford bought 5 (I checked the numbers :) because their applications just ran fastest on that machine. I think an SV1 set a record recently for NASTRAN.
> These guys may mean business, but I'm glad that they aren't doing business with my money. Just because someone is a top notch
> physicist, for instance, doesn't mean that they know anything about good software or hardware design.
I'm glad to see someone else reflected on this passage in Stephen Adler's otherwise very cool review of this geekfest. (& I wish I had made the effort to go to last year's conference here in Portland.)
Adler describes the situation very well: take one guy who wants to play with big toys, works long hours for less-than-market-scale pay in exchange for the privilege, & knows he is close to burn-out, hold before him the temptation to sell it all for a chance to grab the brass ring . . . & watch him catch the start-up bug. This is what all of those VC suits are hoping for.
This -- & whether all of the stars in the geek's eyes will blind him to the fact that he's signing away most of his idea for a few million dollars that will melt away during the start-up phase.
Geoff
I think I see a trend here. Maybe for them it really would be easier to muzzle the entire internet than to produce p
And that bird is the University of Delaware bird. I don't think those racks are going to Duke. That is YoUDee, the Fightin' Blue Hen. Regards, Andrew
It's just mutating...
Though there is still a place in the world, in my mind, for mid-to-large SIMD systems (SGI/Crays, Starfires, RS/6000s, etc.), this conference and other events are showing that cluster supercomputing and widely distributed computing (a'la SETI@Home, Distributed.Net, WebWorld, etc.) are also being taken seriously.
What drives any advances in hardware is applications that can take advantage of them. Supercomputing is not dead because on top of the usual uses (fluid dynamics modeling, codebreaking, etc.), the Internet and new algorithms in IR and AI are combining to compel people to want to approach Information Retrieval and understanding problems which are at the level of requiring supercomputing resources.
The company I work for (www.webmind.com) is building a hybrid AI system for IR and understanding (among other things) which is optimized to run not on a vector supercomputer but on a cluster of independent servers (though a nice MIMD supercomputer would be nice). With commodity hardware we have managed to get over 100GB of RAM and 200GHz of CPU for less than $500,000 for our first prototype of a large installation. A 6-64way IBM RS/6000 starts at $420,000.
Missing from the write-up (and possibly from the conference) are the folks at Starbridge Systems who are working on a "Hypercomputer" which has a field-reprogrammable topology. If any supercomputer company that needs it deserves venture capital funding, it's this one. The idea of allowing an application to change the network topology of the processors to optimize for its own data representation is extremely powerful - it's the next "big thing" in both MIMD supercomputing and networks for cluster supercomputing, IMNSHO.
SETI@Home is a very cool project, but it would be nice to have more about applications driving supercomputing at such conferences. Distributed.Net seems to get too little time since their client isn't as pretty as SETI, and also cluster supercomputing not over the Internet is being used for all sorts of cool stuff, and it would be especially interesting to hear more results about commodity clusters vs. proprietary large systems in areas like fluid dynamics modeling where the proprietary systems usually rule.
Also, more of a focus on evolutionary computing, FPGAs in supercomputer design, and information retrieval and understanding applications would be nice.
But overall, a good writeup of what looks like it was a pretty interesting and refreshingly diverse (not just "big iron" focused) SC conference...
o/~ we are pissed, we are pissed, we have to resist... o/~ - ec8or
geek porn:
This site has been blocked by your administrative team
Content: Sex, pornography
If you have a business need to view this material, please contact the Information Technology Manager.
It may look like I'm doing nothing, but I'm actively waiting for my problems to go away.
--Scott Adams
Why do you think Fujitsu puts 16 GigaBytes of memory INSIDE the processor module ??
Top vector machines in the world: (from www.top500.org again)
9) Hitachi - 917GFlops/sec - In Japan
12) Fujitsu - 886GFlops/sec - In the UK
13) Hitachi - 873GFlops/sec - In Japan
18) Hitachi - 691.3GFlops/sec - In Japan
24) Hitachi - 577GFlops/sec - In Japan
33) Fujitsu - 492GFlops/sec - In Japan
35) Fujitsu - 482GFlops/sec - In Japan
37) Hitachi - 449GFlops/sec - In Japan
59) Fujitsu - 319GFlops/sec - In Japan
63) Fujitsu - 296.1GFlops/sec - In Japan
65) Fujitsu - 286GFlops/sec - In France
67) NEC - 280GFlops/sec - In France
76) NEC - 244GFlops/sec - In Japan
77) NEC - 243GFlops/sec - In Australia
78) NEC - 243GFlops/sec - In Canada
79) NEC - 243GFlops/sec - In Japan
...
I found lots of Crays, but all T3E, meaning Alpha based. In the first 100 entries, I could not find one single vector machine located in the US.
Please point me to it if you can find it.
As a footnote on this page:
www.top500.org/lists/2000/11/trends.html
you will see that, of the 500 fastest supercomputers in the world today, all vector based systems are of Japanese origin.
Even I had not expected that :)
Hey, there were no images of the naked vector CPUs presented by Fujitsu, or the tweaked power-cpus from NEC or Hitachi...
:)
One CPU:
256 registers
each register holds 64 double precision floats
330MHz
32 FLOPS per clock-cycle
One CPU will yield 9.something GFLOPS. It has 16GB memory in the processor module. And they sell MP machines from a few GFLOPS to 4TFLOPS. Both Fujitsu and Hitachi had processor modules stripped down - they were *sweet*!
So, the 760 chipset has a 200MHz FSB ? Well, Hitachi has 40 TeraBytes/second memory bandwidth in the local machines, and some ~8GigaBytes/second between machines.
Friends, vector computing is not dead. Unfortunately only Japan produces vector machines, so only Asia and Europe can use them. US national laboratories are not allowed to buy them, not because of export regulations, but because of import regulations... Go figure.
So, basically, Linpack looks at raw performance if the processors/vector pipes aren't starved. Cray's non-vector box, the T3E (again, 1994) has the highest sustained real-world benchmarks, just over 1 TFLOPS/s. Don't preach theoretical. That's for academics. Most supercomputer applications don't consist of small loops of code that run everything out of registers. You're kidding yourself if you think they do.
Ahhhh the joys of core ram ... it was a shame to see mercury delay lines retired :)
Look at the racks on that baby going to Duke!
:)
Sorry... Couldn't help it...
I like you, Stuart. You're not like everyone else, here, at Slashdot.
Let me preface this by saying that I like Linux, I run it on all my desktop machines, and have admin'ed a cluster of 300 Linux boxen and loved it. However ...
I was not actually at SC this year, but have attended the two previous ones and have actually had a chance to speak with Todd Needham and other people from MS Research. The legal and marketing department there may be full of idiots, but never think that if someone is an MS employee they're automatically an idiot; they may have misplaced loyalties, but they are not stupid. I have to admit that Todd Needham from MS had one or two good points in the panel discussion. Open Source is not the magic "pixie dust" that automatically fixes everything (*snicker* Todd quoted jwz). As someone who's worked in a business environment, I can say with confidence that some level of hierarchical organization is necessary. The Cathedral/Bazaar analogy may be good, but it is not a black-or-white issue; projects can fall somewhere in-between, and those that strike the right balance will be successful.
With respect to security, many eyes may make all bugs shallow, provided some of those eyes are willing to partake in a formal security analysis. A compiler researcher at OSDI this year revealed that in the Linux kernel there are ~80 places where interrupts are disabled and never reenabled. How did he discover this? By modifying a compiler to be a little "smarter" and read a formal specification tailored to the application you're compiling, thus performing more rigorous static analysis on your code. It's pretty interesting reading. Many eyes are not a substitute for formal analysis, but something to augment it; conversely, formal analysis of software is still a young field, and many eyes are still a necessity for improving robustness in large projects.
The bottom line: seek the middle ground. Just because "their" way doesn't work doesn't mean everything was wrong with it. And yes, I'm ready to be moderated down as "flamebait". :)
-jdm
There aren't any scantily clad porn stars in these pics, it was a joke!!! :)
Half of it is dead.
Supercomputer applications can be ordered by the inherent parallelism of the underlying problem. Some problems are inherently parallel (e.g. SETI, HEP, etc.); some require communication (Weather, NP, etc.). Wherever communication is needed, old-fashioned "supercomputers" (e.g. shared mem SIMD, MIMD, ccNUMA etc. machines) will be superior to clusters, i.e. distributed mem MIMD machines.
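A toy cost model makes that ordering concrete. Everything in it is an assumption for illustration, not something measured: the run time formula, the function names, and the latency and message counts are all made up.

```python
def run_time(work_s, nodes, messages_per_node, latency_s):
    """Toy model: compute splits evenly; every message costs one full latency."""
    return work_s / nodes + messages_per_node * latency_s


def speedup(work_s, nodes, messages_per_node, latency_s):
    """Speedup over a single node that needs no messages at all."""
    return work_s / run_time(work_s, nodes, messages_per_node, latency_s)


WORK_S = 1000.0           # hypothetical single-node run time
NODES = 100
CLUSTER_LATENCY_S = 1e-4  # hypothetical network latency per message
SHARED_LATENCY_S = 1e-6   # hypothetical shared-memory latency per message

# Inherently parallel problem: no messages, so the interconnect does not matter.
independent = speedup(WORK_S, NODES, 0, CLUSTER_LATENCY_S)

# Communication-heavy problem: 100,000 messages per node.
on_cluster = speedup(WORK_S, NODES, 100_000, CLUSTER_LATENCY_S)
on_shared = speedup(WORK_S, NODES, 100_000, SHARED_LATENCY_S)

print(f"independent work:            {independent:.1f}x")
print(f"chatty work, cluster:        {on_cluster:.1f}x")
print(f"chatty work, shared memory:  {on_shared:.1f}x")
```

With these made-up numbers the cluster loses about half its speedup on the chatty problem while the shared memory machine barely notices, which is the point being made above. Real codes overlap communication with compute, so treat this as a picture, not a prediction.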
Last year, I wrote in about supercomputing in Portland, and was surprised because no one had mentioned it. A lot of cool things go on there. Big Iron in the extreme. Last year, the LAN was wavelength multiplex 10x 192 Mbps fiber. Lots and lots of big computers, large portions of which run [f|F]ree unices. The people at supercomputing are often "the cutting edge". Take, for example, the IBM monitor that was mentioned on slashdot the other day. I brought that up a year ago, having seen it at supercomputing. The conference really deserves a closer look as a whole.
-- Who is the bigger fool? The fool or the fool who follows him? --
I can just see it now, sadist troll cults that enjoy taking brutal beatings, bitter kung-fu flame wars and noble /. vigilante enforcers!
Imagine being a newbie in that type of system? "Why is everyone punching me?"
UBU