Supercomputers To Move To Specialization?
lucasw writes "The Japan Earth Simulator outperformed a computer at Los Alamos (previously the world's fastest) by a factor of three while using fewer, more specialized processors and advanced interconnect technology. This spawned multiple government reports that many suspected would ask for more funding in the U.S. for custom supercomputer architectures and less emphasis on clustering commodity hardware. One report released yesterday suggests a balanced approach."
Ignoring size, how does the cost of a cluster of fewer, highly specialized computers (with special interconnects, etc.) compare with that of a cluster of more, less specialized computers?
Teraflops per dollar is important, let's not forget that.
How does one go about benchmarking a supercomputer specialized for a certain task against cheap computers in a cluster? Now we need to spend more money to develop specialized supercomputers even though the scenario presented in Japan might not hold true for other applications? Seems a little too soon to start making recommendations.
Skynet had 60 Teraflops IIRC and they're talking about 100!
Let's hope this isn't tied into Nukes somehow. Wait a sec, a massive virus has already spread disabling millions of computers!
RUN HIDE! THE END IS UPON US!!!!!!!
----
Go canucks, habs, and sens!
The Japan Earth Simulator outperformed a computer at Los Alamos (previously the world's fastest) by a factor of three while using fewer, more specialized processors...
What is the difference between a processor designed to simulate earthquakes (et al) and an ordinary, off-the-shelf processor? I mean - so they optimized floating point operations. Is that it?
To make laws that man cannot, and will not obey, serves to bring all law into contempt.
--E.C. Stanton
As long as the Japanese can't run Quake3 at a higher FPS than we can!
Abaddon: An Xbox 360 Indie game
What if you care only about integer operations?
You tell me how "whilst" differs from "while," and I'll stop calling you a pretentious jackass.
So does this mean that supercomputers will be developed without following current hardware standards or is there a new standard being formed for supercomputers? So I guess my Athlon XP won't fit in their CPU socket will it? Damn... So much for cheap AMD CPUs for supercomputers.
:P
Makes you wonder if Japan has already developed a nice powerful 128bit supercomputer to dish out to crush any competition.
This space is not for rent.
Their computer had Tallow use a PsyBeam attack, which is totally whacked.
Roving Web-Teleoperated Robot
Last thing I need is a supercomputer that checks email at a high speed, and a separate computer that does wordprocessing even faster.
/. all in one box.
I am one of those people with maxed out PCI cards, maxed out SCSI buses and everything jammed into one PC.
Work, pr0n, email,
If you're going to have a supercomputer do one thing, of course specialize it. An Earth simulation surely has a set number of formulae whose calculations are to be optimized as much as possible, even to the hardware level.
But if you want a versatile, general-purpose supercomputer, why not go with the clustering solution?
The two studies resulted, in part, from NEC Corp.'s May 2002 announcement of the Earth Simulator, a custom-built supercomputer that delivers 35.8 teraflops. That system packed five times the performance of the fastest U.S. supercomputer at that time...
"The Earth Simulator created a tremendous amount of interest in high-performance computing and was a sign the U.S. may have been slipping behind what others were doing," said Jack Dongarra...
Graham said researchers should not overreact to NEC Corp.'s Earth Simulator that blindsided many in the high-performance computing community eighteen months ago by delivering a custom-built system five to seven times more powerful than the more off-the-shelf clusters developed in the U.S.
I don't mean to draw a crude analogy here, but I really can't help but read this and be reminded of the space race.
It took Sputnik to kickstart our spacemindedness; I for one consider it sad that a "tremendous amount of interest" -- and the funding that comes with it -- in high-performance computing seems only to have arisen/regenerated with the influence of competitive international politics. Are we really so hardly advanced that our respective national egos are still the driving force behind enthusiasm, financial or otherwise, in certain areas of science?
The coolest voice ever.
Is there a way to really compare the speed of a supercomputer and commodity hardware? If anyone could give either a quick explanation or a link to the relationships between bogomips, teraflops, MHz and the whole lot I would be very appreciative.
-Silmarildur
Specialized hardware (almost) always outperforms commodity stuff.
I use custom designed amplifiers because they work better for my application. I could buy off-the-shelf stuff (~$500~$10,000 range), but that won't be exactly what I want. I use custom software too... know why? Because it's designed specifically for the job. That same software shouldn't really be used for other fields of research, neither should my amplifiers. The thing about this stuff is that it takes a lot of time to maintain (plus initial development). That means grad students, postdocs, and technicians who may spend over 90% of their time just keeping systems in working order and/or adding features. The benefits of customized hardware/software, in this instance, are worth the headaches associated with it.
All of my optics is commodity stuff (some is rare/exotic, but it's still basically black-box purchasing). I don't have the facilities to make coated optics, nor do I need anything that specialized, so... I just buy it.
When I was in telecom, we used Oracle and Solaris and Apache. It worked, and the cost of developing the same functionality in-house was ridiculously high (plus we'd never get to designing our products that sit on top of it).
Eventually, it always comes down to a comparison between the cost (man hours, equipment, etc) of custom building and of integrating stuff from OEMs.
So, the question our labs need to answer is, does clustered COTS hardware get the job done? Supplementary to that, is it cost-effective to buy/design it in light of the previous answer?
In any field where you are pushing the limits of technology, you have to make such trade-offs. Personally, I don't care who has the absolute fastest supercomputer (measured in flops, factoring-time, whatever)... what really counts is, who does the best research with the supercomputers.
Down with Saudi Arabia!!!
So you want more specialized supercomputer, eh? I can build one right now!
Its specialty is executing x86 programs. I can also make some that specialize in PowerPC programs.
HA! Gotcha!
Specialized systems are almost always going to outperform generalized systems when you're dealing with similar levels of technology (for instance, specialized abacuses vs. a generalized Cray T3E).
;))
The great thing about generalized systems is you can use them to explore new areas, then design a specialized system to take advantage of specific optimizations the generalized one can't support.
I'm glad for the report suggesting a "balanced approach". I can't imagine forsaking one type of system for the other, as each has its place. (Uhoh... generalized systems have a "place"? Does that mean they're specialized at being generalized? Oh, the irony!)
bytesmythe
Hypocrisy is the resin that holds the plywood of society together.
-- Scott Meyer
Hello.
...
Custom Software running on Custom Hardware vs. Custom Software running on Commodity Hardware.
Duh
; -- the corruption of government starts with its secrets. a truly free people keep no secrets. --
The reports come on the heels of recent congressional testimony warning that the United States is falling behind in supercomputing.
Since when is the US falling behind in supercomputing? I remember reading a list of top supercomputers in the world, and the US had 14 of the top 20. Isn't it quantity in this case, not quality? Specialization is just the case here, so what if we don't have the absolute fastest?
--"The perfect example of the man of action is the suicide." - William Carlos Williams
I assume that hard-coding trig functions into the tertiary processors would be advantageous for this. I know it violates the spirit of RISC in general-purpose computing, but for such a large scale system with so many processors it could be advantageous.
Do HP's Saturn or other such special-purpose processors have hard-coded higher-level functions?
You can't judge a book by the way it wears its hair.
Cray is back and getting back into the government contract game. Surprisingly, they are doing it just as the DOD is realizing that they need specialized hardware like they used to when Cray was one of their best suppliers. Look for little ol' Cray to be back in the black real quick, and pick up a few shares now.
"Curiosity killed the cat, but for a while I was a suspect."- Steven Wright
Captain: What !
What if you care only about integer operations?
Then I'd cluster a planetload of Apple II's running Integer BASIC.
The coolest voice ever.
This is kind of a compromise between each node being a slow but adaptable general purpose CPU (with maximum flexibility) and a super fast (but inflexible) ASIC.
Perhaps the big barrier to this would be making the math and physics geeks write verilog, or perhaps writing a really shit-hot fortran->verilog converter.
So, I figure that either a) smarter people have already done this or b) it's really stupid.
(note that, in this case at least, I'm not really talking about reconfigurable computing)
## W.Finlay McWalter ## http://www.mcwalter.org ##
Good question there man.
I am also wondering, which should I get? I mean, with Doom III on its way, to get decent frames should I go specialized supercomputer, or a linux beowulf cluster?
Captain: It's You !!
Great argument for people with their head in the trough. We need funding for specialized, proprietary hardware so we don't fall behind the Japanese. Intel/AMD CPU's aren't good enough. SUN, can't compete price/performance with Linux/Intel. NASA lost a couple of Mars probes (expensive, custom hardware), while a cheap Mars Rover mission makes it there with OTS parts. Of course, if you are aiming for taxpayer funding, your cost/performance priorities are the same as if you are spending your own money.
--- http://davidnehme.blogspot.com
The main area in which we saw benefit was switching from the Portland Group Fortran Compiler to the Intel Fortran Compiler, which cut the timestep (simulation time/real time) nearly in half.
Every cluster in the department is assembled from commodity x86 components. Groups here have been moving from proprietary Unix architectures to Linux/x86 systems and clusters. Our group started out on RS/6000s, then moved to SPARC, and is now moving to x86. In terms of price/performance there really is no comparison.
As for TCO, the lifetimes of clusters here are relatively short, one or two years at the most. Thus a high initial outlay cannot be offset by lower cost of operation.
for a sample of the differences, read the posts above mine! :p
with the "GRAPE" computers. (More links). I expect there are examples going back to the dawn of the computer age.
Quattuor res in hoc mundo sanctae sunt: libri, liberi, libertas et liberalitas.
Because all your general super-computing needs will be filled by the G5.
how many supercomputers we build. We'll still be wasting cycles processing the first few nanoseconds of nuclear explosions on them or trying to find more oil or money or WMDs. The last thing we care about is the environment we all have to live in.
I wonder how rich we'll be when it finally hits us that irreparable damage has been caused to our environment? I hope we're really rich so we can afford to buy a new Earth. Cuz we might need one by then.
why else would they be emulating it?
You can't judge a book by the way it wears its hair.
Definitely a really huge super-computer would be neat to have but honestly are they putting the ones we already have to good use?
From what I've heard [anecdotally] computers like the earth simulator go vastly under utilized for the most part.
So given that most nations [including the US] have budget problems, especially concerning education, couldn't people think of better uses for money?
And before anyone throws a "it's the technology of it" argument my way, I'd like to add that if anything I'd rather have the money spent on researching how to make high performance low power processors [and memory/etc] instead. E.g. an Athlon XP 2GHz that runs at 15W would be wicked more impressive than a 50,000 processor supercomputer that runs a highly efficient idle loop 99% of the time.
Tom
Someday, I'll have a real sig.
Um, I think you're thinking of the RC-64 and 128 projects, which took years. Don't quote me on this, but back when I was actually running a D.net client, they had talked about doing 56 bit contests, and they usually only lasted a few days, then everyone would go back to doing the 128 bit contest.
I got bored of it all and switched to the Intel Cancer project. More useful. Too bad it doesn't run on linux.
-- Having a Creationist Museum is like having an Atheist place of worship
You're right. Different applications require different approaches. There are lots of things that can be best done with distributed systems, yet still some that require specialized systems.
And I called it too!
reproduced here:
Definately (Score:4, Informative)
by Anonymous Coward on Monday August 04, @05:48PM (#6609924)
There are still MANY applications for supercomputers. A lot of people think that linux/beo-clusters are going to be replacing supercomputers of the Cray/NEC/IBM variant. Not true. There are still many research, scientific, and military applications that require machines developed not for "slow" distributed number crunching, but for ultra high speed processor and memory architectures.
So definitely, time for Cray to come back and retake the supercomputer industry crown.
. bah .
The project I was talking about was This one. Sure, it cost a cool 1/4 of a mil at the time, but this was back in '98 and '99. Costs have dropped, processing power has increased, blah, blah, blah..
What are we going to do tonight Brain?
Frankly, I don't want the fastest computer chips on the desktop to be designed by a company in another country (even if Intel makes them outside of the US) and I would rather that the cutting edge, be cut here, in my native country.
Good lord, why? Is it just national/istic pride? I see that as something to be outgrown with respect to driving, receiving, and appreciating scientific discoveries and technological advancements. Honestly, if Japan were to come out with, say, the first mass-produced DNA computer, I wouldn't be the slightest bit bitter, or reluctant to take advantage of it. I regularly praise other countries for doing things the U.S. hasn't.
German physicists were primarily responsible for breakthroughs in their field in the 19th and early 20th centuries, and during that period there was quite a bit of resentment from American politicians and scientists whose feelings boiled down to nothing more than "We should have gotten there first." I won't argue that fierce competition has been beneficial to mankind at large (we've seen it in the computer industry, after all) but I don't think I'm wrong in wanting the motivation to be something a little less self-centered, political, immature. An idealistic vision? Hardly. It's not too much of an expectation for us to evolve beyond petty glare-throwing.
The coolest voice ever.
They just want more job security.
"We spent US$35,000,000 on this supercomputer that can't be used for any other purpose. If you shut down the program, that money will have been wasted."
In his book, After the Internet: Alien Intelligence, James Martin predicted the future will be dominated by highly specialized "machines" tailored to perform a limited set of tasks. His vision of AI is quite different from what we call it today.
If you'd like to see what these people are up to for yourself, here is a link to their website. Lots of performance data, lists of projects, etc.
Could someone with knowledge on supercomputers tell me the story here. thanks.
superconductor computer petaflop
There seems to be an impression in some comments that this machine has some sort of special design that's only applicable to climate modeling problems. In fact, this is a vector-based supercomputer, applicable to any problem where you need to perform vector operations (i.e., operating on large arrays of numbers in parallel).
Certain numerical operations can be performed blindingly fast on these types of machines. Each arithmetic processor on this machine has 72 vector registers, each of which can hold 256 elements. Then you can perform operations on all 256 elements of 1 or more registers simultaneously! If the algorithm can keep the vector units fed, they will scream.
Since keeping data flowing to the processors is critical to speed, the high-speed interconnects (~12GB/s) are a must for any problem that is not completely localized. It's all about matching the problem to the hardware. There may well be problems for which a commodity cluster just can't get the job done like this can. Remember that each node of a cluster consumes power, produces heat, and takes up space. The raw cost of hardware is not the only consideration.
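To make the vector register idea a bit more concrete, here is a small sketch in plain Python. It only models the bookkeeping: the 72 registers and 256 elements per register come from the description above, while the function names and the idea of counting "instructions" are my own illustration, not how the real hardware is programmed. Long arrays get cut into register-sized strips, and each strip is handled by one vector operation.

```python
VECTOR_LENGTH = 256   # elements per vector register (figure quoted above)
NUM_REGISTERS = 72    # vector registers per arithmetic processor (figure quoted above)

def vector_add(a, b):
    """Model one vector instruction: add up to VECTOR_LENGTH element pairs in one go."""
    assert len(a) == len(b) <= VECTOR_LENGTH
    return [x + y for x, y in zip(a, b)]

def strip_mined_add(xs, ys):
    """Add two long arrays strip by strip; returns (result, vector instructions issued)."""
    assert len(xs) == len(ys)
    out = []
    instructions = 0
    for start in range(0, len(xs), VECTOR_LENGTH):
        stop = start + VECTOR_LENGTH
        out.extend(vector_add(xs[start:stop], ys[start:stop]))
        instructions += 1
    return out, instructions

xs = list(range(1000))
ys = [1] * 1000
result, issued = strip_mined_add(xs, ys)
print(issued, "vector instructions instead of", len(xs), "scalar adds")
print("register file holds", NUM_REGISTERS * VECTOR_LENGTH, "elements")
```

The point of the sketch is the ratio: a thousand scalar additions collapse into a handful of vector instructions, but only if memory and the interconnect can deliver the strips fast enough.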
Is there any speculation out there about what type of supercomputers the NSA has? Their budget is off the record (estimated $6Billion+/yr) and surely they have interest in cracking all of those 4096-bit encrypted messages sent between the US and Saudi Arabia et al.
Whenever the offence inspires less horror than the punishment, the rigour of penal law is obliged to give way...
I would love these chips to be mass produced for desktops.
Who the hell modded this up?
Those supercomputers DRIVE the progress that gives you multi-GHz personal computers! You might as well stop spending your money on those frivolous pee-cee things altogether.
Parent post is grossly overrated.
This guy came up with a way to evolve designs for FPGAs.
Article at New Scientist
Basically, he set up a number of FPGAs to accomplish a certain goal (recognizing a human-generated sound like the word "GO") by setting up a "standard genetic algorithm to evolve a configuration program for an FPGA." His hardware basically evolved to accomplish the defined task. I'd like to read about someone doing this on a grander scale... Say 1024 FPGAs?
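For anyone who hasn't seen a genetic algorithm before, here is a rough sketch of the loop described above. It is only an illustration: the real experiment scored how actual FPGA hardware behaved, while this toy scores a bitstring by how many bits match a made-up target. The `fitness` and `evolve` names, the population size, and the mutation rate are all my own assumptions, not anything from the article.

```python
import random

def fitness(bits, target):
    """Hypothetical score: number of positions where the candidate matches the target."""
    return sum(b == t for b, t in zip(bits, target))

def evolve(target, population_size=20, generations=50, mutation_rate=0.05, seed=0):
    """Evolve a bitstring toward `target`; returns (best candidate, best score per generation)."""
    rng = random.Random(seed)
    n = len(target)
    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(population_size)]
    history = []
    for _ in range(generations):
        # Rank candidates, best first.
        population.sort(key=lambda bits: fitness(bits, target), reverse=True)
        history.append(fitness(population[0], target))
        # Keep the better half unchanged, so the best score never drops.
        survivors = population[: population_size // 2]
        children = []
        while len(survivors) + len(children) < population_size:
            mom, dad = rng.sample(survivors, 2)
            cut = rng.randrange(1, n)                 # single-point crossover
            child = mom[:cut] + dad[cut:]
            child = [1 - b if rng.random() < mutation_rate else b for b in child]
            children.append(child)
        population = survivors + children
    population.sort(key=lambda bits: fitness(bits, target), reverse=True)
    history.append(fitness(population[0], target))
    return population[0], history

best, history = evolve([1, 0] * 8)
print(history[0], "->", history[-1], "matching bits out of 16")
```

On real hardware the expensive part is the fitness step, since every candidate has to be loaded onto a chip and measured, which is presumably where having many FPGAs in parallel would help.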
-EtA
Earth Simulator is impressive in its own regard, but in no way is the majority of clustering apps going toward these 'specialized' systems. Governments, research labs, etc. want powerful computers that are dirt cheap. Los Alamos's ASCI Q (Installment 1, the Alpha servers) cost over $100,000,000 to build, while their Pink cluster cost about $6,000,000 in hardware. On paper, Pink and ASCI Q are both around 10 teraflops. ASCI Q runs Quadrics on 64-bit 66MHz PCI, Pink is getting an upgrade to Myrinet Lanai 10 on PCI-X (from Lanai 9 on 64/66 PCI). Not only that, but Pink runs the open-source, 100% GPL'd Clustermatic software and can be booted in a matter of seconds rather than hours like ASCI Q.
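To put the price figures above side by side, here is a quick back-of-the-envelope calculation. It uses only the numbers quoted in this comment (over $100M and about $6M in hardware, both around 10 teraflops on paper) and treats them as round figures; the function name is just for illustration, and it ignores software, power, and maintenance costs.

```python
def dollars_per_teraflop(cost_usd, teraflops):
    """Hardware cost divided by paper (peak) teraflops."""
    return cost_usd / teraflops

# Figures as quoted in the comment; ASCI Q's cost is a lower bound ("over $100,000,000").
asci_q = dollars_per_teraflop(100_000_000, 10)
pink = dollars_per_teraflop(6_000_000, 10)

print(f"ASCI Q: ${asci_q:,.0f} per teraflop")
print(f"Pink:   ${pink:,.0f} per teraflop")
print(f"ASCI Q costs roughly {asci_q / pink:.1f}x more per paper teraflop")
```

Paper teraflops say nothing about how well each machine sustains performance on a communication-heavy job, so treat this as a comparison of sticker prices only.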
The fact is, systems like ASCI Q and the Earth Simulator just aren't practical. They may look great on paper, but there's not much that they can do that can't be done on x86. Given the choice between paying over a hundred million for a proprietary cluster that might not even be all that reliable (*cough*Q*cough*) and requires expensive software and maintenance contracts, we see companies like Linux Networx offering high-power clusters on common hardware and free software that are a fraction as expensive.
As far as reliability goes, don't get suckered into thinking that proprietary and expensive mean quality. Q's failure rate is almost as high as my old Windows '98 machine hahaha. With the exception of a few missing chillers, Pink seems relatively healthy with only a few minor failures.
If CRAY and NEC want to get into a pissing contest in specs, that's fine. If they offer something that Intel can't, more power to them. Otherwise, the five organizations in the world that own their systems can be proud that they have the most powerful computer on paper for a year or two before someone builds a cheaper x86 cluster that matches or out-performs them.
Can't remember where I heard that though.
If these big supercomputers are so underutilized, why not run some public distributed projects on them in their spare time? (SETI, distributed.net, folding@home etc)
09F91102 no, 455FE104 nope, F190A1E8 uh-uh, 7A5F8A09 that's not it, C87294CE no. Ah! 452F6E403CDF10714E41DFAA257D313F.
I've wondered, what would a mass produced box capable of running seti@home cost? something I could plug into my router, and the power, send it my user name, and let it burn through seti packets. hmmm
every day http://en.wikipedia.org/wiki/Special:Random
It's interesting that you say "Cray is back."
I'm from Minnesota, and graduated from the University of Minnesota (which Cray is associated with; I think Cray graduated from the U of Minnesota). I remember not too long ago people were lamenting the "death" of Cray and its loss from the state because people were abandoning specialized supercomputers for clusters.
It's really interesting to me that these specialized machines are making a comeback. In some sense, I feel deeply satisfied that the two different architectures are now both being recognized. In another sense, I'm saddened that specialized hardware was abandoned in the rush to cluster architecture. I love supercomputing, and am glad to see a renaissance of sorts, but there's something sad to me about the fact that certain things got neglected in the process. While clustering was growing, Cray was tossed around a lot. It's too bad there couldn't have been more of a balance all along.
That is the whole point.
I have the feeling the DOE (nuclear weapon simulation etc) simulation program is not going anywhere near as well as it was sold.
Massive commodity clusters boast big numbers but they do not boast great throughput of USEFUL RESULTS (also, with massive clusters you have to be able to deal with inevitable hardware failures).
You have a certain fluid problem---there is a certain speed of sound, and a certain physical geometry. What you want to do is to be able to simulate the real thing at ever smaller grid-sizes, that is, with greater numerical approximations to the physical fields.
Ideally, if your problem were embarrassingly parallel and clusterizable, then you could put any number of grid points on each CPU and crunch away. You want more grid points? Buy more CPUs.
The problem is that in actual physics the length scale of 'interaction' per time step does NOT go down---remember, speed of sound is constant as is physical geometry---imagine for instance the uh, radiative driven implosion of a certain unspecified dense material in spherical or cylindrical geometry into one unspecified not-dense material.
So when you scale-up in the scientifically useful sense---and not the computer nerd sense---then a problem which used to be solvable efficiently on clusters NO LONGER IS SO. There is just too much communication, and this is driven by physical reality.
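A toy model of that scaling argument, for what it's worth. Everything here is an assumption made for illustration: each CPU owns a cubic block with a fixed number of cells per side, the interaction length is a fixed physical distance, and the cells a CPU must fetch from its neighbours form a shell whose thickness is that distance measured in cells. The function and parameter names are hypothetical.

```python
import math

def halo_to_interior_ratio(cells_per_side, interaction_length, grid_spacing):
    """Ghost cells fetched from neighbours, divided by cells the CPU owns (cubic block)."""
    # The physical interaction length is fixed, so a finer grid means a thicker halo in cells.
    halo_width = math.ceil(interaction_length / grid_spacing)
    owned = cells_per_side ** 3
    with_halo = (cells_per_side + 2 * halo_width) ** 3
    return (with_halo - owned) / owned

# Same work per CPU (64^3 cells), same physics, finer and finer grid.
for spacing in (1.0, 0.5, 0.25, 0.125):
    ratio = halo_to_interior_ratio(64, 1.0, spacing)
    print(f"grid spacing {spacing}: halo/interior = {ratio:.3f}")
```

Under these assumptions the communicated volume per CPU climbs toward the size of the block itself as the grid is refined, which is the sense in which adding more CPUs stops helping.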
It is not 'OK' to just say "change your code". The codes are developed with mathematical methods and based on experimental data gleaned over literally decades at great expense.
Programming for these is not easy---but it is quite a bit easier for the large vector old-skool cray type machines than the clusters, where the human has to do almost all the scutwork (e.g. MPI).
The problem is actually more severe with the DOE fluids problem---there are fundamental mathematical issues in the nearly inviscid flow (singular perturbation theory baby) which have not yet been resolved. And they appear at smaller and smaller grid sizes.
This requires rapid development of models and validation at the physically important resolutions and you can't do this with a cluster.
I have no inside information whatsoever but I smell that the sudden DOE and DOD interest in back-to-the-future retrosupercomputing is because of some major failures in the recent cluster efforts.
There is also a direct trade-off between more general purpose systems and systems custom tailored to a task. Good examples are Deep Blue and Blue Gene. Both of these systems are designed with a particular task in mind (i.e. chess and protein folding) and therefore are able to leverage knowledge about the problem space to constrain the kind of hardware, the particular low-level instructions and the information flow within the system while achieving significantly greater performance on a small class of problems. I work with clusters that are used in scientific communities that have various researchers working on various problems. In these cases, the questions are about basic applicability of a particular problem to a particular architecture. For example a cluster with high-speed interconnects made of good COTS hardware will allow a user with a very granular problem to effectively use the cluster and it will also allow a user who needs the high speed interconnect because the problem space demands a high degree of internal communication. But the first researcher might also be able to make use of a grid of (for instance) many more computers with a total lower cost because (s)he doesn't need the high speed interconnect. The Earth Simulator gains a lot of performance (on a class of problems) because of the underlying vector processor architecture. Given the right internal bus it is conceivable that adding vector processor daughter boards to the next generation of COTS clusters could achieve similar results--but, of course, only for problem spaces that make efficient use of such processors and aren't bottlenecked by the communication requirements.
Real answers are always more complicated. For example: the equations needed for nuclear simulation will probably require dedicated hardware (as the need for protein folding has led to Blue Gene) to achieve the results that the Pentagon needs. But for many supercomputing tasks, the flexibility of COTS clusters will still be compelling, especially for areas where the algorithms are not yet fully developed (e.g. brain simulation). An interesting keynote at OLS 2003 argued that (some of) the problems are not going to be the local computing power but the need to move large quantities of data between research labs across the world and combine computational systems using the 'grid.' (For down-home examples of problems that have been successfully tackled through coarse-grained distribution just look at SETI@Home and Distributed.Net.) So it's not just the flops anymore...
You'll take a huge hit in price. Big FPGAs are expensive. Being programmable also costs a lot of density. Finally, FPGAs are slow.
Xilinx's current top of the line part costs about $8000, has 8 million "system gates" (if you happen to have a design that uses the chip in just the perfect proportions), and clocks up to 300 MHz. To compare with a particular ASIC designed for computation, a Pentium 4 costs about $400, has about 55 million transistors, and clocks over 3 GHz. You're looking at something over 1000 times the cost effectiveness. There's a reason why people haven't replaced all electronics with big slabs of FPGAness.
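Here is one way to get to that "over 1000 times" figure from the numbers above. This is only a sketch of the arithmetic: it multiplies logic count by clock rate and divides by price, and it lumps "system gates" and transistors together as if they were the same unit, which they are not. The dictionary keys and function name are made up for the example.

```python
# Figures as quoted above; "logic_millions" mixes system gates (FPGA) with transistors (P4).
fpga = {"cost_usd": 8000, "logic_millions": 8, "clock_mhz": 300}
p4 = {"cost_usd": 400, "logic_millions": 55, "clock_mhz": 3000}

def logic_rate_per_dollar(chip):
    """Crude figure of merit: (millions of logic elements x MHz) per dollar."""
    return chip["logic_millions"] * chip["clock_mhz"] / chip["cost_usd"]

advantage = logic_rate_per_dollar(p4) / logic_rate_per_dollar(fpga)
print(f"Pentium 4 advantage: about {advantage:.0f}x")  # 20x price * ~6.9x logic * 10x clock
```

The three ratios (20x on price, roughly 6.9x on logic, 10x on clock) multiply out to about 1375x, so the estimate holds up as an order-of-magnitude claim even though the units are loose.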
I wouldn't count on programmability to necessarily make that up. If you've got a big, important, problem to solve, you can afford dedicated hardware. Besides, if you've got a working FPGA design, you can fab it with little effort, and get an even more effective system.
I do know that the same stupid egos are in place using more "official" buzz words. I sometimes mute the sound from discussions like this and play a round of lip sync. I replace all the crap they spew with 1337 speak and try to imagine everyone as pimply faced 16 year olds... it's not hard, believe me. I have yet to see a demonstration of professionalism and maturity from the scientific community or the political side that spawns and is spawned by it.
I pick number 4.
Call this flamebait or troll, but we don't need no stinkin' supercomputers!
The primary uses of supercomputers that I've read about are to perform simulations of real-world phenomena. It might be possible to construct circuitry that makes a computer more efficient at a series of specialized computing tasks. It's arguably more efficient to not use supercomputers.
(DANGER - intentional lack of sensitivity below)
Examples:
1. Genomic research - inject experimental drugs into real-live humans. If a higher percentage live or improve, great. If not, the world has too many people anyway. Mutants rule!
2. Nuclear simulations - find a large desert or remote tropical island and nuke it. For studies of effects of radiation on humans - see 1.
3. Weather prediction - use a desktop computer and a weatherman's intuition to give 80% accuracy for what's reported. If your weatherperson is entertaining or easy on the eyes, people won't care about the 20% of the time that they're dead wrong. (Eg: www.nakednews.com)
4. Chess - Watching two people play chess is dull. Watching a person play a computer is dull and pointless. Watching two computers play each other is merely a senseless benchmark test.
5. America's Cup Sailboats - A sailboat is a hole in the water into which you pour money. The faster the sailboat, the faster you pour money into it. Arrrr, matey!
6. SETI - If there's intelligent life out there, it will either take thousands of years for them to reach us (normal sublight travel) or they will arrive here faster than any of their radio signals ever will. Let future generations tackle the problem when we can use orbital or lunar radio telescopes after we solve our own problems on Earth.
7. Cryptography - Social engineering is the most effective breaker of computer codes. Never underestimate the power of a wooden stick to extract secret keys from unbreakable ciphers.
8. Energy resource discovery - If it weren't for the damned environmentalists, we wouldn't care about drilling holes wherever the oil companies want to.
9. Video games - Oh wait, this would actually be a great use for a supercomputer. We'll call it the Metaverse or "The Matrix". <drool>
I'm insensitive, you insensitive clod!
People see themselves as "winning", often when they trample on others. This is because of a mis-identification. People identify with THEIR OWN community, nationality, religion and other personal bias. Instead, if you identify with yourself being a human- or spiritual being, you will see that there are only other human beings around you. Not muslims, not christians, not japanese, not even lawyers.
America and the UK are not really very secure. It doesn't help to have the best defence in the world, when you're acting like spoiled brats who decide for others what's "best for the world". America is supporting known terrorist groups worldwide, and has a history of occupying other nations for the worst reasons. Most international problems, America has generated for itself. It doesn't help to be so advanced and win, when you get the world to hate you.
It's all because of misidentification and ignorance. Especially in America, where people believe "USA is the world". Humility can be a hard lesson, one that is due for a long time "Over There".
http://www.debunkingskeptics.com/
With the Bombe and Colossus machines?
Government of the people, by corporate executives, for corporate profits.
While the Earth Simulator might be custom in a sense (you can't order Earth Simulator nodes, for example), they are NOT specialized, or at least not very much.
The Earth Simulator nodes are the prototypes of the SX-6, so yes, they are custom-built, like all prototypes are. A fully-loaded SX-6 is very nearly the same as an Earth Simulator node, and the same interconnect is also available for the SX-6. I think one of the differences, besides the paint job, is slightly more efficient memory timings on the SX-6, so the commercially-available version is actually BETTER!
But I guess you wouldn't be able to call an SX-6 "off the shelf"...
Both Earth Simulator and SX-6 run a (putrid, but that's off-topic!) Unix variant which runs directly on the main processor; it isn't one of these designs with an off-the-shelf processor for the OS and management, and a specialized coprocessor for accelerating computations. It can operate on a variety of problems with equivalent ability. As for their power, that's what you get when you put lots and lots of vector pipes in a processor: the SX-5 could do around 4096 floating point operations per clock cycle (if they are all the same!) and had a 256 *bytes* wide memory bus (compared to the puny "one flop every few clock cycles" of a Pentium 4, and its tiny 64 bit memory bus). I don't have the info for SX-6, but it is similar...
"Vector computers" might have been called "SIMD computers" instead, in the recent history.
They are in fact using more and more off-the-shelf hardware, having switched from HiPPI for networking and disk access to gigabit ethernet and fibre channel, respectively.
Disclaimer/credentials: I used to work for NEC's North American supercomputer subsidiary, then for Cray, who is currently NEC's distributor in America.