Supercomputers To Move To Specialization?
lucasw writes "The Japan Earth Simulator outperformed a computer at Los Alamos (previously the world's fastest) by a factor of three while using fewer, more specialized processors and advanced interconnect technology. This spawned multiple government reports that many suspected would ask for more funding in the U.S. for custom supercomputer architectures and less emphasis on clustering commodity hardware. One report released yesterday suggests a balanced approach."
How does one go about benchmarking a supercomputer specialized for a certain task against a cluster of cheap computers? Now we need to spend more money developing specialized supercomputers, even though the scenario presented in Japan might not hold true for other applications? It seems a little too soon to start making recommendations.
The two studies resulted, in part, from NEC Corp.'s May 2002 announcement of the Earth Simulator, a custom-built supercomputer that delivers 35.8 teraflops. That system packed five times the performance of the fastest U.S. supercomputer at that time...
"The Earth Simulator created a tremendous amount of interest in high-performance computing and was a sign the U.S. may have been slipping behind what others were doing," said Jack Dongarra...
Graham said researchers should not overreact to NEC Corp.'s Earth Simulator that blindsided many in the high-performance computing community eighteen months ago by delivering a custom-built system five to seven times more powerful than the more off-the-shelf clusters developed in the U.S.
I don't mean to draw a crude analogy here, but I really can't help but read this and be reminded of the space race.
It took Sputnik to kickstart our spacemindedness; I for one consider it sad that a "tremendous amount of interest" -- and the funding that comes with it -- in high-performance computing seems only to have arisen (or been revived) under the influence of competitive international politics. Are we really so little advanced that our respective national egos are still the driving force behind enthusiasm, financial or otherwise, in certain areas of science?
The coolest voice ever.
Is there a way to really compare the speed of a supercomputer with that of commodity hardware? If anyone could give either a quick explanation or a link on the relationships between BogoMIPS, teraflops, MHz and the whole lot, I would be very appreciative.
-Silmarildur
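One rough way to relate these numbers: theoretical peak FLOPS is the processor count times each processor's peak (itself clock rate times floating-point operations per cycle), while a benchmark such as Linpack reports the rate actually sustained; the ratio of the two is the efficiency. MHz alone says nothing about work done per cycle, and BogoMIPS is just the Linux kernel's busy-loop calibration for delay timing, not a performance benchmark. The sketch below is only an illustration: it assumes the commonly cited Earth Simulator figures (5,120 vector processors at 8 GFLOPS peak each), takes the 35.8 teraflops quoted in the article as the measured rate, and uses a made-up commodity CPU for comparison.

```python
GIGA = 1e9
TERA = 1e12

def per_processor_peak(clock_hz, flops_per_cycle):
    """Peak rate of one processor: clock rate times FLOPs retired per cycle."""
    return clock_hz * flops_per_cycle

def system_peak(processors, per_proc_peak):
    """Theoretical peak assumes every processor runs flat out, never waiting."""
    return processors * per_proc_peak

def efficiency(measured, peak):
    """Fraction of theoretical peak actually sustained on a benchmark."""
    return measured / peak

# Assumed Earth Simulator figures (commonly cited, not from this thread).
es_peak = system_peak(5120, 8 * GIGA)   # 40.96 TFLOPS theoretical
es_measured = 35.8 * TERA               # Linpack figure quoted in the article
print(f"ES peak {es_peak / TERA:.2f} TFLOPS, "
      f"efficiency {efficiency(es_measured, es_peak):.1%}")

# Hypothetical commodity CPU: 2.4 GHz, 2 double-precision FLOPs per cycle.
node_peak = per_processor_peak(2.4 * GIGA, 2)
print(f"hypothetical commodity CPU peak {node_peak / GIGA:.1f} GFLOPS")
```

Comparing machines on peak alone flatters whichever design sustains the smaller fraction of it on real codes, which is much of what this thread is arguing about.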
Specialized hardware (almost) always outperforms commodity stuff.
I use custom-designed amplifiers because they work better for my application. I could buy off-the-shelf stuff (in the ~$500 to $10,000 range), but that won't be exactly what I want. I use custom software too... know why? Because it's designed specifically for the job. That same software shouldn't really be used for other fields of research, and neither should my amplifiers. The thing about this stuff is that it takes a lot of time to maintain (plus initial development). That means grad students, postdocs, and technicians who may spend over 90% of their time just keeping systems in working order and/or adding features. The benefits of customized hardware/software, in this instance, are worth the headaches associated with them.
All of my optics are commodity stuff (some are rare/exotic, but it's still basically black-box purchasing). I don't have the facilities to make coated optics, nor do I need anything that specialized, so... I just buy it.
When I was in telecom, we used Oracle and Solaris and Apache. It worked, and the cost of developing the same functionality in-house was ridiculously high (plus we'd never get to designing our products that sit on top of it).
Ultimately, it always comes down to a comparison between the cost (man-hours, equipment, etc.) of custom building and that of integrating stuff from OEMs.
So, the question our labs need to answer is, does clustered COTS hardware get the job done? Supplementary to that, is it cost-effective to buy/design it in light of the previous answer?
In any field where you are pushing the limits of technology, you have to make such trade-offs. Personally, I don't care who has the absolute fastest supercomputer (measured in flops, factoring-time, whatever)... what really counts is, who does the best research with the supercomputers.
Down with Saudi Arabia!!!
Specialized systems are almost always going to outperform generalized systems when you're dealing with similar levels of technology (for instance, specialized abacuses vs. a generalized Cray T3E).
;))
The great thing about generalized systems is you can use them to explore new areas, then design a specialized system to take advantage of specific optimizations the generalized one can't support.
I'm glad for the report suggesting a "balanced approach". I can't imagine forsaking one type of system for the other, as each has its place. (Uh-oh... generalized systems have a "place"? Does that mean they're specialized at being generalized? Oh, the irony!)
bytesmythe
Hypocrisy is the resin that holds the plywood of society together.
-- Scott Meyer
The problem is that it may not be possible to match the computation of a cluster with specialized interconnects using just commodity hardware, no matter how many machines you throw at it. If a simulation has a low computation-to-communication ratio, its scalability is bound by the performance of the interconnects. In this case, throwing more commodity machines at the problem will actually increase the total time required to run the experiment.
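A toy strong-scaling model makes this concrete. Assume (purely for illustration) that each time step does a fixed amount of work split evenly over N nodes, plus a tree-style global exchange whose latency cost grows like log2(N). Compute time falls as 1/N while communication keeps growing, so there is an optimum node count; past it, every added node makes each step slower. All numbers below are hypothetical.

```python
import math

def step_time(nodes, work_flops, node_flops, latency_s):
    """Hypothetical per-step cost: evenly divided compute plus a global
    exchange whose latency term grows as log2(nodes)."""
    compute = work_flops / (nodes * node_flops)
    communicate = latency_s * math.log2(nodes)
    return compute + communicate

def best_node_count(work_flops, node_flops, latency_s, max_nodes=50_000):
    """Brute-force search for the node count with the smallest step time.
    Analytically the optimum sits near work * ln(2) / (node_flops * latency)."""
    return min(range(1, max_nodes + 1),
               key=lambda n: step_time(n, work_flops, node_flops, latency_s))

# Assumed: 1e8 FLOPs per step, 1 GFLOPS per node.
W, F = 1e8, 1e9
for label, alpha in [("commodity, 100 us latency", 1e-4),
                     ("custom, 10 us latency", 1e-5)]:
    n = best_node_count(W, F, alpha)
    print(f"{label}: best N = {n}, step = {step_time(n, W, F, alpha):.2e} s, "
          f"4x more nodes = {step_time(4 * n, W, F, alpha):.2e} s")
```

In this model a ten-times-lower latency moves the turnover point roughly ten times further out, which is one way of reading "scalability is bound by the performance of the interconnects."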
Stop being silly. The cooling requirements of a massively powerful Athlon-based supercomputer would eat up the savings from using standard parts.
Seriously, though - I would guess, actually, that if one were to build a supercomputer from a "desktop" processor, the PPC970 (aka G5) chips would be a good choice. They have a solid vector unit, are RISCier, have a wider bus, and a better pipeline design. Plus IBM's fabrication capabilities are excellent - which helps in reliability and upgradability.
Frankly, I don't want the fastest computer chips on the desktop to be designed by a company in another country (even if Intel makes them outside of the US), and I would rather the cutting edge be cut here, in my native country.
Good lord, why? Is it just national/istic pride? I see that as something to be outgrown with respect to driving, receiving, and appreciating scientific discoveries and technological advancements. Honestly, if Japan were to come out with, say, the first mass-produced DNA computer, I wouldn't be the slightest bit bitter, or reluctant to take advantage of it. I regularly praise other countries for doing things the U.S. hasn't.
German physicists were primarily responsible for breakthroughs in their field in the 19th and early 20th centuries, and during that period there was quite a bit of resentment from American politicians and scientists whose feelings boiled down to nothing more than "We should have gotten there first." I won't deny that fierce competition has been beneficial to mankind at large (we've seen it in the computer industry, after all), but I don't think I'm wrong in wanting the motivation to be something a little less self-centered, political, and immature. An idealistic vision? Hardly. It's not too much of an expectation for us to evolve beyond petty glare-throwing.
I think you're thinking of the EFF's DES cracking machine. It used a custom gate array chip - it took advantage of the cheapness of an ASIC, but not the extra efficiency (they couldn't afford to have the first round of chips not work properly - a large proportion of the chips didn't work properly anyway). IIRC, it searched the keyspace in 3.5 days.
Many other groups have attacked DES on FPGAs, but none have achieved the same scale as the EFF machine. I will be attempting it myself very, very soon (as soon as I can get the key buffer in my design to work on actual hardware, we're all set - today, if I'm lucky!). Some (extremely) preliminary figures suggest that we might be able to match the EFF machine on larger Xilinx FPGAs for only a few tens of thousands of dollars (it cost almost $250k).
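The arithmetic behind "searched the keyspace in 3.5 days" is easy to sketch. DES has 56-bit keys, so a full sweep tests 2^56 keys, and on average a brute-force search hits the right key after half of them. Whether the recalled 3.5 days was a full sweep or just the point where the key turned up changes the implied rate by about a factor of two; the sketch below treats both as assumptions rather than the machine's actual figures.

```python
SECONDS_PER_DAY = 86_400

def search_days(key_bits, keys_per_second, fraction=1.0):
    """Days needed to test `fraction` of a 2**key_bits keyspace at a fixed rate.
    fraction=0.5 gives the expected time to find a random key."""
    return (2 ** key_bits) * fraction / keys_per_second / SECONDS_PER_DAY

def rate_needed(key_bits, days, fraction=1.0):
    """Keys per second needed to cover `fraction` of the keyspace in `days`."""
    return (2 ** key_bits) * fraction / (days * SECONDS_PER_DAY)

# Assumption: 3.5 days is taken as a full sweep of the 56-bit DES keyspace.
rate = rate_needed(56, 3.5)
print(f"full sweep in 3.5 days needs about {rate:.2e} keys/s")
print(f"expected time to hit the key at that rate: {search_days(56, rate, 0.5):.2f} days")
```

The same two functions give a quick price/performance check for an FPGA design: divide the hardware cost by the keys per second it sustains.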
I'll be looking at the problem specifically mentioned in the blurb: comparing the price/performance ratio of FPGAs vs. software. At the moment FPGAs are looking like they'll come out well ahead, but I have hope that bitslicing techniques will narrow the gap a bit. There are also ciphers that are designed to run well in software and are hence difficult to attack in hardware (DES was designed to run well in hardware).
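For anyone unfamiliar with bitslicing: the cipher is rewritten as a circuit of AND/OR/XOR/NOT gates, and the inputs are transposed so that bit j of every candidate key lives in word j, one candidate per bit lane. A single bitwise instruction then evaluates one gate for as many candidates as the word has bits (64 on a 64-bit CPU). The sketch below shows only the technique: `toy_circuit` is a hypothetical 3-input, 2-output gate network, not a DES S-box, and Python's unbounded integers stand in for machine words.

```python
import random

def bitslice(values, nbits):
    """Transpose inputs: word j holds bit j of every value, one value per lane."""
    words = [0] * nbits
    for lane, v in enumerate(values):
        for j in range(nbits):
            if (v >> j) & 1:
                words[j] |= 1 << lane
    return words

def unslice(words, lanes):
    """Inverse transpose: rebuild one integer per lane from the bit words."""
    out = []
    for lane in range(lanes):
        v = 0
        for j, w in enumerate(words):
            v |= ((w >> lane) & 1) << j
        out.append(v)
    return out

def toy_circuit(x0, x1, x2, mask):
    """Hypothetical gate network standing in for an S-box. It uses only bitwise
    ops, so it works on single bits (mask=1) or on all lanes at once."""
    y0 = (x0 & x1) ^ x2
    y1 = (x0 | x2) & (~x1 & mask)   # mask keeps NOT within the lane width
    return [y0, y1]

def scalar_eval(v):
    """Evaluate the circuit on one 3-bit input, one bit at a time."""
    bits = [(v >> j) & 1 for j in range(3)]
    y = toy_circuit(*bits, mask=1)
    return y[0] | (y[1] << 1)

def bitsliced_eval(values):
    """Evaluate the circuit on every input at once, one gate per bitwise op."""
    lanes = len(values)
    x = bitslice(values, 3)
    y = toy_circuit(*x, mask=(1 << lanes) - 1)
    return unslice(y, lanes)

random.seed(1)
inputs = [random.randrange(8) for _ in range(64)]   # 64 lanes, as on a 64-bit CPU
print(bitsliced_eval(inputs) == [scalar_eval(v) for v in inputs])
```

Real bitsliced DES applies the same idea to the full S-box circuits, and the transpose cost is amortized over many rounds of gate evaluations.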
I don't get what you are saying. Before my Athlon I owned a K6-2. Before that I owned a MII 300, before that a MI 166 and before that a 486SX.
Each time I bought a new computer it wasn't because I wanted to rival a local supercomputer. It was because newer technology existed that was faster than what I had. The newer processor allowed me to do more.
If AMD could make a 2400+ which generated half the heat, I would use it. And such a decision would have nothing to do with local supercomputer capabilities.
That all being said, a supercomputer which uses off-the-shelf processors doesn't really "fuel" the science of electrical engineering. In fact, of more importance in supercomputing would be reliability, uptime, maintaining sufficient inter-processor communication bandwidth, etc.
None of which I'll ever use in a desktop processor [perhaps for 50 years or so]. Even in this age of computing, multi-processor desktop boxes are fairly rare.
So I think it's hard to say that the ability to cram more Xeons in a room really advances processor design. [or substitute another off the shelf processor].
Tom
Someday, I'll have a real sig.
That is the whole point.
I have the feeling the DOE (nuclear weapon simulation etc) simulation program is not going anywhere near as well as it was sold.
Massive commodity clusters boast big numbers, but they do not deliver great throughput of USEFUL RESULTS. (Also, with massive clusters you have to be able to deal with inevitable hardware failures.)
You have a certain fluid problem---there is a certain speed of sound, and a certain physical geometry. What you want to do is to be able to simulate the real thing at ever smaller grid-sizes, that is, with greater numerical approximations to the physical fields.
Ideally, if your problem were embarrassingly parallel and clusterizable, then you could put any number of grid points on each CPU and crunch away. You want more grid points? Buy more CPUs.
The problem is that in actual physics the length scale of 'interaction' per time step does NOT go down---remember, speed of sound is constant as is physical geometry---imagine for instance the uh, radiative driven implosion of a certain unspecified dense material in spherical or cylindrical geometry into one unspecified not-dense material.
So when you scale-up in the scientifically useful sense---and not the computer nerd sense---then a problem which used to be solvable efficiently on clusters NO LONGER IS SO. There is just too much communication, and this is driven by physical reality.
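Part of this can be seen with back-of-the-envelope numbers. Assuming an explicit scheme in three dimensions (an assumption; the actual DOE codes aren't described here), stability ties the time step to the grid spacing through the CFL condition, dt <= C*h/c, with the sound speed c fixed by the physics. Refining the grid by a factor r multiplies the cells by r^3 and the number of steps by r, so total work grows as r^4. Even if you add nodes in proportion to the cells and communication were free, wall time still grows by r, and every one of those extra steps needs its own neighbour exchange.

```python
def refinement_cost(refine, dims=3):
    """Relative cost of refining an explicit, CFL-limited grid by `refine`.

    Cells grow as refine**dims; the stable step dt <= C*h/c shrinks as
    1/refine (sound speed c fixed), so the step count grows as refine."""
    cells = refine ** dims
    steps = refine
    return {"cells": cells, "steps": steps, "work": cells * steps}

def weak_scaling_wall_time(refine, dims=3):
    """Relative wall time if node count grows with cell count (fixed cells
    per node) and communication were free: the ideal, optimistic case."""
    cost = refinement_cost(refine, dims)
    nodes = cost["cells"]
    return cost["work"] / nodes   # equals the step count

for r in (2, 4, 8):
    c = refinement_cost(r)
    print(f"refine x{r}: cells x{c['cells']}, steps x{c['steps']}, "
          f"work x{c['work']}, ideal wall time x{weak_scaling_wall_time(r):.0f}")
```

The model leaves out exactly the part the post is worried about, namely longer-range physical coupling that adds communication on top of this, so treat it as a lower bound.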
It is not 'OK' to just say "change your code". The codes are developed with mathematical methods and based on experimental data gleaned over literally decades at great expense.
Programming for these is not easy---but it is quite a bit easier for the large vector old-skool Cray-type machines than for the clusters, where the human has to do almost all the scutwork (e.g. MPI).
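To give a flavour of that scutwork: on a cluster each rank owns a slab of the grid plus "ghost" cells, and before every step it must explicitly fetch its neighbours' edge values (in MPI, typically with calls such as MPI_Sendrecv). The sketch below simulates the ranks as Python lists in a single process, with no MPI at all; the periodic 1D diffusion problem and every function name are hypothetical, chosen only to show that the decomposed version has to reproduce the serial answer.

```python
def serial_run(u, nu, steps):
    """Reference: explicit periodic 1D diffusion on one big array."""
    n = len(u)
    for _ in range(steps):
        u = [u[i] + nu * (u[i - 1] - 2 * u[i] + u[(i + 1) % n]) for i in range(n)]
    return u

def split(u, ranks):
    """Block-decompose u; each chunk gets one ghost cell on each side."""
    assert len(u) % ranks == 0, "toy version: equal-sized blocks only"
    n = len(u) // ranks
    return [[0.0] + u[r * n:(r + 1) * n] + [0.0] for r in range(ranks)]

def exchange_halos(chunks):
    """Stand-in for the message passing: copy each neighbour's edge cell into
    this chunk's ghost cell (periodic, so rank 0's left neighbour is the last)."""
    ranks = len(chunks)
    for r, c in enumerate(chunks):
        c[0] = chunks[r - 1][-2]            # left neighbour's last owned cell
        c[-1] = chunks[(r + 1) % ranks][1]  # right neighbour's first owned cell

def local_step(c, nu):
    """Update owned cells only; ghosts are read but never written here."""
    new = c[:]
    for i in range(1, len(c) - 1):
        new[i] = c[i] + nu * (c[i - 1] - 2 * c[i] + c[i + 1])
    return new

def parallel_run(u, ranks, nu, steps):
    chunks = split(u, ranks)
    for _ in range(steps):
        exchange_halos(chunks)   # forget this and the answer is silently wrong
        chunks = [local_step(c, nu) for c in chunks]
    return [x for c in chunks for x in c[1:-1]]   # gather owned cells

u0 = [float(i % 7) for i in range(24)]
par = parallel_run(u0, ranks=4, nu=0.25, steps=50)
ser = serial_run(u0, nu=0.25, steps=50)
print(max(abs(a - b) for a, b in zip(par, ser)))
```

On a vector machine the serial loop is essentially the whole program; on a cluster, the split/exchange/gather bookkeeping (and its multi-dimensional, irregular-mesh equivalents) is the programmer's job.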
The problem is actually more severe with the DOE fluids problem---there are fundamental mathematical issues in the nearly inviscid flow (singular perturbation theory baby) which have not yet been resolved. And they appear at smaller and smaller grid sizes.
This requires rapid development of models and validation at the physically important resolutions and you can't do this with a cluster.
I have no inside information whatsoever but I smell that the sudden DOE and DOD interest in back-to-the-future retrosupercomputing is because of some major failures in the recent cluster efforts.
There is also a direct trade-off between more general-purpose systems and systems custom-tailored to a task. Good examples are Deep Blue and Blue Gene. Both of these systems are designed with a particular task in mind (i.e. chess and protein folding) and are therefore able to leverage knowledge about the problem space to constrain the kind of hardware, the particular low-level instructions and the information flow within the system, while achieving significantly greater performance on a small class of problems. I work with clusters that are used in scientific communities where various researchers work on various problems. In these cases, the questions are about the basic applicability of a particular problem to a particular architecture. For example, a cluster of good COTS hardware with high-speed interconnects will let a user with a very granular problem use the cluster effectively, and it will also serve a user who needs the high-speed interconnect because the problem space demands a high degree of internal communication. But the first researcher might also be able to make use of a grid of (for instance) many more computers at a lower total cost, because (s)he doesn't need the high-speed interconnect. The Earth Simulator gains a lot of performance (on a class of problems) from its underlying vector processor architecture. Given the right internal bus, it is conceivable that adding vector processor daughter boards to the next generation of COTS clusters could achieve similar results, but, of course, only for problem spaces that make efficient use of such processors and aren't bottlenecked by their communication requirements.
Real answers are always more complicated. For example, the equations needed for nuclear simulation will probably require dedicated hardware (as the need for protein folding has led to Blue Gene) to achieve the results that the Pentagon needs. But for many supercomputing tasks, the flexibility of COTS clusters will still be compelling, especially in areas where the algorithms are not yet fully developed (e.g. brain simulation). An interesting keynote at OLS 2003 argued that (some of) the problems are not going to be local computing power but the need to move large quantities of data between research labs across the world and to combine computational systems using the 'grid.' (For down-home examples of problems that have been successfully tackled through coarse-grained distribution, just look at SETI@Home and Distributed.Net.) So it's not just the flops anymore...
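The SETI@Home / Distributed.Net pattern is worth spelling out because it is the opposite extreme from the fluids codes above: a coordinator hands out independent work units, clients compute and return results, and clients never talk to each other, so interconnect latency barely matters. In the sketch below a thread pool on one machine stands in for remote volunteers, and `analyze_work_unit` is a hypothetical placeholder task. In CPython, threads won't actually speed up pure-Python number crunching because of the GIL; the point here is the structure, not the speed.

```python
from concurrent.futures import ThreadPoolExecutor

def analyze_work_unit(unit_id):
    """Hypothetical, fully independent task: reads nothing from other units."""
    total = 0
    for k in range(unit_id * 10_000, (unit_id + 1) * 10_000):
        total += k % 97
    return unit_id, total

def run_work_units(n_units, workers=4):
    """Coordinator: farm out units, collect results; no worker-to-worker traffic."""
    results = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for unit_id, value in pool.map(analyze_work_unit, range(n_units)):
            results[unit_id] = value
    return results

print(run_work_units(8))
```

Because each unit only needs one message out and one back, this kind of job runs about as well over the Internet as over a custom interconnect.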
Actually, the customized vector machines will usually achieve a MUCH higher percentage of their theoretical peak computational capacity on certain "hard" problems than a cluster of commodity machines. The nearness of the nodes dictates that: if the average near-neighbor latency is an order of magnitude lower, then problems that are communications-bound are going to achieve much higher throughput on a tightly coupled cluster of faster, more specialized nodes than they would on a larger, more loosely coupled cluster of commodity systems. If your problem happens to be one which is trivially parallelized and you are not hamstrung by limitations like the 4GB limit on 32-bit CPUs, then of course you should use the cluster of cheap systems; but if you have a problem which has no such mapping, then the only way to effectively achieve your goals might be a custom machine like the Cray SV series or the NEC SX series. Just because a particular machine has a bad track record doesn't mean that a whole class of systems should be condemned; on the contrary, many supercomputer centers have had good luck with their vector machines.
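The latency point can be illustrated with the usual latency-bandwidth (Hockney-style) message model, t = alpha + n/beta, where alpha is per-message latency and beta is bandwidth. The interconnect numbers below are hypothetical and differ only in latency, by a factor of ten: a fine-grained pattern of many small messages runs close to ten times faster on the low-latency network, while one large bulk transfer barely notices the difference.

```python
def message_time(nbytes, latency_s, bandwidth_Bps):
    """Latency-bandwidth model of one point-to-point message: alpha + n/beta."""
    return latency_s + nbytes / bandwidth_Bps

def comm_time(n_messages, nbytes, latency_s, bandwidth_Bps):
    """Total time for a sequence of equal-sized messages (no overlap assumed)."""
    return n_messages * message_time(nbytes, latency_s, bandwidth_Bps)

# Hypothetical interconnects: same 100 MB/s bandwidth, 10x different latency.
commodity = {"latency_s": 50e-6, "bandwidth_Bps": 100e6}
custom = {"latency_s": 5e-6, "bandwidth_Bps": 100e6}

patterns = {
    "fine-grained: 1000 x 64 B": (1000, 64),
    "coarse: 1 x 100 MB": (1, 100_000_000),
}
for name, (count, size) in patterns.items():
    slow = comm_time(count, size, **commodity)
    fast = comm_time(count, size, **custom)
    print(f"{name}: commodity {slow:.4f} s, custom {fast:.4f} s, "
          f"ratio {slow / fast:.2f}")
```

Real networks add contention, software overheads and overlap that this model ignores, but it captures why communications-bound codes care so much about near-neighbor latency.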
There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.