Export Controls on Beowulf?
Gary Franczyk writes, "The United States government has tightly controlled the export of "supercomputers" to certain other nations (i.e., China, Pakistan, India, etc.) for quite some time. Sun has had to deal with this numerous times when selling their equipment. How will the U.S. government handle the fact that now anyone with access to large numbers of PCs can create a "super-computer" cluster? I'm sure that the government is using Beowulf to do nuclear simulations right now... Who says that other nations cannot do the same?" Interesting thought. I'm not aware of any export controls on Beowulf, but with the U.S.'s views on cryptography, how long will it be before such draconian views extend to any powerful computing technology? Is it even possible for the U.S. to restrict Beowulf in any way?
Yep, that should keep the cat in the bag. It worked for cryptography, after all. Them furriners don't have kryptography, 'cause of our export controls.
Oh, well.....
See what I've been reading.
I'm with the Technical University Munich, and the Leibniz Supercomputing Center next door is getting a new Big Box in March, which will then be the most powerful computer in Europe. The peak transfer rate between its nodes is 10 GIGABytes per second, IIRC. At the moment, they're still installing the cooling units (the thing will consume about 600 kilowatts!).
The illegal we do immediately. The unconstitutional takes a little longer.
--Henry Kissinger
I guess you haven't been hanging around Slashdot long enough. This came up and was resolved nearly two years ago.
--
Here is the result of your Slashdot Purity Test.
Linux MAPI Server!
http://www.openone.com/software/MailOne/
(Exchange Migration HOWTO coming soon)
Yes, and the easter bunny visits my house and leaves golden eggs on my porch.
Try the following: "The United States government has enacted legislation that attempts to tightly control the export of "supercomputers" to certain other nations (i.e., China, Pakistan, India, etc.)". Even then you mislead people, simply because of the word "export". The vast majority of the required parts are not made in the US. (Is there a single necessary part where all possible components that could be used are manufactured in the U.S.?)
To keep a product like that in the hands of the U.S. only would require the creating corporation agreeing to do the following.
- Keeping all manufacturing in the U.S. at U.S. wages.
- Refusing to patent the technology.
- Keeping a very expensive security lid on the entire facility.
- Not releasing any details that would allow anyone with the resources of China to come up with an equivalent technology.
Yeah. Right.
In a way, it seems silly to refuse to sell certain nations supercomputers when we still hire their citizens to work on our supercomputers...
-----
No Zen is good zen
However, most nuclear tests these days seem to be for shows of strength (France and the India/Pakistan tests spring to mind), so it is actually more dangerous, in my view, to develop and test nuclear technology using supercomputers than to develop and test it in "the open", since open testing is a good deterrent to other countries.
Perhaps there should be a clause in the GPL, that GPL'd software can't be used to bring about armageddon. OK, that won't work since: a) it violates the Open Source Definition, and b) Emacs would have to be removed from all sites;) - but at least require any nuclear technology developed under Linux be released GPL, maybe have nuke.soureforge.net. This would actually be cool, perhaps VA Linux could fund tests of the open-source nukes on some random place (off the top of my head - Redmond?), if an angry penguin running at you at 100mph is scary, what'll an angry penguin with a nuclear warhead be like?
Sorry about the incoherence of the above post, it's been a long day (and it's only half-way through as well)
--
You'll probably find the story in the Slashdot archives. People were mirroring the Red Hat CD and the Beowulf archives on every part of the globe, within an hour of the story breaking on this site. (I'm not joking! If there's any "wild exaggeration", it is more likely that an hour is far longer than it actually took.)
About two, maybe three, weeks later, the Beowulf site was back up and running. Almost certainly monitored, though. This was definitely munitions, according to someone with the clout to push a NASA site around.
IIRC, though, Beowulf is really not much more than some finer tuning for the network drivers, PVM, MPI, and some freebie cluster management software. Most of the tuning was for the 2.0.x kernels and has since been incorporated into the main tree. PVM and MPI are freely downloadable, and there are later versions than on the Beowulf site. There are also lots of cluster management packages around, now, as well. Beowulf, IMHO, has ceased to be the specific patches/bundle released by AMES, and has become any collection of boxes, configured to act as a single, multi-node, supercomputer.
And, yes, export of supercomputers is VERY restricted. Apple can't export any G3-based computers (though whether anyone in the rest of the world is upset by this is anyone's guess), and it's unlikely that newer-generation processors from other companies will qualify for export, either.
(Personally, I suspect an overclocked, supercooled SMP K7 board would exceed the limits by quite a substantial margin.)
It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
Let's go through this real quick:
That's all there is to it, as far as I know. I should add that many "Freedonias", during the cold war, used the exact same procedures to illegally acquire hardware they were not allowed to buy... There are even tales of the (old) USSR acquiring Cray machines, when these were the "crown jewels" of US computing. Commodity hardware has just made this 100 times more simple...
The right to offend is far more important than the right not to be offended. (Rowan Atkinson)
Apple can't export any G3-based computers (though whether anyone in the rest of the world is upset by this is anyone's guess)
:)
Gee whiz, in that case the iMac that has sat on my desk since October 1998 is just a figment of my imagination, right? I've been dreaming about it all along, eh?
Apple is forbidden to directly export one specific model: the G4/500MHz, which exceeds 1 GFlops and is therefore subject to "supercomputer" regulation. But iMacs, iBooks, "big" G3s and more recently "big" G4s can be found all around the world, including here in Brazil. (Okay, so they're all priced like supercomputers... but that's not the issue.)
To the editors: your English is as bad as your Perl. Please go back to grade school.
While the Beowulf patches come out of NASA, there's a whole bunch of stuff out there which isn't written in the US at all. For example, the best session clustering technology for Linux is MOSIX, which is put out by Hebrew University in Israel. To my knowledge this isn't export restricted at all, and is released as a set of patches against the main kernel tree. Anyone with basic system administration skills could set up a MOSIX cluster pretty quickly.
If you're less interested in interactive clustering and need computational load balancing instead, there's a whole slew of batch queuing packages available, from GNU Queue to the many derivatives of NQS out there. Here's the Yahoo Batch Queuing Page (http://www.cmpharm.ucsf.edu/~srp/batch/systems.html) for a short list of many popular packages.
I don't think the US government could stop any nation from purchasing commodity hardware manufactured from around the world, installing a basic Linux or BSD distribution, and setting up a batch queue or other type of basic cluster. Never mind that a sufficiently serious government could just up and write their own... in my department at BBN (Speech and Natural Language Processing) we use an internally written batch manager which is surprisingly simple... all written in C.
Many problems do not parallelize well. For instance, to my admittedly limited knowledge no parallel version of the fast Fourier transform algorithm (which serves as the backbone of many spectral and pseudospectral codes) is known which does not require a prohibitive amount of interprocessor communications.
At the risk of being overly pedestrian, let me try tackling your differential equations question: Communications latency issues can crop up even if you have just a single equation to solve. Let's imagine, for the sake of discussion, that you wish to understand the propagation of heat on a metal plate, and you have a differential equation that describes the process. Conceptually, you might imagine solving this problem on a parallel computer by breaking the metal plate into a bunch of smaller regions, and asking each processor to compute heat flow on an individual region, as in the following, where a plate is broken into 9 regions:
OOO
OOO
OOO
You can see that every region borders other regions, and herein lies the difficulty: To compute how heat propagates in any one region, say the top left corner, you have to have information from each of the neighboring regions. (In the case of the top-left corner, it'd be the center-left and top-center domains). In solving differential equations, this information is called the "boundary conditions." Each time step would require sending a considerable amount of information among the processors in order to handle the boundary conditions. To use a real-world analogy, if communication latency is high, then many of the processors will end up waiting for the information they need, much like workers in a bureaucracy that has an inefficient internal mail service. In "real" supercomputers you pay big bucks for fast communications, and problems that are communications-intensive will naturally perform better on these machines than on Beowulf or Appleseed clusters. Alternatively, problems whose algorithms require few messages to be passed among processors (many Monte Carlo algorithms have this feature) may run very efficiently on a Beowulf or Appleseed cluster, where communications latency is high.
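To make the plate example a bit more concrete, here is a rough Python sketch, not anything from a real code. It assumes a simple explicit finite-difference update, a plate whose outer edge is held at a fixed temperature, and a made-up diffusion number ALPHA; the function names are mine. It does the same time step twice, once on the whole plate and once region by region, and counts how many cell values a region has to read from a neighbouring region. On a cluster, every one of those reads would be data sent over the network each time step.

```python
# Sketch: one explicit heat-diffusion step on a square plate split into 3x3 regions.
# ALPHA, step_global and step_by_regions are illustrative assumptions.

ALPHA = 0.1  # assumed diffusion number (dt * k / dx^2); below 0.25 for stability


def step_global(grid):
    """One time step on the whole plate; cells on the plate edge stay fixed."""
    n = len(grid)
    new = [row[:] for row in grid]
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            new[i][j] = grid[i][j] + ALPHA * (
                grid[i - 1][j] + grid[i + 1][j]
                + grid[i][j - 1] + grid[i][j + 1]
                - 4 * grid[i][j]
            )
    return new


def step_by_regions(grid, parts=3):
    """Same step, but each region updates only the cells it owns.

    Returns the new grid and the number of values read from another
    region, i.e. the data that would have to be communicated.
    """
    n = len(grid)
    size = n // parts  # assumes n is divisible by parts
    new = [row[:] for row in grid]
    exchanged = 0
    for bi in range(parts):
        for bj in range(parts):
            rows = range(bi * size, (bi + 1) * size)
            cols = range(bj * size, (bj + 1) * size)
            for i in rows:
                for j in cols:
                    if i in (0, n - 1) or j in (0, n - 1):
                        continue  # fixed plate edge, nothing to update
                    total = 0.0
                    for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                        ni, nj = i + di, j + dj
                        if ni not in rows or nj not in cols:
                            exchanged += 1  # value owned by a neighbouring region
                        total += grid[ni][nj]
                    new[i][j] = grid[i][j] + ALPHA * (total - 4 * grid[i][j])
    return new, exchanged
```

Both functions give the same temperatures; the only difference is that the region-by-region version makes the cross-region reads visible. Finer decompositions of the same plate drive that count up while the work per region goes down.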
Parallel computing seems to be largely an exercise in economics. Any parallel algorithm with a nonzero number of messages to be passed will necessarily run at something less than 100% efficiency. Just how far below 100% depends on the nature of the algorithm and the machine/cluster it is running on.
Name 1 - 1 task - that requires a supercomputer that can't be broken down into nodes well
Sure, I can name several. Just some examples: Weather simulation. Ocean simulation. Molecular simulations. Simulation of astronomical bodies. All of which are very real problems.
In short, any problem which is not trivially parallel will get a much poorer speedup on a NOW (network of workstations) versus a real supercomputer. Many of the problems above will generate many MB/s of data per processor (60 - 200 MB/s is not uncommon).
What you fail to realize is that many problems run for many iterations, and for each iteration you need to distribute the global dataset to all worker nodes. Take the Barnes-Hut program, for example. In that program, each node gets a set of close-by astral bodies (stars and planets), and calculates their new positions for the next time step. To do that you need the positions of all other stars. For the next time step, you need to a) collect the calculated positions from all worker nodes, and b) distribute them back for the next iteration. When trying to run that on a NOW, you will very soon find that doubling the size of the cluster will not give any speedup at all, since they will spend most of the time chatting with each other on the network. On a supercomputer, you can run many more worker nodes before this happens.
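A back-of-the-envelope model shows the effect. This is only a sketch under assumptions I made up for illustration: compute per step scales like N log N and divides evenly across nodes, while every step each node has to receive all positions (cost independent of node count) plus a per-node latency for the collect/redistribute round. None of the default numbers are measurements.

```python
import math


def step_time(nodes, bodies=100_000, t_interaction=1e-5,
              latency=1e-4, bandwidth=10e6, bytes_per_body=24):
    """Assumed time in seconds for one time step on `nodes` workers.

    All defaults are illustrative guesses for a commodity network.
    """
    # Compute part: N log N interactions, split evenly across nodes.
    compute = bodies * math.log2(bodies) * t_interaction / nodes
    if nodes == 1:
        return compute  # a single node sends no messages
    # Communication part: one message round per node, plus shipping
    # every body's position to every node. Does not shrink with nodes.
    comm = nodes * latency + bodies * bytes_per_body / bandwidth
    return compute + comm


def speedup(nodes, **params):
    """Speedup relative to one node under the same assumed parameters."""
    return step_time(1, **params) / step_time(nodes, **params)


if __name__ == "__main__":
    for n in (8, 64, 256, 1024):
        slow = speedup(n)
        fast = speedup(n, latency=1e-6, bandwidth=1e9)
        print(f"{n:5d} nodes: commodity net {slow:7.1f}x, fast interconnect {fast:7.1f}x")
```

With the commodity-network defaults the speedup flattens out and then drops as nodes are added, while the same model with a lower latency and higher bandwidth keeps scaling much further. The exact crossover depends entirely on the assumed numbers.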
Wow, someone who knows what they're talking about....
:)
I deceive people well.
If I understand correctly, you are describing how a "surface-to-volume" ratio goes down as the volume elements get larger, thus allowing individual processors to spend more time crunching numbers and less time waiting for boundary data. This is indeed true, and this is precisely the kind of balancing act one has to perform to compute efficiently in parallel. As you've demonstrated, the same algorithms may be more efficient on some machines than on others, but based on my (albeit limited) experience in computational physics, optimization almost always seems to boil down to how one reduces the number of messages that have to be passed in order to perform the task. This seems to be the single most important factor in the scalability of numerical calculations (how much speedup is gained by increasing the number of processors).
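For what it's worth, the surface-to-volume argument is easy to put numbers on. The sketch below assumes each processor owns a cubic block of cells, that work is proportional to the number of cells in the block, and that boundary data is proportional to the cells on its six faces; the function name is just for illustration.

```python
def surface_to_volume(side):
    """Ratio of face cells to total cells for a cubic block `side` cells wide.

    Assumes work ~ volume and boundary traffic ~ surface area.
    """
    volume = side ** 3       # cells to compute
    surface = 6 * side ** 2  # cells on the six faces
    return surface / volume  # simplifies to 6 / side


if __name__ == "__main__":
    for side in (10, 100, 1000):
        print(side, surface_to_volume(side))
```

The ratio falls as 6/side, so giving each processor a bigger block means proportionally less boundary data per unit of computation. That is the trade-off: fewer, larger blocks communicate less but leave less parallelism to exploit.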
Disclaimer: While I have some experience in parallel computing, I am by no means an expert in this field, and I suggest you read some of the other excellent posts in this thread to hear from the real experts.