The theoretical peak performance for one 733MHz PIII is 733MFlops, as it can do one FP op per clock. That's a TPP for the whole cluster of about 375GFlops. Now let's say it scales as well across 256 nodes as it does on our 64 node cluster that uses Myrinet (85.5%). That gives 320GFlops.
Unfortunately one chip isn't actually going to produce 733MFlops on Linpack. A PIII-500 gets about 200MFlops, which is 40% of the TPP. Dunno much about the 733MHz chips (except the cache runs at full processor speed, but if it's only 512KB it's not going to offer much improvement), but I'll be nice and say it gets 75% of the TPP. Ok, I'm probably being REALLY nice there. That leaves us with ~240GFlops.
Of course, for a press release 375GFlops looks a lot better :-)
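For anyone who wants to check the arithmetic, here is a rough Python sketch of the estimate above. The inputs are just the figures quoted in the post; the function and variable names are made up for illustration, and the 75% Linpack fraction is the same generous guess as before, not a measurement.

```python
# Back-of-the-envelope estimate using only the figures quoted above.
CLOCK_MHZ = 733             # one FP op per clock -> 733 MFlops peak per CPU
TPP_GFLOPS = 375            # quoted theoretical peak for the whole cluster
SCALING_EFFICIENCY = 0.855  # measured on the 64 node Myrinet cluster
LINPACK_FRACTION = 0.75     # generous guess at Linpack speed vs. peak per chip


def estimate(tpp_gflops, scaling, linpack_fraction):
    """Return (peak after scaling losses, realistic Linpack guess) in GFlops."""
    scaled = tpp_gflops * scaling
    realistic = scaled * linpack_fraction
    return scaled, realistic


# Number of CPUs implied by the quoted cluster peak (not stated in the post).
implied_cpus = TPP_GFLOPS * 1000 / CLOCK_MHZ

scaled, realistic = estimate(TPP_GFLOPS, SCALING_EFFICIENCY, LINPACK_FRACTION)
print(f"implied CPUs:      {implied_cpus:.0f}")
print(f"after scaling:     {scaled:.0f} GFlops")
print(f"realistic Linpack: {realistic:.0f} GFlops")
```

Swapping LINPACK_FRACTION for 0.40 (the PIII-500 figure) shows how much the final number depends on that one guess.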
Sure. There are basically two major kinds of supercomputer: shared memory (e.g. SGI's Origin2000) and distributed memory (e.g. IBM SP). Beowulf style clusters fall into the distributed memory category. More expensive interconnect (e.g. Myrinet) starts to approach the speed & latency of that in commercial offerings at a lower price.
Tasks such as 3D rendering are not very communications intensive, so a beowulf-style machine with processors that compare to those in commercially available machines will run at about the same speed. Communication intensive tasks, such as meteorological simulations, don't run as well unless you shell out the big bucks for better interconnect.
The Linpack benchmark, which solves a system of linear equations, is used to determine ranking on the Top500 list. See the website for more info.
One dual processor SMP box isn't necessarily better than two UP boxes. Contention for memory and the network card are going to be big issues. Using Myrinet with MPI on an SMP box isn't as good as it could be at the moment, either. MPICH-GM (MPI w/ support for Myrinet) doesn't support communication between processes on the same machine using shared memory; they have to go to the Myrinet switch and back. This should be resolved soon, though.
You're making a category mistake. MPI and PVM are message-passing libraries. LINDA is a programming language that uses a tuple space stored in distributed shared memory (see here for more info). HACMP is a completely different beast, see IBM's homepage.
Beowulf != any of these. Beowulf is the idea that one can take commodity, off the shelf (COTS) components and build a powerful machine at a price far less than a comparable commercial offering.
Codes run on Beowulf, and really any parallel machine, typically use MPI, PVM, or custom message passing libraries. The Beowulf idea includes the use of MPI & PVM, among other freely available software packages. Codes that run on shared memory machines typically use the shared memory device of MPI, shared memory, or pthreads.
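To make the two programming styles concrete, here is a toy Python sketch. It is only an analogy: threads inside one process stand in for the processors, a lock-protected variable stands in for shared memory, and a queue stands in for the message-passing library. Real codes would use MPI, PVM, or pthreads as described above; the function names here are made up.

```python
import queue
import threading


def sum_shared_memory(chunks):
    """Shared-memory style: every worker updates one total, guarded by a lock."""
    total = 0
    lock = threading.Lock()

    def worker(chunk):
        nonlocal total
        partial = sum(chunk)
        with lock:              # contention point, like a shared memory bus
            total += partial

    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total


def sum_message_passing(chunks):
    """Message-passing style: workers share nothing and send results back."""
    inbox = queue.Queue()       # stands in for the interconnect

    def worker(chunk):
        inbox.put(sum(chunk))   # the only communication is an explicit send

    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # The collector does one receive per worker, then combines the partials.
    return sum(inbox.get() for _ in chunks)


# Split 0..99 across four workers, round-robin.
chunks = [list(range(i, 100, 4)) for i in range(4)]
print(sum_shared_memory(chunks), sum_message_passing(chunks))
```

Both versions give the same answer; the difference is where the cost lands. In the first it is contention on shared state, in the second it is the number and size of messages, which is why interconnect speed matters so much on a cluster.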
For CPU intensive tasks the Beowulf idea is great. Codes that perform lots of disk I/O suffer, as adding higher performance (e.g. SCSI) disks increases system cost greatly. Communication intensive tasks perform the worst on Beowulf style clusters compared to commercial computers, as the interconnect on Beowulf style clusters can't compare. For a relatively large increase in cost, one can use Myrinet. With Myrinet, bandwidth and latency begin to approach that of the switch found on the IBM SP series of machines.
With high bandwidth, low latency interconnect technologies that scale well (e.g. Myrinet), one can build a cluster that outperforms a comparable commercial offering at, say, one quarter to one eighth the price. The difference at that point is software. There's really not a lot out there to configure and administer Beowulf style clusters, and commercial implementations of some packages beat the pants off of their freely available counterparts (compilers, for example). Until the software situation changes there is still reason to buy your big iron from IBM, SGI, and Sun.
We maintain a 64 node Sun Ultra5 cluster at SUNY Buffalo. It uses Myrinet for interconnect, utilizing eight 16 port switches, with four of those each connected to the other four with a single link. It scales REALLY well: for Linpack, we get about 85.5% of the single processor speed per processor no matter how many processors we use. I can't imagine that going down very much if we add more boxes. There are Myrinet installations out there that have 1000+ nodes (many search engines use Myrinet), so it's got to scale well for 64+ nodes.
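As a sanity check on the port budget, here is a small Python sketch of one reading of that switch layout: two groups of four switches, with every switch in one group linked once to every switch in the other. That reading, the switch labels, and the assumption that all remaining ports are free for hosts are guesses for illustration, not a description of the actual wiring.

```python
from itertools import product

PORTS_PER_SWITCH = 16
NODES = 64
GROUP_A = ["A0", "A1", "A2", "A3"]   # hypothetical labels
GROUP_B = ["B0", "B1", "B2", "B3"]

# One link between each A switch and each B switch.
links = list(product(GROUP_A, GROUP_B))

# Each switch spends one port per switch in the opposite group.
uplinks_per_switch = len(GROUP_B)
host_ports_per_switch = PORTS_PER_SWITCH - uplinks_per_switch
host_capacity = host_ports_per_switch * (len(GROUP_A) + len(GROUP_B))

print(f"inter-switch links: {len(links)}")
print(f"host ports/switch:  {host_ports_per_switch}")
print(f"total host ports:   {host_capacity} (need {NODES})")
```

Under that reading there is room to spare for 64 hosts, and any two hosts are at most three switches apart (same group goes via a switch in the other group).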
Of course that's only an educated guess on my part. If I'm incorrect, note that Myrinet is coming out with single unit 64 and 128 way switches later this year. That should help improve the interconnect situation for larger clusters a great deal. Prices will be dropping too, possibly putting Myrinet in reach of groups with smaller budgets.
As a current developer of SnB, I'm just curious as to who you (torinth) are. Andrew?
Jason