Two useful definitions that explain why KLAT2 was built as it was:
A supercomputer is a computer that is not only very fast, but whose design allows it to be scaled up to be faster as more money is spent.
Bisection bandwidth, the worst-case total bandwidth between halves of a parallel machine when all processors are communicating, is the primary measure of supercomputer network bandwidth, NOT NIC speed. Further, NIC performance is often limited by the OS interface and/or the PCI bus. This is why a network made of multiple 100Mb/s NICs per PC and cheap wire-speed switches can easily equal or exceed the performance of one built from Gb/s NICs and the narrower, often less than wire-speed, Gb/s switches.
The same argument applies for latency: a single switch hop for a 100Mb/s FNN (Flat Neighborhood Network) versus multiple switch hops for Gb/s.
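To make the bandwidth comparison concrete, here is a rough back-of-the-envelope sketch in Python. Everything in it is an assumption for illustration: the function name, the 64-node size, the 4 NICs per PC, the 50% NIC efficiency, and the 8000 Mb/s switch limit are hypothetical numbers, not KLAT2 measurements. The model is also deliberately simple: half the nodes send to the other half at once, every NIC on a node can be driven at the same time, and the switches run at wire speed unless an explicit limit is given.

```python
def bisection_bandwidth_mbps(nodes, nics_per_node, nic_mbps,
                             nic_efficiency=1.0, switch_limit_mbps=None):
    """Simplified one-direction bisection bandwidth estimate, in Mb/s.

    Assumes half the nodes send to the other half simultaneously.
    nic_efficiency models losses in the OS interface and/or PCI bus.
    switch_limit_mbps models a switch fabric that is below wire speed.
    """
    per_node = nics_per_node * nic_mbps * nic_efficiency
    total = (nodes // 2) * per_node
    if switch_limit_mbps is not None:
        total = min(total, switch_limit_mbps)
    return total


# Hypothetical 64-node cluster, four 100Mb/s NICs per PC, wire-speed switches.
multi_fast_ethernet = bisection_bandwidth_mbps(64, 4, 100)

# Hypothetical 64-node cluster, one Gb/s NIC per PC that only sustains half
# its rated speed, behind a switch fabric limited to 8000 Mb/s across the cut.
single_gigabit = bisection_bandwidth_mbps(64, 1, 1000,
                                          nic_efficiency=0.5,
                                          switch_limit_mbps=8000)

print("multiple 100Mb/s NICs:", multi_fast_ethernet, "Mb/s")
print("single Gb/s NIC:      ", single_gigabit, "Mb/s")
```

With these made-up inputs the multi-NIC design comes out ahead, but the point is the shape of the calculation, not the specific numbers: plug in your own measured NIC throughput and switch specs before drawing any conclusion.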
We have been building AMD PC clusters for several years now, ever since the K6-2. The Athlons are especially impressive. Our latest cluster, KLAT2 (Kentucky Linux Athlon Testbed 2), should have its 66 Athlons chugging away by April. We demo'd our first Athlon cluster at SC99 in November 1999.
Although we have used SMPs as well (e.g., PIII quads from Dell), modern processors are memory bandwidth starved, and simple SMPs magnify the problem. I think a lot of cluster designs try to use SMP nodes to compensate for overspending on the inter-node network. I prefer to do the network carefully and use uniprocessor nodes.
PS: I'm the author of the Parallel Processing HOWTO and my first Linux PC cluster predates Beowulf (it was in Feb. 1994)... being good and even being first doesn't necessarily give you the highest visibility. Remember that when you think of AMD's Athlon. ;-)
PPS: I used to be faculty at Purdue, but have recently moved to the University of Kentucky. Our new web site is http://aggregate.org/
KLAT2 is nearly 2 years old and it gets 65 GFLOPS on a real application... peak speed is 180 GFLOPS. The G4 doesn't do badly speed-wise, but is not price/performance competitive with Athlons... in fact, right now no general-purpose processor is. ;-)
The Cluster Design Rules tool: http://aggregate.org/CDR/
You specify some characteristics of your application, your site (power and space), and budget; it presents the best designs taken from a design space of millions.