Building an 1100MHz "SuperStation"
Anonymous Coward writes "There is an interesting article on building a dual Celeron 550 (overclocked 366) computer by David Green; he goes a bit into the theory of SMP computers, what components he chose, and shows some benchmark results (under Linux) for the system. His computer could really crank through RC5 blocks..." Us hardware tinkerers love this sort of stuff; the rest of you can feel free to ignore it. (AboutLinux.com is where this cool scoop came from, BTW.)
Certainly some applications are embarrassingly parallel (aka "data parallel"); that is, after a tiny bit of startup cost, your speedup is limited only by the number of processors and the size of the problem (amount of data).
Examples of data parallel problems: image rendering, key cracking, matrix-matrix and matrix-vector multiplication.
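To make the data-parallel idea concrete, here is a minimal sketch of a key search where each worker gets its own slice of the keyspace and never talks to the others until the end. The names (split_range, search_chunk, parallel_search) and the is_match predicate are made up for illustration; a real key cracker would test each key against a cipher. Threads are used only to show the structure; in CPython they will not give a real speedup on CPU-bound work.

```python
from concurrent.futures import ThreadPoolExecutor


def split_range(total, workers):
    """Split range(total) into `workers` contiguous (start, stop) chunks."""
    base, extra = divmod(total, workers)
    chunks, start = [], 0
    for i in range(workers):
        # The first `extra` chunks take one additional key each.
        stop = start + base + (1 if i < extra else 0)
        chunks.append((start, stop))
        start = stop
    return chunks


def search_chunk(chunk, is_match):
    """Scan one chunk; no communication with other workers is needed."""
    start, stop = chunk
    for key in range(start, stop):
        if is_match(key):
            return key
    return None


def parallel_search(total, workers, is_match):
    """Hand one chunk to each worker and collect the results at the end."""
    chunks = split_range(total, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda c: search_chunk(c, is_match), chunks))
    found = [r for r in results if r is not None]
    return found[0] if found else None
```

The only shared step is gathering the results, which is the "tiny bit of startup cost" kind of overhead: it does not grow with the amount of work each processor does.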
However, many applications are not embarrassingly parallel; that is, the processors must communicate (aka synchronize) at certain points in order for the computation to proceed. Here, your speedup is limited by the
Examples: sorting, matrix factorization (e.g. LU decomposition).
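As a rough illustration of where the synchronization comes in for sorting, here is a sketch of a counting sort split into two halves. It is written sequentially on purpose, and the function names are hypothetical; the comments mark which part could run on separate processors and where they would have to wait for each other.

```python
def local_histogram(chunk, num_values):
    """Count occurrences of each value in one worker's share of the data."""
    counts = [0] * num_values
    for v in chunk:
        counts[v] += 1
    return counts


def counting_sort_two_workers(data, num_values):
    """Counting sort of integers in range(num_values), split in two halves."""
    mid = len(data) // 2
    # Phase 1 (independent): each half could be counted by its own processor.
    h0 = local_histogram(data[:mid], num_values)
    h1 = local_histogram(data[mid:], num_values)
    # Synchronization point: both histograms must be complete before merging.
    merged = [a + b for a, b in zip(h0, h1)]
    # Phase 2: emit each value as many times as it was counted.
    out = []
    for value, count in enumerate(merged):
        out.extend([value] * count)
    return out
```

Unlike the key search case, neither half can produce final output on its own: the merge needs both histograms, so a slow worker stalls the whole sort.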
In my experience, commodity Intel motherboards scale very poorly for this latter class of problems. Why? If the two threads always hit their L2 cache (i.e. don't have to fetch across the memory bus to main memory), then everything might be OK (even then, write sharing can cause cache thrashing!). If the threads miss L2 cache often enough, then (on commodity motherboards) the threads will be serialized, because the memory is neither interleaved nor multi-ported.
On fancier (expensiver, hehe) SMPs, processors are connected to interleaved memory, multi-ported memory, a crossbar (rather than a bus), or probably all three. For example, the HP Convex Exemplar ($$$) has all three.
On a counting (integer) sort, a 2-processor commodity SMP is limited to a speedup of 1.4 out of a possible 2 (roughly the fraction of memory references which hit cache). The Convex gets speedups of 1.95 (limited only by the tiny startup costs, as in the embarrassingly parallel case).
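For anyone who wants to put those two speedup figures side by side, here is a small sketch of the arithmetic. Parallel efficiency is just speedup divided by processor count. The second function uses Amdahl's law to back out an implied serialized fraction; that model is my assumption for illustration, not something measured in the post, and the function names are hypothetical.

```python
def efficiency(speedup, processors):
    """Fraction of the ideal (linear) speedup actually achieved."""
    return speedup / processors


def implied_serial_fraction(speedup, processors):
    """Serialized fraction s implied by Amdahl's law.

    Amdahl's law: speedup = 1 / (s + (1 - s) / N).
    Solving for s gives s = (N / speedup - 1) / (N - 1).
    This is an assumed model, not a measurement.
    """
    n = processors
    return (n / speedup - 1) / (n - 1)


for label, s in [("commodity SMP", 1.4), ("Convex", 1.95)]:
    print(label, round(efficiency(s, 2), 3), round(implied_serial_fraction(s, 2), 3))
```

Under that model, a speedup of 1.4 on two processors corresponds to roughly 43% of the run being effectively serialized, versus under 3% for a speedup of 1.95.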
I have a dual Celeron system, currently running at 504MHz. Under Linux, I couldn't be happier with the speed of things. Celerons, believe it or not, can be used very efficiently in a server, despite having only 128KB of L2 cache.
Joe
Slashdot's new slogan: news for nerdy wannabees. Stuff that's simple.