Building an 1100 MHz "SuperStation"
Anonymous Coward writes "There is an interesting article on building a dual Celeron 550 (overclocked 366) computer by David Green; he goes a bit into the theory of SMP computers, what components he chose, and shows some benchmark results (under Linux) for the system. His computer could really crank through RC5 blocks..." We hardware tinkerers love this sort of stuff; the rest of you can feel free to ignore it. (AboutLinux.com is where this cool scoop came from, BTW.)
Exactly. I'm running Linux on this here 486, IPMasqing for 3 other computers to my cable modem and serving POP3, Sendmail and Apache. Normal load average? 0.06 or so.
Certainly some applications are embarrassingly parallel (aka "data parallel"); that is, after a tiny bit of startup cost, your speedup is limited only by the number of processors and the size of the problem (amount of data).
Examples of data parallel problems: image rendering, key cracking, matrix-matrix and matrix-vector multiplication.
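To make the data-parallel case concrete, here is a minimal Python sketch of matrix-vector multiplication split by rows; it is not code from the thread, and the function names are hypothetical. After the rows are dealt out, the workers share nothing, so the only overhead is the initial split and the final concatenation. It uses threads for simplicity; in CPython the global interpreter lock serializes pure-Python arithmetic, so for a real speedup you would swap in `ProcessPoolExecutor` (inside an `if __name__ == "__main__":` guard).

```python
from concurrent.futures import ThreadPoolExecutor


def _rows_times_vector(rows, vector):
    """Multiply one block of matrix rows by the vector; touches no shared state."""
    return [sum(a * x for a, x in zip(row, vector)) for row in rows]


def parallel_matvec(matrix, vector, workers=2):
    """Data-parallel y = A x: each worker gets a contiguous block of rows.

    Workers never communicate; the only coordination is joining their results.
    """
    if not matrix:
        return []
    n = len(matrix)
    block = (n + workers - 1) // workers  # ceiling division so every row is covered
    chunks = [matrix[i:i + block] for i in range(0, n, block)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(_rows_times_vector, chunks, [vector] * len(chunks))
        result = []
        for part in partials:  # map() yields results in chunk order
            result.extend(part)
    return result


if __name__ == "__main__":
    A = [[1, 2], [3, 4], [5, 6]]
    x = [1, 1]
    print(parallel_matvec(A, x))
```

Splitting by rows is only one choice; any partition in which each output element depends on exactly one worker keeps the problem communication-free.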
However, many applications are not embarrassingly parallel; that is, the processors must communicate (aka synchronize) at certain points in order for the computation to proceed. Here, your speedup is limited by the
Examples: sorting, matrix factorization (e.g. LU decomposition).
In my experience, commodity Intel motherboards scale very poorly for this latter class of problems. Why? If the two threads always hit in their L2 caches (i.e. never have to fetch across the memory bus from main memory), then everything might be OK (even then, write sharing can cause cache thrashing!). If the threads miss L2 often enough, then (on commodity motherboards) they will be serialized, because the memory is neither interleaved nor multi-ported.
On fancier (expensiver, hehe) SMPs, processors are connected to either interleaved, multi-ported memory, or over a crossbar (rather than a bus), or probably all three. For example, the HP Convex Exemplar ($$$) has all three.
On a counting (integer) sort, a 2-processor commodity SMP is limited to a speedup of about 1.4 out of an ideal 2 (1.4/2 = 0.7, roughly the fraction of memory references that hit cache). The Convex gets a speedup of 1.95, limited only by the tiny startup costs, as in the embarrassingly parallel case.
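A counting sort can be parallelized in several ways; the poster's benchmark code isn't given, so the following is only a minimal Python sketch of one common scheme (per-worker histograms, then a merge), assuming integer keys in a known range `[0, max_value]`. Phase 1 needs no communication, but it streams through the input and scatters increments across the count array, and once that data no longer fits in L2 this is the memory traffic the post says a shared bus serializes. Phase 2, the merge, is the synchronization point. The function names are hypothetical, and under CPython's GIL the threads show the structure rather than the speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def _local_histogram(chunk, max_value):
    """Phase 1 (independent): count the keys in one worker's slice of the input."""
    counts = [0] * (max_value + 1)
    for key in chunk:
        counts[key] += 1
    return counts


def parallel_counting_sort(data, max_value, workers=2):
    """Counting sort of integers in [0, max_value] using per-worker histograms."""
    if not data:
        return []
    block = (len(data) + workers - 1) // workers
    chunks = [data[i:i + block] for i in range(0, len(data), block)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        histograms = list(pool.map(_local_histogram, chunks, [max_value] * len(chunks)))
    # Phase 2 (synchronize): every worker's counts must be combined before output.
    total = [sum(column) for column in zip(*histograms)]
    # Phase 3: expand the merged counts back into sorted order.
    result = []
    for key, count in enumerate(total):
        result.extend([key] * count)
    return result


if __name__ == "__main__":
    print(parallel_counting_sort([3, 1, 2, 3, 0, 1], max_value=3))
```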
RC5 is an encryption algorithm from RSA Labs. An RC5 'key' is a specific decryption code that might decrypt an encrypted message. RSA is sponsoring a contest to see if anyone can crack a message encrypted with RC5.
The reason this is even mentioned is because there is a group that is working on this contest using a 'brute force' attack. distributed.net has a client you can download that will allow you to participate in this contest, along with thousands of other people.
This client is designed to use all CPU time that would otherwise be 'wasted'. People tend to use it as a benchmark, even though it's not very representative of actual computing power, since it uses a small number of instructions repeatedly.
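To show what "brute force" means here, below is a minimal pure-Python sketch of RC5 with 32-bit words and 12 rounds, plus a search over a deliberately tiny keyspace in which only the last key byte is unknown. The tiny keyspace, the known plaintext block and the helper names are hypothetical; the distributed.net client is heavily optimized native code and is not reproduced here. The inner loop (expand a candidate key, encrypt the known block, compare, repeat) is the small set of instructions that makes the client a narrow benchmark.

```python
MASK = 0xFFFFFFFF
P32, Q32 = 0xB7E15163, 0x9E3779B9  # RC5 "magic constants" for 32-bit words


def rotl(x, s):
    """Rotate a 32-bit word left by (s mod 32) bits."""
    s &= 31
    return ((x << s) | (x >> (32 - s))) & MASK


def rotr(x, s):
    """Rotate a 32-bit word right by (s mod 32) bits."""
    s &= 31
    return ((x >> s) | (x << (32 - s))) & MASK


def expand_key(key, rounds=12):
    """Key schedule: mix the secret key bytes into 2 * (rounds + 1) subkeys."""
    c = max(1, (len(key) + 3) // 4)
    L = [0] * c
    for i in range(len(key) - 1, -1, -1):  # load key bytes little-endian into words
        L[i // 4] = (L[i // 4] << 8) + key[i]
    t = 2 * (rounds + 1)
    S = [(P32 + i * Q32) & MASK for i in range(t)]
    A = B = i = j = 0
    for _ in range(3 * max(t, c)):
        A = S[i] = rotl((S[i] + A + B) & MASK, 3)
        B = L[j] = rotl((L[j] + A + B) & MASK, A + B)
        i, j = (i + 1) % t, (j + 1) % c
    return S


def encrypt_block(S, a, b, rounds=12):
    """Encrypt one two-word block with expanded key S."""
    a, b = (a + S[0]) & MASK, (b + S[1]) & MASK
    for r in range(1, rounds + 1):
        a = (rotl(a ^ b, b) + S[2 * r]) & MASK
        b = (rotl(b ^ a, a) + S[2 * r + 1]) & MASK
    return a, b


def decrypt_block(S, a, b, rounds=12):
    """Invert encrypt_block."""
    for r in range(rounds, 0, -1):
        b = rotr((b - S[2 * r + 1]) & MASK, a) ^ a
        a = rotr((a - S[2 * r]) & MASK, b) ^ b
    return (a - S[0]) & MASK, (b - S[1]) & MASK


def brute_force_last_byte(known_prefix, plaintext, ciphertext):
    """Hypothetical tiny search: try all 256 values of the final key byte."""
    for guess in range(256):
        key = bytes(known_prefix) + bytes([guess])
        if encrypt_block(expand_key(key), *plaintext) == ciphertext:
            return key
    return None
```

Each extra unknown key byte multiplies the work by 256, which is why a real contest key needs thousands of machines rather than one loop.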
If you have more questions, feel free to email me at decibel@distributed.net
dB!
distributed.net Human Interface
I gave up the sigs years ago. Gave me lung cancer.
I have a dual Celeron system, currently running at 504 MHz. Under Linux, I couldn't be happier with the speed of things. Celerons, believe it or not, can be used very efficiently in a server, despite having only 128 KB of L2 cache.
Just curious - a Celeron 366 overclockable to 550 costs $110; an AMD K6-2/450 is $55 - and I think it has a larger L1 and L2 cache, so the performance might be similar. Are there any particular reasons for avoiding AMD for SMP applications?
Take care with the little green heat sink
Check your SDRAM for memory errors. I had the same problems for about eight weeks until I found out that some SDRAM memory cells were unstable. Memtest is quite good at finding broken memory chips that other memory testers cannot find.
About the hardware buying cycle... I have experienced the same thing. While a 550 x 2 SMP Celeron machine certainly looks cool on paper, why in the WORLD does anyone need that? Ever since I cut back on my game playing (I restrict myself to basically emulated consoles now... I find they're the most fun) and migrated from Windows to Linux as my desktop of choice in the past year, the hardware rat race doesn't amuse me anymore.
... they'd much rather shell out $700 for the latest 900 MHz WunderProcessor CPU and a board that supports it and plop it in their system, rather than take their current, perfectly usable system and, say, add SCSI to it, which would probably give a bigger performance boost.
Lately I've been trying to explain to people that they put entirely too much emphasis on the clock speed of the CPU. I explain how the real bandwidth bottlenecks in a system are usually the hard drive and video card. But no one listens.
I know this will sound cheesy, but using Linux has given me more respect for technology. Before, I'd think "oh gosh, that 486 sucks. It can't do anything!" Nowadays, I see a 40 MHz 386 with a CD-ROM and think "what a perfectly usable little Linux box that could be!"
Stop software manufacturers and CPU makers from siphoning off your wallet - use Linux. The little OS that could.
Actually, aside from buying the NT Resource Kit, you can also get the uptomp.exe utility as a download from the Microsoft website. Unfortunately, the program does have some problems and many people are left with unbootable systems afterwards, so there's also a Knowledge Base article about doing it by hand.
BTW, I get about 3.16 megakeys a second on my dual Celeron 366 at 550. RC5DES is certainly very scalable, much more so than SETI.
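Key searching scales so well because the keyspace splits into disjoint blocks that never need to be compared or merged. Here is a minimal Python sketch of one way to deal blocks to CPUs round-robin; the block size, the round-robin policy and the function names are hypothetical and are not distributed.net's actual work-unit scheme.

```python
from itertools import islice


def key_block(index, block_bits):
    """Inclusive (first_key, last_key) range covered by block number `index`."""
    first = index << block_bits
    return first, first + (1 << block_bits) - 1


def blocks_for_cpu(cpu, cpus, key_bits, block_bits):
    """Lazily yield the key blocks one CPU searches under a round-robin split.

    Every block goes to exactly one CPU and no CPU needs another's results,
    so there is no synchronization between CPUs while searching.
    """
    total_blocks = 1 << (key_bits - block_bits)
    for index in range(cpu, total_blocks, cpus):
        yield key_block(index, block_bits)


if __name__ == "__main__":
    # Hypothetical parameters: two CPUs, a 64-bit keyspace, 2**28-key blocks.
    for cpu in range(2):
        print(cpu, list(islice(blocks_for_cpu(cpu, 2, 64, 28), 2)))
```

Because the per-CPU work is independent, the ideal aggregate key rate is simply the sum of the per-CPU rates; the only shared cost is handing out and reporting blocks.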
Better off paying a lot less. And who cares what they were meant for? :)
I tried 466s at 588 and 366s at 550 for several months. The bus speed makes a huge difference.
The 466s encoded MPEG video and tested RC5 keys faster than anything else, but compiling was dog slow.
The 366s are slower at MPEG encoding and RC5 than the 466s, but compiling is light-speed faster. You need to get those 366s pretested from a company that has been testing them for a while. My untested pair of 366s was stable running RC5, SETI, and Prime95 for days on end, but attempting to composite video at 550 MHz crashed them every time.
I got a tested pair of 366s, and these are stable compositing video. While they run Prime95 at 574 MHz, video compositing crashes them every time above 560 MHz. You need a really small heat sink to fit in the BP6. My dual 550 uses Radio Shack blowers on the default heat sinks and stays at 104°F.
Joe
Slashdot's new slogan: news for nerdy wannabes. Stuff that's simple.
Unfortunately, Celeron 366s haven't been available for a few months now, except from the few companies that have been hoarding them (and selling tested 366@550 parts at a premium price).
The Celeron has a locked multiplier, so the only way to overclock it is to increase the bus speed. 400@600 is not unheard of, but it's also not too common. While it's possible to use a bus speed between 66 and 100 MHz, it's not desirable because you'll have to overclock (or underclock) your PCI and AGP buses. That's a Good Thing in theory, but a Bad Thing in reality, because a good number of add-on cards and hard drives won't tolerate a faster bus.
If I were to build an overclocked Celeron system today, I'd buy a single pretested 366@550.
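The clock arithmetic behind these posts: core speed is the locked multiplier times the front-side bus (FSB), so a 5.5x Celeron 366 on a 100 MHz bus runs at 550 MHz, a 6x Celeron 400 reaches 600 MHz, and 588 MHz on a 7x Celeron 466 implies an 84 MHz bus. The Python sketch below also derives PCI and AGP clocks; the divider rule is an assumption based on typical 440BX boards (PCI = FSB/2 and AGP = FSB below 100 MHz, PCI = FSB/3 and AGP = 2/3 FSB from 100 MHz up), and individual boards differed.

```python
FSB_66 = 200 / 3  # the nominal "66 MHz" bus actually runs at about 66.67 MHz


def core_clock(multiplier, fsb_mhz):
    """Core speed = locked multiplier x front-side bus speed."""
    return multiplier * fsb_mhz


def pci_agp_clocks(fsb_mhz):
    """Return (PCI, AGP) clocks in MHz derived from the FSB.

    ASSUMPTION: 440BX-style dividers; /2 and 1:1 below 100 MHz,
    /3 and 2/3 from 100 MHz up. Check your own board's manual.
    """
    if fsb_mhz < 100:
        return fsb_mhz / 2, fsb_mhz
    return fsb_mhz / 3, fsb_mhz * 2 / 3


if __name__ == "__main__":
    configs = [
        ("Celeron 366, stock", 5.5, FSB_66),
        ("Celeron 366 @ 100 MHz bus", 5.5, 100),
        ("Celeron 400 @ 100 MHz bus", 6, 100),
        ("Celeron 466 @ 84 MHz bus", 7, 84),
    ]
    for label, mult, fsb in configs:
        pci, agp = pci_agp_clocks(fsb)
        print(f"{label}: core {core_clock(mult, fsb):.1f} MHz, "
              f"PCI {pci:.1f} MHz, AGP {agp:.1f} MHz")
```

Under these assumed dividers, the 84 MHz bus pushes PCI to 42 MHz and AGP to 84 MHz (against nominal 33 and 66), which is the add-on card problem described above, while 100 MHz keeps both near spec.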
OK, just to explain myself: the L1 cache is on the core and is 32 KB on Celerons, Pentium IIs and Pentium IIIs. On Celerons, the 128 KB L2 cache is on the same die as the rest of the processor, so it runs at full core speed. On Pentium II and Pentium III processors, the L2 is made of external SRAM chips, which is why Intel went to Slot 1 (among other reasons). The reason Celerons overclock so well is the integrated L2 cache: because it is built on the same die as the core, it keeps up with the core as the clock goes up. The Pentium II's and Pentium III's external L2 makes overclocking difficult, which is why Celerons can be clocked far past their rated speeds and the others can't.
I decided to use the "slotket" approach: a small adaptor board that allows a Socket-370 Celeron to be used in standard Slot-1 motherboards. My assembly & test notes are online here.
By using the slotket, I am able to "upgrade" to a non-overclocked PentiumIII (or maybe Coppermine) when those CPUs become cheap enough. Until then, the dual-300A processors overclocked to 450 really cook!
Until they find an alien, that is. Look who'll be laughing then...
Therefore, I'd have to say that there's no reason for a normal Linux box to have dual processors. Come to think of it, my P-120s run WindowMaker pretty quickly.. maybe one of those is all most of us need.
Carefree highway, let me slip away on you.
I have a Pentium 200 that does more than this. Talk about cheap! IP masq/firewalling takes practically no CPU (486, anybody?), and at the quantities of mail he is likely to generate/receive, an SMP system is overkill. It would probably run fine on a 486. Sigh.
Rant mode on:
It's funny how two events got me out of the vicious cycle of buying hardware. One, I quit playing games. I know it sounds extreme, but there just seem to be better things to do with my time. The other was switching to Linux. Things just don't seem bad enough to upgrade my hardware anymore.