Linux Clustering Hardware?
Kanagawa asks: "The last few years have seen a slew of new Linux clustering and blade-server hardware solutions; they're being offered by the likes of HP, IBM, and smaller companies like Penguin Computing. We've been using the HP gear for a while with mixed results and have decided to re-evaluate other solutions. We can't help but notice that the Google gear in our co-lo appears to be off-the-shelf motherboards screwed to aluminum shelves. So, it's making us curious. What have Slashdot's famed readers found to be reliable and cost-effective for clustering? Do you prefer blade server forms, white-box rack mount units, or high-end multi-CPU servers? And, most importantly, what do you look for when making a choice?"
For the size and performance, they are hard to beat. A dual Opteron setup in a 1U rack case is a very powerful setup in and of itself. The bonus of using off-the-shelf components with no need for proprietary hardware or software also makes them very affordable. The added bonus is that you can simply get replacement parts from regular retailers.
We were all warned a long time ago that MS products sucked, remember the Magic 8 Ball said, "Outlook not so good"
Ammasso is a startup that makes iWARP-based RDMA hardware that runs over gigabit Ethernet. Their technology is like InfiniBand, but much cheaper and almost as fast. Their drivers and libraries also provide MPI and DAPL support. They only support Linux (all 2.4 and 2.6 kernels) and they're way ahead of their competition in terms of performance, product availability, and support. Once you've decided on the servers, I strongly recommend you use Ammasso's hardware for the interconnects. Your hardware vendor may even bundle it with their systems - be sure to ask about that.
And the men who hold high places must be the ones who start
To mold a new reality... closer to the heart
I would love to see a picture of the Google hardware...
Definitely worth checking out. It's one bad-ass Linux server -- and probably the only one to offer instruction execution integrity. That's a fancy way of saying 2+2 will always equal 4 on zSeries -- because everything is executed twice and compared at the hardware level -- or it won't execute.
If you need this, you need it bad.
Try getting parts for Xserves after a couple of years. Want parts for an Xserve G4 from yesteryear? Chances are you are gonna be fucked. We couldn't even source a drive caddy for one. Buy a proper server that's gonna be supported for longer than its trendiness.
To those who say Apple isn't targeting the enterprise, look no further.
Let me know when they stop trying to force their iPod updater (you know, the one that breaks Real's compatibility DRM software) onto my servers. No matter how many times you put that update in the "Never update this" category, it shows back up the next time you run Software Update. Until they stop trying to play childish games on my production servers, I'll not consider them ready for the enterprise.
Tiger Server lets you run your own Software Update Server, which would solve this problem for you. You run a central update server, point all your servers and clients at that, and then you can approve or disapprove each update before it goes out.
- "When you want something with all your heart, the entire universe conspires to give it to you" -Paulo Coelho
Currently running 65 AMD mobos (1 master, 64 nodes) on Ikea shelves. Cheap, easy to swap out, good air flow around the hardware. The shelves are wood, so everything just sits on them. It would be nice to find power supplies with extra connections to power more than one system.
Are you running iTunes on your production servers? Can't you just uninstall iTunes and be fine?
Why would a system configured as a fileserver have that software on it to begin with? Is Apple's apt tool so bad that it tries to patch software that hasn't been installed?
We wanted to set up a small 4-8 node cluster, mostly for testing and as a compute resource. For various political reasons we were looking at an IBM solution. At my urging we went for dual Opterons in the 1U format. And the price seemed right. Here's where it gets weird *after* the IBM sales people step in. Going through it piece by piece, I thought I could put a decent system together -- with our substantial IBM discount -- for $14k. By the time we got the quote with all of the crap they thought we needed, it was $34k! Just to give the flavor, the rack and assorted pieces were $4k. But that's not the funny part. We were like, "Well, for this much money, we assume you are putting it together for us." "Um, no... didn't you see the services quote that went along with this?" We hadn't -- with the services/support quote, it came in at $60k! So at this point we asked, can't we just buy the individual pieces we need and put it together ourselves? "Well, yes, but then it won't be an IBM e1350 cluster 'solution'..." "Yeah, well, we don't really care what it's called, it'll be just as fast and 75% cheaper..." At that time they were getting rid of their 325 servers for way cheap, and we actually put that system together for as cheap as a whitebox and probably as cheap as if we'd tried to put it together ourselves. The moral, I guess, is that if you have to deal with the big vendors, have a very sharp pencil handy!
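For a rough sense of the numbers in that story, here is a back-of-the-envelope sketch in Python. It assumes the $60k figure is the all-in quote (hardware plus services/support) and that a self-assembled system lands near the original $14k estimate; the function and constant names are made up for illustration.

```python
def percent_cheaper(diy_cost, quoted_cost):
    """Return how much cheaper diy_cost is than quoted_cost, in percent."""
    return 100.0 * (quoted_cost - diy_cost) / quoted_cost

# Figures from the post, in thousands of dollars.
DIY_ESTIMATE = 14    # piece-by-piece estimate
HARDWARE_QUOTE = 34  # vendor hardware quote
ALL_IN_QUOTE = 60    # assumption: hardware plus services/support

print(round(percent_cheaper(DIY_ESTIMATE, HARDWARE_QUOTE)))  # 59
print(round(percent_cheaper(DIY_ESTIMATE, ALL_IN_QUOTE)))    # 77
```

Under that reading, the "75% cheaper" remark is in the right ballpark.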
Obviously, there ain't no such thing as a free lunch. It all depends on what you want.
For sheer processor density, if you need complete servers, the IBM BladeCenter servers offer the most "Bang" (Fast), and they are fairly reliable and compact (Good). They are not cheap. They do have better density than the HP Blades. WETA Digital (Peter Jackson's FX company) uses them.
That will get you 2 server processors, two server-class IDE drives + 2 GigE ports + all peripherals (Power, KVM, CD, Management, GigE switches, SAN switches if you want, etc.) per one-half of a rack unit. This is well over twice the density of pizza box units when you count external peripherals like the networking switch, KVM, etc.
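As a quick sanity check on that density claim, a small Python sketch. The blade figure (2 processors per half rack unit) is from the paragraph above; the pizza box figure assumes a dual-processor 1U server, which is an assumption for comparison and not something stated here.

```python
def processors_per_rack_unit(processors, rack_units):
    """Processor density: processors divided by rack units occupied."""
    return processors / rack_units

blade = processors_per_rack_unit(2, 0.5)      # from the post
pizza_box = processors_per_rack_unit(2, 1.0)  # assumed dual-CPU 1U server

print(blade, pizza_box, blade / pizza_box)  # 4.0 2.0 2.0
```

That is exactly twice the density before any external gear is counted. Switches and KVMs for the 1U boxes take up extra rack space, which is where the "well over twice" comes from.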
Google's setup is Fast and Cheap, but their hardware reliability is quite lousy. However, their clustering setup is specifically designed around expected hardware failure.
(As a side note, Google no longer uses bare boards for their basic nodes. They use fairly small and slow nodes with a LOT of RAM from some company I can't remember. They look kind of like over-sized hard drives.)
If you need crap-loads of raw computing power, in a relatively compact power-efficient chassis (1024 processors/rack), IBM's Blue Gene simply cannot be beat. This is Capital-F Fast, and Capital-G Good, but you certainly can't afford one. (While it provides more cycles for the watt and dollar than any other setup, it isn't exactly as simple as a Beowulf cluster.) And you would still need to buy pesky things like large GigE switches and storage. Check out the current issue of the IBM Journal of Research and Development on IBM's website (or your local university library) for all sorts of juicy details.
[Yes, I am an IBM shill]
So realistically, you really need to look at your application. If it can tolerate failure of any individual node on a regular basis, get the cheapest stuff you can find that will fit in your space and CPU requirements. If node reliability is important, but space is not, 1U servers from any of the three major vendors (or Apple, if that is your thing) will do the job just fine. If you need reliability and space, then honestly IBM's BladeCenter boxen are the best, as long as they fit your application. (I am not just speaking as an IBM'er here... they really are the best blades out there.)
SirWired
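The rule of thumb at the end of SirWired's comment boils down to two questions. A minimal Python sketch of it, with a made-up function name and return strings that just paraphrase the comment:

```python
def pick_hardware(tolerates_node_failure, space_is_tight):
    """Paraphrase of the comment's rule of thumb; not an official sizing guide."""
    if tolerates_node_failure:
        # The application survives regular node loss, so price wins.
        return "cheapest hardware that fits your space and CPU requirements"
    if not space_is_tight:
        # Reliability matters, rack space does not.
        return "1U servers from a major vendor"
    # Reliability and density both matter.
    return "blades (the poster recommends IBM BladeCenter)"

print(pick_hardware(tolerates_node_failure=False, space_is_tight=True))
```

Note that the first check comes first on purpose: if the application tolerates node failure, the comment does not care whether space is tight beyond "whatever fits".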
Not sure I follow what you mean by "not interconnected in any meaningful way". The current line of HP blades not only offers an onboard Cisco switch that is certainly going to save ports on your main data center switches, but an onboard Brocade SAN switch has been announced as well that is going to save ports on the production SAN switches. Not to mention the cable savings of each option.