RPiCluster: Another Raspberry Pi Cluster, With Neat Tricks
New submitter TheJish writes "The RPiCluster is a 33-node Beowulf cluster built using Raspberry Pis (RPis). The RPiCluster is a little side project I worked on over the last couple months as part of my dissertation work at Boise State University. I had need of a cluster to run a distributed simulator I've been developing. The RPiCluster is the result. I've written an informal document on why I built the RPiCluster, how it was built, and how it performs as compared to other platforms. I also put together a YouTube video of it running an MPI parallel program I created to demo the RGB LEDs installed on each node as part of the build. While there have certainly been larger RPi clusters put together recently, I figured the Slashdot community might be interested in this build as I believe it is a novel approach to the rack mounting and power management of RPis."
Dude, you should totally mine bitcoins with that bad boy!
A new Raspberry Pi cluster Fram Boise University, eh?
http://blogs.linbit.com/p/406/raspberry-tau-cluster/
It looks like Orac.
Running the numbers from the paper says the $1000 x86 compute node took 3.85 seconds on a benchmark, where the RPI cluster took (456/32)=14.25 seconds and also cost about $1000. Thus, after porting the software, a 3.7 times slow down was achieved over traditional methods.
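For anyone who wants to check that arithmetic, here is a minimal sketch. It uses only the figures quoted in the comment above (456 seconds, 32 nodes, 3.85 seconds); the function name is made up for illustration, and dividing the total evenly across the nodes is the commenter's assumption, not something taken from the paper.

```python
# Figures quoted in the comment above; `slowdown` is an illustrative name only.
def slowdown(cluster_total_s, nodes, reference_s):
    """Return (time per node share, ratio of that time to the reference time)."""
    cluster_s = cluster_total_s / nodes  # assumes the work divides evenly across nodes
    return cluster_s, cluster_s / reference_s

cluster_s, ratio = slowdown(456.0, 32, 3.85)
print(f"cluster: {cluster_s:.2f} s, slowdown: {ratio:.2f}x")
# prints "cluster: 14.25 s, slowdown: 3.70x"
```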
While there may be some gains (GPIO and such may be useful in this context) they didn't appear to be used here.
This looks like a fun project that got research money but was not very useful for the goal the money was supposed to be spent on. I haven't looked into the details, and I expect the parts may get reused for other projects later, but still, it seems kinda silly. The RPi was not built for that; it's inefficient to use it that way.
Since when does /. allow scam advertising within comments?
Slashdot social media options: AIM, ICQ, Yahoo, Jabber and Mobile Text. Why no MySpace?
Not to diminish your achievements which are otherwise quite cool, but this novel approach to rack mounting is anything but. Quite possibly the single most important feature of a rack is ease of component access. By tying all components together with PCB standoffs you basically can't remove a single RPi if there's ever a pressing need.
If anything you've shown a novel way of cramming things together without the use of a rack.
Neat project but really the report left me frustrated.
You start by comparing price and features of RPi to two other alternatives, e.g. Onyx node.
Then you compare one RPi to one Onyx node. But moving on, you never do a price or performance comparison of the 32-RPi cluster against the same Onyx node, which would be the interesting thing.
Figure 5 shows something you could possibly relate to the earlier information, but only graphically. You don't state the actual numbers!
Moving on: "As discussed earlier, each RPi uses about 2W of power..." except that you have overclocked it, so the power consumption is actually "higher" according to you. Well, what is it *exactly*, in YOUR configuration? The one absolute value you give, 2 W, does not apply.
"Figures 9 a and b show the overall power use as measured at the wall..." No they do not. They show a pie chart of the percentage distribution of power by different components.
Make sure you include both absolute values and graphical comparisons, otherwise it does not help the reader get the full value out.
"14.6 seconds for a single thread on an Onyx node. With four threads it goes down to 3.85 seconds thanks to four processing cores. With eight threads, it goes up to a whopping 3.90 seconds."
This is interesting and good info. You have measurably proved that there is no gain from more than 4 threads on the Onyx. But what is "whopping" about it? Here you may try to use sarcasm to be funny. But it doesn't fit... First of all the values are not extreme enough to make the sarcastic joke work. And if you had a single, very small value, it might be funny to call it "whopping". But here in a comparison? I mean, why is 3.85 not whopping also then? It adds no value.
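To put numbers on that, here is a rough sketch of the usual speedup and parallel-efficiency calculation, using only the three timings quoted from the report (14.6 s, 3.85 s, 3.90 s). The function name is hypothetical, and efficiency is taken as speedup divided by thread count, which is the standard definition rather than anything the report itself states.

```python
# Timings quoted from the report; `speedup_and_efficiency` is an illustrative name.
def speedup_and_efficiency(t_single, t_parallel, threads):
    """Speedup over one thread, and that speedup divided by the thread count."""
    speedup = t_single / t_parallel
    return speedup, speedup / threads

T1 = 14.6  # seconds, single thread on an Onyx node
s4, e4 = speedup_and_efficiency(T1, 3.85, 4)
s8, e8 = speedup_and_efficiency(T1, 3.90, 8)

for threads, s, e in [(4, s4, e4), (8, s8, e8)]:
    print(f"{threads} threads: speedup {s:.2f}x, efficiency {e:.0%}")
```

With four threads the node is close to ideal scaling; with eight, the speedup is essentially unchanged, so the efficiency roughly halves.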
In summary, nice project but you can improve the report. Good luck.
So the summary of the informal document is that it's cheaper to build a 32-node Rasp.-Pi cluster than to purchase even a single node of the 32-node Beowulf cluster that may or may not be available to you. And if you want to get your Ph.D. work done, I must agree that it sounds better to not be dependent upon the whims and follies of others' benevolence in having external hardware clusters available for your use. Bravo, Joshua Kiepert, I like your "informal writeup". Best wishes on your work!
It looks like the purpose behind this project is to have an "always available" (to this Ph.D. student) 32-node cluster that is dedicated to doing the work which this dissertation student needs to perform in order to complete his Ph.D., and it makes sense to be able to do this for the cost of a single Xeon node in a larger Beowulf cluster.
This lets him escape the externalities which might impinge on his getting his own work done, like the big bad Beowulf cluster not being up or available when he needs it, or it being prioritized for someone else's project (say a professor who has tenure and more funding available). Those sorts of shenanigans would delay his work. So a 1/3rd speed cluster that's always available for your own project is a helluva good deal at 1/32 the cost of the big bad Beowulf cluster, eh? At least I think so!
but the 32 raspberry pi's are 3 times more expensive per compute speed unit than the onyx node he benchmarked against.
that's to say the 1000 dollars(8 threads) machine is about 3 times faster than all the raspberry pi's combined! it's a vastly superior computing solution.
it has to be for proofing some supercomputing sw and learning more than for anything practical.
you can't even get the pi's for price that would get you 32 pi's for a thousand bucks though. and add costs for cabling, power sources etc.
world was created 5 seconds before this post as it is.
Right, but a "vastly superior computing solution" for CFD or linear equations is one thing. Trying to simulate network communications activity for 32 or 33 nodes on a single compute node is probably slower than actually trying out the algorithms on dedicated hardware that instantiates an actual hardware network. Thus, for a project that tries out different networking and communications algorithms, a solution that is 3 times more expensive by your calculations might actually end up being 10 times less expensive, especially considering the locking and interprocess communications required in a multi-threaded simulation on a single compute node vs. actually running it on real hardware with 32 nodes and an Ethernet network linking the 32 nodes.
Especially considering that this system is going to be used for wireless communications protocols, the real hardware solution is IMHO the better way to go.
Raspberry.Pi
Architectural
Messaging
since he says in his pdf document that "My research is currently focused on developing a novel data sharing system for wireless sensor networks to facilitate in-network collaborative processing of sensor data. In the process of developing this system it became clear that perhaps the most expedient way to test many of the ideas was to create a distributed simulation rather than developing directly on the final target embedded hardware."
Feeling threatened, eh?
32 pis, 800ma per pi, 25.6A. Call it 30A to give some margin for error. Not exactly exotic - should be doable for thirty quid or so.
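A quick sketch of that budget, for the curious. The 32 nodes, 800 mA per node, and 30 A supply come from the comment above; the 5 V rail used to convert to watts is my assumption (it is the Pi's normal supply voltage, but the comment does not state it), and the function name is made up for illustration.

```python
# Node count, per-node current, and supply rating are from the comment above.
# SUPPLY_VOLTS is an assumption: the comment itself never states a voltage.
NODES = 32
AMPS_PER_NODE = 0.8
SUPPLY_AMPS = 30.0
SUPPLY_VOLTS = 5.0  # assumed rail voltage

def current_budget(nodes, amps_per_node, supply_amps):
    """Return (total current drawn, fractional headroom left on the supply)."""
    total = nodes * amps_per_node
    return total, (supply_amps - total) / total

total_amps, headroom = current_budget(NODES, AMPS_PER_NODE, SUPPLY_AMPS)
total_watts = total_amps * SUPPLY_VOLTS

print(f"load: {total_amps:.1f} A ({total_watts:.0f} W at {SUPPLY_VOLTS:.0f} V)")
print(f"headroom on a {SUPPLY_AMPS:.0f} A supply: {headroom:.0%}")
```

So the 30 A figure leaves a bit over 17% margin above the worst-case 25.6 A draw.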
I've read about servers that pack hundreds or thousands of ARM or Atom chips into one enclosure, giving great performance-per-watt for heavily threaded workloads. Mostly targeted at webservers.
i wish i had done this, therefore you suck.
Impressive and cool!
Can you please point me to the performance results? I know the microserver is a cool toy, but running on SDs, I'd assume it was better to actually go for running his actual network, instead of running a simulator on it. I don't know what he is running in particular, but I recall, during my dissertation, asking the department and taking over computer labs that many kids wouldn't use anyway. It was exciting because we knew someone could show up and randomly reboot one of the systems so they could use it, while professors used their own acquired hardware.
nevermind
I believe you still miss the point. The performance of the cluster isn't the real issue. The benchmark was run just to show the expected degree of parallelism was actually reached. The benchmark is in no way representative of the user requirements for the cluster itself and the tasks it is needed for. It was just run as a checkpoint to demonstrate the cluster is working as expected.
Achille Talon
Hop!
is there a top500 for diy clusters?
But your supervisor is a fool.
One article that can't trigger the joke "But it will run a Beowulf cluster ?"
It is all about the RGB LEDs. Nothing else matters.
Excuse me, but please get off my Pennisetum Clandestinum, eh!
So how about making a 32 node simulator?
If only I had technology-majoring friends at BSU, I would have known there was a decent geek community and chosen them over U-Idaho :|
The big problem in Ph.D. studies is your own review a few weeks before submission time, when you realize the things you should have done. At this point your own always-available "cluster" is a "beyond price" jewel of an asset to you. Awaiting priority on faculty assets could cost you your degree.
Good luck to you. Good thinking out of your priorities.
Regards Eion MacDonald
I would have preferred graphs with lines, logarithmic scale and comparison with the theoretically attainable performances.
Moreover, some more popular benchmarks should be run: HPL, NERSC Trinity benchmarks, or even real applications like Quantum Espresso which has some standard benchmark tests.
Power consumption should be measured when running any benchmarks as it may vary depending on the type of application (CPU bound, memory bound).
Nice project on the electrical and electronic engineering part, could benefit from the insight of someone from the scientific computing field.