Virtualizing a Supercomputer
bridges writes "The V3VEE project has announced the release of version 1.2 of the Palacios virtual machine monitor following the successful testing of Palacios on 4096 nodes of the Sandia Red Storm supercomputer, the 17th-fastest in the world. The added overhead of virtualization is often a show-stopper, but the researchers observed less than 5% overhead for two real, communication-intensive applications running in a virtual machine on Red Storm. Palacios 1.2 supports virtualization of both desktop x86 hardware and Cray XT supercomputers using either AMD SVM or Intel VT hardware virtualization extensions, and is an active open source OS research platform supporting projects at multiple institutions. Palacios is being jointly developed by researchers at Northwestern University, the University of New Mexico, and Sandia National Labs." The ACM's writeup has more details of the work at Sandia.
Now we'll never need to build another expensive supercomputer. We'll just "virtualize" them on cheap desktops.
Oh. Wait...
Warning: this article may contain humor, sarcasm, parody, and perhaps even irony. Read at your own risk.
This is virtualization... Imagine someone imagining a Beowulf cluster of those!
-Matt
Well, I'm not sure how good they are now, but back when I studied at uni we examined a few supercomputer clusters, and the rule of thumb in most cases was that one CPU core per node was stuck doing I/O for that node anyway. This was all before AMD's move to HyperTransport, though, so it may be much different now.
The point was that the overhead was constant: it didn't get worse with more nodes. It was always x nodes lost per y nodes, as this is. Just add more nodes :)
A worse problem would be if it were x^2 nodes lost per y nodes; then you're just throwing money away by adding more.
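A quick back-of-the-envelope sketch of the difference between the two cases. The proportional case uses the summary's "less than 5%" figure as an upper bound; the quadratic case is a hypothetical model where lost capacity grows as `k * N**2`, and the value of `k` is invented purely for illustration.

```python
def useful_nodes_linear(total_nodes: int, overhead_fraction: float) -> float:
    """Capacity left for real work when a fixed fraction of every node is overhead."""
    return total_nodes * (1.0 - overhead_fraction)


def useful_nodes_quadratic(total_nodes: int, k: float) -> float:
    """Hypothetical superlinear case: lost capacity grows as k * N**2."""
    return total_nodes - k * total_nodes ** 2


if __name__ == "__main__":
    OVERHEAD = 0.05  # upper bound from the summary ("less than 5%")
    K = 1e-4         # hypothetical constant, chosen only to make the curve visible
    print(f"{'nodes':>5} {'linear':>9} {'quadratic':>9}")
    for n in (1024, 2048, 4096, 8192):
        lin = useful_nodes_linear(n, OVERHEAD)
        quad = useful_nodes_quadratic(n, K)
        print(f"{n:5d} {lin:9.1f} {quad:9.1f}")
```

With a constant fraction, doubling the machine doubles the useful capacity. In the quadratic model, useful capacity peaks (at `N = 1 / (2k)`, here 5000 nodes) and then shrinks, so the 8192-node machine does less real work than the 4096-node one.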
...
It is really pleasant to see more and more OSS projects being deployed at the national level and in large infrastructures.
Hopefully some less greedy companies that benefit from such projects will start paying the volunteer developers. Then again, I have found that when you do something as a hobby, interest, or challenge rather than because you were employed to do it, the outcome is often more refined and efficient. I have yet to experience the latter part first-hand, though.
Most of them would be running an application written in C/C++ or some other low-level language with threading. The whole advantage of supercomputers isn't an absurd GHz rating but an insane number of cores. This could be useful for testing how a network of desktop computers would work, which, from the summary, sounds like what they are doing.
TL;DR: Normal desktop software doesn't run faster on a supercomputer than on your four-year-old laptop.
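Amdahl's law (a standard model, not something cited in the thread) makes the TL;DR concrete: if only a fraction `p` of a program's work can run in parallel, the speedup on `n` cores is `1 / ((1 - p) + p / n)`. The fractions below are illustrative, and the sketch assumes each supercomputer core is no faster than a laptop core and the problem size is fixed.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Amdahl's law: speedup = 1 / ((1 - p) + p / n) for a fixed-size problem."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if cores < 1:
        raise ValueError("cores must be >= 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # Illustrative parallel fractions: a single-threaded desktop app (0.0),
    # a mostly parallel app (0.95), and an ideal, perfectly parallel code (1.0).
    for p in (0.0, 0.95, 1.0):
        print(f"p={p:4.2f}: {amdahl_speedup(p, 4096):8.1f}x on 4096 cores")
```

A single-threaded program (`p = 0`) gets exactly 1x no matter how many cores it is given, and even a 95%-parallel program is capped just below 20x, which is why supercomputer codes are written from the ground up to keep the serial fraction tiny.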
> What is the point of virtualizing a supercomputer?
They'll be able to reload the image of your stellar evolution simulation in a few seconds after the guy doing nuclear weapons simulations has had his time. Never mind that the two simulations don't even run under the same OS.
Virtualization offers a number of potential advantages. We have had a paper accepted to IPDPS 2010 that enumerates more of them, but here are a few quickly:
1. The combination of a lightweight kernel (LWK) and a virtualization layer allows applications to choose which OS they run on and how much performance they pay for the OS services they need. Because Palacios is hosted inside an existing lightweight kernel that presents minimal overhead to applications running directly on it, applications that don't need the services (and overheads) of a full-featured OS like Linux can run directly on the LWK/VMM with minimal overhead. On the other hand, apps or app frameworks that need higher-level OS services (e.g. shared libraries) can run the OS they need as a virtualized guest on top of the LWK/VMM. Because an actual kernel reboot on a machine like Red Storm is very time-consuming compared to a guest OS boot, this is a substantial advantage.
2. Mean time to interrupt on some of the most recent large-scale systems is much less than a single day, and virtualization is a potentially useful technique for addressing fault-tolerance and resilience issues in HPC systems, assuming that its overhead at scale can be kept small.
3. A small open-source LWK/VMM combination enables a wide range of OS and hardware research on HPC systems both by being a small, understandable, low-overhead platform, and by providing a way to support existing HPC OSes and applications while enabling OS and hardware innovation.
4. A number of others I won't mention right now as they're being actively researched here at UNM, and by my colleagues at Northwestern and Sandia. ;)