Virtualizing a Supercomputer

Posted by kdawson on Monday February 8, 2010 @12:49PM from the slicing-up-the-pie dept.

bridges writes "The V3VEE project has announced the release of version 1.2 of the Palacios virtual machine monitor following the successful testing of Palacios on 4096 nodes of the Sandia Red Storm supercomputer, the 17th-fastest in the world. The added overhead of virtualization is often a show-stopper, but the researchers observed less than 5% overhead for two real, communication-intensive applications running in a virtual machine on Red Storm. Palacios 1.2 supports virtualization of both desktop x86 hardware and Cray XT supercomputers using either AMD SVM or Intel VT hardware virtualization extensions, and is an active open source OS research platform supporting projects at multiple institutions. Palacios is being jointly developed by researchers at Northwestern University, the University of New Mexico, and Sandia National Labs." The ACM's writeup has more details of the work at Sandia.

57 comments

Oblig. by Anonymous Coward · 2010-02-08 13:05 · Score: 0, Funny

Imagine a beowulf cluster of beowulf clusters of those! Pwoar.
Oh, that's just super! by Anonymous Coward · 2010-02-08 13:17 · Score: 0

So if you're virtualizing a supercomputer on a supercomputer, would it not be better to call the host a "super-duper" computer?
Cool. by John+Hasler · 2010-02-08 13:18 · Score: 4, Funny

Now we'll never need to build another expensive supercomputer. We'll just "virtualize" them on cheap desktops.
Oh. Wait...

--
Warning: this article may contain humor, sarcasm, parody, and perhaps even irony. Read at your own risk.
1. Re:Cool. by Mitchell314 · 2010-02-08 13:45 · Score: 1
  
  Why virtualize a supercomputer when you can virtualize two for the same price of $19.95?
  
  --
  I read TFA and all I got was this lousy cookie
2. Re:Cool. by TubeSteak · 2010-02-08 14:01 · Score: 3, Interesting
  
  Now we'll never need to build another expensive supercomputer. We'll just "virtualize" them on cheap desktops.
  I think you've got it backwards.
  Now we're virtualizing cheap desktops on supercomputers.
  What they're doing only makes sense if 5% of 4096 nodes* is cheaper than coding your app to run natively on the supercomputer.
  Like really big hard drives, when you get up to supercomputer levels of performance, 5% is a lot to give away.
  *Anyone know exactly what a node entails?
  
  --
  [Fuck Beta]
  o0t!
3. Re:Cool. by Tynin · 2010-02-08 14:47 · Score: 3, Informative
  
  *Anyone know exactly what a node entails?
  A node is generally just a fancy name for a computer in a cluster. Nodes don't always need an OS locally (they can get it via PXE), and may have some special hardware. But honestly, in my experience, a node is a node if the systems architect wants to call it one.
4. Re:Cool. by __aaclcg7560 · 2010-02-08 16:10 · Score: 1
  
  A supercomputer running 4096 copies of Windows will probably take a significant performance hit of more than 5%.
5. Re:Cool. by LoRdTAW · 2010-02-08 17:48 · Score: 1
  
  *Anyone know exactly what a node entails?
  At the very least: CPU + RAM. Also of course some glue logic (chip set), firmware (BIOS) and an interface to the rest of the cluster (networking).
Other way by Wrexs0ul · 2010-02-08 13:22 · Score: 5, Funny

This is virtualization... Imagine someone Imagining a beowulf cluster of those!
-Matt

--
--- Need web hosting?
1. Re:Other way by Hurricane78 · 2010-02-08 16:47 · Score: 1
  
  main = print ("Imagine" ++ si ++ " a beowulf cluster" ++ obc ++ " of those.")
  si = " someone imagining" ++ si
  obc = " of beowulf clusters" ++ obc
  
  --
  Any sufficiently advanced intelligence is indistinguishable from stupidity.
so they are 'only' wasting 200 machines by Anonymous Coward · 2010-02-08 13:23 · Score: 1, Insightful

5% may not sound like much, but with 4096 nodes that's over 200 nodes that they are wasting.
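For anyone who wants to check the arithmetic, here is a quick Python sketch. It assumes the 5% overhead from the summary maps directly onto an equivalent fraction of the 4096 nodes, which is a simplification, and the function name is made up for illustration.

```python
def nodes_lost(total_nodes: int, overhead: float) -> float:
    """Node-equivalents consumed by a fractional overhead (simplified model)."""
    return total_nodes * overhead


# 4096 nodes and 5% overhead are the figures quoted in the story summary.
lost = nodes_lost(4096, 0.05)
print(f"{lost:.1f} node-equivalents out of 4096")
```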
1. Re:so they are 'only' wasting 200 machines by Barny · 2010-02-08 13:32 · Score: 3, Interesting
  
  Well, not sure how good they are now, but back when I studied at uni we examined a few supercomputer clusters, and the rule of thumb in most cases was that 1 CPU core per node was stuck doing I/O for that node anyway. This was all before the move to HyperTransport with AMD, though, so it may be much different for them now.
  The fact was, it was a number that was constant, it wouldn't get worse with more nodes, it was always x nodes lost per y nodes, as this is. Just add more nodes :)
  A worse problem would be if it was x^2 nodes per y nodes, then you're just throwing away money adding more.
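  A rough Python sketch of that difference, under loud assumptions: the "constant" case loses a fixed 5% of nodes, and the "worse" case is one possible reading of "x^2 nodes per y nodes" in which the loss grows with the square of the node count. The quadratic coefficient is hypothetical, picked only so both models agree at 4096 nodes.

```python
def usable_constant_fraction(nodes: int, fraction: float = 0.05) -> float:
    """Usable nodes when overhead is a fixed fraction of the machine."""
    return nodes * (1.0 - fraction)


def usable_quadratic(nodes: int, coeff: float = 0.05 / 4096) -> float:
    """Usable nodes when overhead grows as coeff * nodes**2 (hypothetical)."""
    return nodes - coeff * nodes ** 2


# Compare the two models below, at, and above the 4096-node calibration point.
for n in (1024, 4096, 16384):
    print(n, usable_constant_fraction(n), usable_quadratic(n))
```

  With a fixed fraction, adding nodes always buys more usable capacity; with the quadratic model the returns shrink as the machine grows.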
  
  --
  ...
  /me sighs
2. Re:so they are 'only' wasting 200 machines by dbIII · 2010-02-08 16:27 · Score: 1
  
  It depends if the job is cpu bound or I/O bound.
  My skepticism comes from the fact that an overhead of "only" 5% is likely to mean "only" an extra eight hours for a week-long job. With CPU-bound stuff you want to be as close to the metal as you can get and still have the stuff run.
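  The eight-hour figure checks out if you assume the 5% applies uniformly to wall-clock time, which is a simplification. A minimal Python sketch (the names are illustrative only):

```python
HOURS_PER_WEEK = 7 * 24  # a week-long job, as in the comment above


def extra_hours(base_hours: float, overhead: float) -> float:
    """Extra wall-clock hours added by a fractional runtime overhead."""
    return base_hours * overhead


added = extra_hours(HOURS_PER_WEEK, 0.05)
print(f"{added:.1f} extra hours on a {HOURS_PER_WEEK}-hour job")
```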
3. Re:so they are 'only' wasting 200 machines by Barny · 2010-02-09 17:13 · Score: 1
  
  Yeah, but if it's IO bound, it should probably be re-written :)
  
  --
  ...
  /me sighs
Why? by Darkness404 · 2010-02-08 13:25 · Score: 1

What is the point of virtualizing a supercomputer? A 5% performance loss is a pretty big loss, in say a cluster of 100 computers, 5 of them would be wasted translating to thousands of dollars lost with little to show for it.

--
Taxation is legalized theft, no more, no less.
1. Re:Why? by Anonymous Coward · 2010-02-08 13:27 · Score: 0
  
  Because you don't have to spend weeks adapting code specific to that machine. Use the same program you run at home in 1/10000 the time.
2. Re:Why? by Anonymous Coward · 2010-02-08 13:28 · Score: 1, Interesting
  
  Perhaps those 5 nodes only cost 50k.
  How much would it cost to rewrite your one of a kind software and retest and verify it? There are other costs here that they are not letting us in on.
3. Re:Why? by Anonymous Coward · 2010-02-08 13:33 · Score: 0
  
  You are aware that these supercomputers are running Linux? If you can already run an app on Linux and you are able to compile a static binary you should be able to run it. So, answer again. Why?
4. Re:Why? by Spazed · 2010-02-08 13:34 · Score: 4, Interesting
  
  Most of them would be running an application written in C/C++ or some other low-level language with threading. The whole advantage of supercomputers isn't that they have an absurd GHz rating, but that they have an insane number of cores. This could be useful for testing how a network of desktop computers would work, which it sounds like from the summary they are doing.
  
  TL;DR: Normal desktop software doesn't run faster on a supercomputer than on your 4-year-old laptop.
5. Re:Why? by John+Hasler · 2010-02-08 13:55 · Score: 4, Insightful
  
  > What is the point of virtualizing a supercomputer?
  They'll be able to reload the image of your stellar evolution simulation in a few seconds after the guy doing nuclear weapons simulations has had his time. Never mind that the two simulations don't even run under the same OS.
  
  --
  Warning: this article may contain humor, sarcasm, parody, and perhaps even irony. Read at your own risk.
6. Re:Why? by Darkness404 · 2010-02-08 14:03 · Score: 1
  
  Not much if you run the program with an existing OS such as Linux. As for testing and verifying, I'd imagine for larger supercomputers it would be less and less of an issue while the 5% becomes more and more of an issue.
  
  --
  Taxation is legalized theft, no more, no less.
7. Re:Why? by Anpheus · 2010-02-08 14:27 · Score: 1
  
  I have to admit to, ahem, "loling" at your response. I know open source has the benefit of driving down costs, but adapting your software from commodity hardware to enterprise hardware, and, to go even further and run it on esoteric and specialized hardware is expensive. Whether it's proprietary or not. In fact, it might even be cheaper to get a vendor to rewrite their proprietary code because they've got teams of devs that already know the software in and out. Paying an outside team to write an existing application is always cost prohibitive.
  If they can make a supercomputer appear to be a huge cluster of commodity machines, that's pretty big. It's big because it enables that easy scale-up from commodity to esoteric hardware.
  Who knows, if it works well enough we might see Google change their minds and deploy a supercomputer because of the higher bandwidth interconnects than commodity hardware currently supports. The reason no one runs line of business on a supercomputer is because they're very nearly one-off deals. At least with a mainframe you know IBM (or whoever) will allow you to keep writing them checks to maintain and provide an upgrade path. Supercomputers are far more rarely upgraded, I think they typically run until they're obsolete.
8. Re:Why? by the+linux+geek · 2010-02-08 14:57 · Score: 1
  
  It would be far more likely to be FORTRAN than a C derivative. Also, plenty of supercomputers, especially IBM pSeries based ones, do have very high clock speeds (4-5GHz) and a relatively small number of cores; recent Nehalem systems follow the same trend.
9. Re:Why? by Anonymous Coward · 2010-02-08 15:08 · Score: 0
  
  Hey, finally a way to get around the N connections per browser and test your website.
10. Re:Why? by mhajicek · 2010-02-08 15:31 · Score: 1
  
  Plus they could simulate a system of multiple computers communicating and analyze the behavior of the system as a whole.
11. Re:Why? by afidel · 2010-02-08 15:42 · Score: 1
  
  Uh, this was run on ASCI Red, a 38,400 core Opteron based system with each node having a dedicated communication processor attached to a 3D torus for flat 1:1 communications.
  
  --
  There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
12. Re:Why? by afidel · 2010-02-08 15:48 · Score: 1
  
  ASCI Red was upgraded twice for a performance increase of 685%-564% depending on if you want to talk Peak or usable.
  
  --
  There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
13. Re:Why? by PopeRatzo · 2010-02-08 15:58 · Score: 1
  
  What is the point of virtualizing a supercomputer?
  So that if the supercomputer crashes, it won't bring down uTorrent running in the background and mess up their seeding of Animal Collective's Merriweather Post Pavilion.
  Why do you think?
  
  --
  You are welcome on my lawn.
14. Re:Why? by PopeRatzo · 2010-02-08 16:02 · Score: 1
  
  There are other costs here that they are not letting us in on.
  Pizza and 2-liter bottles of Nos, for example.
  
  --
  You are welcome on my lawn.
15. Re:Why? by joib · 2010-02-08 19:44 · Score: 1
  
  Actually, no. ASCI Red was retired from service in 2005.
16. Re:Why? by JBird · 2010-02-08 23:46 · Score: 1
  
  They'll be able to reload the image of your stellar evolution simulation in a few seconds after the guy doing nuclear weapons simulations has had his time. Never mind that the two simulations don't even run under the same OS.
  Sounds like the supercomputer in Greg Egan's short story Luminous. It was basically built from light and was reconfigured specifically for each different application.
17. Re:Why? by Anonymous Coward · 2010-02-09 01:15 · Score: 0
  
  I have always thought virtualization would be good in that I could deploy the packages I needed on the fly, turn-key, as the job requires. Every job requires a specific set of libraries and parameters in many cases. If the underlying interconnects are dealt with at a base level, all I need to do is send a config out to the cluster that matches the job I want to run to as many nodes as I need it to run on. Also, many times a supercomputer is not utilizing all resources for one job; you have dozens of jobs running at the same time, maybe each with a different set of requirements. In fact, I could see performance improvements in the jobs because the jobs dictate the OS and infrastructure and not the other way around.
  Just my two cents.
18. Re:Why? by Anpheus · 2010-02-09 01:54 · Score: 1
  
  And that's a relatively isolated example. Most of the entries on the top 100 supercomputers today will not be there in five or ten years. They will probably not even be on the top 500 list at all within ten to fifteen.
  No one wants to run their business apps on such volatile hardware. For scientists doing one-off simulations, one-off hardware is fine.
19. Re:Why? by afidel · 2010-02-09 02:10 · Score: 1
  
  Sorry, Red Storm, my duh.
  
  --
  There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
20. Re:Why? by Nite_Hawk · 2010-02-09 02:58 · Score: 1
  
  I work for a supercomputer institute and am our resident grid/cloud junky. One of the reasons you might want to do this is to allow researchers to create virtual supercomputers on the supercomputer via advanced reservations for simulation runs. There's a variety of reasons that this can be useful. Some times software doesn't play nicely with other software on the system or requires specific versions of libraries (or even specific OSes). You may also want to test in an environment where you have control over the (virtualized) mpi stack so you can see how screwing around with it changes how your job runs. Having amazon EC2 compatibility on traditional clusters would be interesting as well.
  Anyway, if you are interested in more, here's the globus (teragrid, open science grid, etc) project's entry into this arena:
  http://www.nimbusproject.org/
21. Re:Why? by bridges · 2010-02-09 03:13 · Score: 1
  
  ACSI Red Storm normally runs a dedicated lightweight kernel called Catamount, not Linux. Similarly, the IBM BlueGene systems run the IBM compute node kernel, not Linux. Linux is used on some supercomputers, even some of the biggest ones (e.g. ORNL's Jaguar system), but the performance penalty of using Linux as opposed to a lighter-weight kernel can be substantial for some applications (e.g. > 10%).
22. Re:Why? by Anonymous Coward · 2010-02-09 07:05 · Score: 0
  
  [I'm the AC you replied to]
  Yeah, there are some microkernels, but so what. Red Storm may be catamount/qk, but the rest of the XTs out there are pretty much Linux. And even so, the compatibility between the apps running on the microkernels (be it Sandia's, IBM's, or another) and Linux is fairly decent. BSD style posix/libc stuff is there, and so a static binary will take you pretty far. The types of applications that are run on these machines were meant to be run in these environments, so I'm still very confused about the reason for virtualization. And there you go, pointing out that these microkernels can sometimes perform better than Linux on the compute nodes, but that just makes things more confusing. If they are concerned about the small performance gains, then surely they won't be virtualizing. So, what is the point of virtualization in these environments?
23. Re:Why? by bridges · 2010-02-09 09:57 · Score: 1
  
  Palacios lives inside the lightweight kernel host. Applications that want to run natively on the lightweight kernel without virtualization can at *no* penalty. Applications that are willing to pay the performance penalty of Linux can run Linux as a guest at a nominal additional virtualization cost. That way, applications that demand peak hardware performance get it, applications that need more complex OS services get it, and the downtimes associated with a complete system reboot are avoided.
  In addition, the costs of something like Linux to a scientific application can be much higher than many might expect. Cray's target was to get application performance on their Compute Node Linux within 10% of Catamount performance; they did so for most (but not all) of their apps as I understand it, but had to spend a significant effort to even get within 10%.
  We're happy to leverage their hard work, however, so that users who want CNL can boot it on top of our VMM, while users who don't can get done faster or save some of their allocated cycles. I sometimes wonder if ORNL wished they had been running a VMM/LWK on Jaguar when Roadrunner beat them on the SC 2008 Top 500 list by 0.5%. Being able to use the lightweight kernel for Top500 Linpack runs and CNL for running apps that needed it might have come in handy for them then. :)
  Finally, our experience has been that a small, simple, open-source LWK/VMM combination is a very powerful platform for OS and hardware HPC research - it provides a simple, understandable, and powerful base for addressing HPC systems problems (e.g. fault tolerance) without the complexity of trying to do that in, for example, Linux.
24. Re:Why? by bridges · 2010-02-09 10:00 · Score: 1
  
  Doh, my mistake, Roadrunner beat Jaguar by a little less than 5% in the SC08 Top500 list, not 0.5%. Still, I do wonder. :)
25. Re:Why? by LeadSongDog · 2010-02-09 12:11 · Score: 1
  
  They'll be able to reload the image of your stellar evolution simulation in a few seconds after the guy doing nuclear weapons simulations has had his time. Never mind that the two simulations don't even run under the same OS.
  His parents let him set off nuclear weapons in their basement? Woaw!
  
  --
  Oh, I'm sorry sir, I thought you were referring to me, Mr. Wensleydale.
OSS ftw. by Asadullah+Ahmad · 2010-02-08 13:32 · Score: 2, Interesting

It is really pleasant to see more and more OSS projects which are being deployed at national level and large infrastructures.
Hopefully some less greedy companies that benefit from such projects will start paying the volunteer developers. But then again, I have found that a lot of times if you are doing something as a hobby/interest/challenge, rather than because you were employed to do it, the outcome will be more refined and efficient. Though I have yet to experience the latter part first hand.
1. Re:OSS ftw. by Anonymous Coward · 2010-02-08 13:58 · Score: 0
  
  meh, it'll never take off
Hey, you're right! by John+Hasler · 2010-02-08 13:52 · Score: 1

Imagine a Beowulf cluster...

--
Warning: this article may contain humor, sarcasm, parody, and perhaps even irony. Read at your own risk.
1. Re:Hey, you're right! by __aaclcg7560 · 2010-02-08 16:07 · Score: 1
  
  ... at $19.95. I'll take a couple of those. :P
not a good idea. by Anonymous Coward · 2010-02-08 14:14 · Score: 1, Interesting

Virtualizing a supercomputer is never the correct solution. Supercomputers have, by their nature, a system for managing lesser processes. That system could be extended, rather than adding another virtual management system that runs in parallel to the existing one, which is then burdened with maintaining it as yet another running process.
1. Re:not a good idea. by Anonymous Coward · 2010-02-08 16:44 · Score: 0
  
  I work in HPC and I agree with the anonymous parent. I don't get what these guys are doing. Even after skimming their docs I can't figure it out. None of the arguments made make much sense. They just don't present useful advantages, especially considering the owners of these machines and the types of applications they run. Do you know what the advantages are, or are you just blindly agreeing with the huge DOE lab?
2. Re:not a good idea. by bridges · 2010-02-09 04:54 · Score: 2, Informative
  
  Virtualization offers a number of potential advantages. A paper we have had accepted to IPDPS 2010 enumerates more of them, but a few advantages quickly:
  1. The combination of a lightweight kernel and a virtualization layer allows applications to choose which OS they run on and how much they pay in terms of performance for the OS services they need. Because Palacios is hosted inside an existing lightweight kernel that presents minimal overhead to applications that run directly on it, applications that don't need the services (and overheads) of a full-featured OS like Linux can run directly on the LWK/VMM with minimal overhead. On the other hand, apps or app frameworks that need higher-level OS services (e.g. shared libraries) can run the OS they need as a virtualized guest on top of the LWK/VMM. Because doing an actual kernel reboot on a machine like Red Storm is very time-consuming (compared to a guest OS boot), this is a substantial advantage.
  2. Mean-time-to-interrupt on some of the most recent large-scale systems is much less than a single day, and virtualization is a potentially useful technique for addressing fault tolerance and resilience issues in HPC systems, assuming that its overhead at scale can be kept small.
  3. A small open-source LWK/VMM combination enables a wide range of OS and hardware research on HPC systems both by being a small, understandable, low-overhead platform, and by providing a way to support existing HPC OSes and applications while enabling OS and hardware innovation.
  4. A number of others I won't mention right now as they're being actively researched here at UNM, and by my colleagues at Northwestern and Sandia. ;)
Let me get this straight.... by hesaigo999ca · 2010-02-09 01:56 · Score: 1

The way virtualization works is that it is a virtual layer spread across many nodes to avoid any downtime: when one node fails, the rest pick up the slack without having to stop the running systems. This uses Linux architecture to cluster many computers on the bottom layer, so as to have the look of one mega computer when it is actually 100 computers or more.
Then we get into supercomputing, which again uses clusters and usually uses Linux to make all the computers act as if they were one big computer, giving the advantage of multiple processors to calculate common operations much faster.
Now combining the two, we could... what is the advantage, again, of putting a cluster on top of a cluster? I need to understand, because I don't see it. Either one of these is used to make a supercomputer per se, but one is virtual and the other is physical. In either case the advantage is the same from both, but merging the two would cause too much of a slowdown, if you ask me, with all the backend needing to monitor the other backend to load balance, RAID, etc. It just seems like a test to see if you can do it, but would you get any real advantage out of it? I am not so sure. Someone with knowledge of VMs and supercomputers, please enlighten me.
The untold story by Anonymous Coward · 2010-02-09 01:59 · Score: 0

If you look up their research paper, you will quickly find that important performance issues remain in the area of high performance communication.
Typically this is where supercomputers should excel, e.g., with a point-to-point latency down to a microsecond,
medium-size message throughputs of tens of Gigabits per second, and really low overheads. You get what you pay for.
However, when you look up this aspect in the paper, they mention a 5 to 11 microsecond absolute overhead (not mentioning the relative one!)
and the graphs showing actual bandwidth measurement comparisons are suddenly log-based..
Agreed, virtualizing high performance communication is a difficult issue. No need trying to hide it this way.
1. Re:The untold story by bridges · 2010-02-09 04:19 · Score: 1
  
  We're not trying to hide anything, and so I will admit to being surprised by this (anonymous) accusation. To address the anonymous coward's concerns, however:
  1. Actual users of supercomputers care most about application run time because applications are what scientists run, not micro-benchmarks. As a result, our paper and research more generally focuses on the runtime penalty to real applications (e.g. Sandia's CTH code) as opposed to focusing on optimizing micro-benchmarks that aren't what real users of these systems care about.
  2. Micro-benchmarks do provide useful information about the exact costs of various low-level operations, however, to the extent that they can show you what is causing the application slowdowns you do see. They also can potentially help understand how proposed changes might impact applications other than the ones we were able to run in our limited access to the production Red Storm system. Because of this, the paper the anonymous coward above refers to explicitly measures and presents micro-benchmark latency and bandwidth overheads. Specifically, it cites the latency cost on both Red Storm's SeaStar NIC (5 or 11 microseconds, depending on how you virtualize paging) and QDR Infiniband (0.01 microseconds). It also presents a bandwidth curve to fully characterize virtualization's cost over the full range of potential message sizes on SeaStar. (IB is less expensive to virtualize than SeaStar, because IB doesn't have interrupts that Palacios must virtualize on the messaging fast path, whereas SeaStar does, at least when running Cray's production firmware.)
  We're very up front about the costs of virtualization because we are well aware that there is no such thing as a free lunch. Virtualization provides a number of potential advantages in supercomputing systems, for example in terms of dealing with node failures, providing a small open-source platform for OS research and innovation on supercomputing systems, handling applications with different OS feature and performance requirements, and a variety of other things. However, it does come with a cost to applications and application scientists that has to be weighed against its potential benefits.
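  Since the parent asked about relative rather than absolute latency overhead, here is a small Python sketch of how one might compute it from the added latencies cited above. The native baseline latency is NOT given in this thread, so `ASSUMED_NATIVE_US` is a hypothetical placeholder (and one shared baseline for both interconnects is itself a simplification); substitute measured values before drawing any conclusions.

```python
def relative_overhead(native_us: float, added_us: float) -> float:
    """Added latency expressed as a fraction of the native latency."""
    return added_us / native_us


# Added latencies as cited in the comment above (microseconds).
ADDED_US = {
    "SeaStar (5 us case)": 5.0,
    "SeaStar (11 us case)": 11.0,
    "QDR InfiniBand": 0.01,
}

# Hypothetical placeholder baseline, not a measured value.
ASSUMED_NATIVE_US = 5.0

for name, added in ADDED_US.items():
    pct = 100 * relative_overhead(ASSUMED_NATIVE_US, added)
    print(f"{name}: +{added} us is {pct:.1f}% of an assumed {ASSUMED_NATIVE_US} us baseline")
```

  The point is only that the same absolute cost looks very different depending on the baseline, which is why application run time is the more useful yardstick.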
Development is NOT open source, runs on VMware by Anonymous Coward · 2010-02-09 19:55 · Score: 0

Yeah, open source Palacios development would indeed be FTW, if it existed, but it doesn't.
While the Palacios code itself is open, the development image runs under VMware, which is closed tighter than a tight thing.
If you're looking for an open source development platform for VMMs, this isn't it.
1. Re:Development is NOT open source, runs on VMware by bridges · 2010-02-10 02:15 · Score: 1
  
  Palacios can run on real x86 hardware or on QEMU. In fact, most of our development is done on QEMU, which is open source. The VMware image was something we did on the original 1.0 release just to help people get started running it and haven't done since, but VMware has *never* been required for development.