Extreme Memory Oversubscription For VMs

Posted by Soulskill on Tuesday August 10, 2010 @03:55PM from the eXtreme-virtualization dept.

Laxitive writes "Virtualization systems currently have a pretty easy time oversubscribing CPUs (running lots of VMs on a few CPUs), but have had a very hard time oversubscribing memory. GridCentric, a virtualization startup, just posted on their blog a video demoing the creation of 16 one-gigabyte desktop VMs (running X) on a computer with just 5 gigs of RAM. The blog post includes a good explanation of how this is accomplished, along with a description of how it's different from the major approaches being used today (memory ballooning, VMware's page sharing, etc.). Their method is based on a combination of lightweight VM cloning (sort of like fork() for VMs) and on-demand paging. Seems like the 'other half' of resource oversubscription for VMs might finally be here."
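
For readers unfamiliar with the fork() analogy in the summary, here is a minimal process-level sketch of the idea: a forked child starts out sharing its parent's memory and only gets private copies of the pages it writes to. This is ordinary Unix fork() with copy-on-write, not GridCentric's VM cloning mechanism, which the summary does not describe in detail. The function name `fork_and_modify` and the buffer size are made up for illustration, and the example assumes a Unix-like system where `os.fork` is available.

```python
import os

def fork_and_modify(buf):
    """Fork a child that overwrites buf[0]; return (parent's buf[0], child exit code).

    After fork() the child shares the parent's pages. The kernel gives the
    child a private copy of a page only when it writes to it (copy-on-write),
    so the parent's view of the buffer is unaffected.
    """
    pid = os.fork()
    if pid == 0:
        # Child process: this write lands in the child's private copy.
        buf[0] = 99
        os._exit(0 if buf[0] == 99 else 1)
    # Parent process: wait for the child, then inspect our own copy.
    _, status = os.waitpid(pid, 0)
    return buf[0], os.WEXITSTATUS(status)

if __name__ == "__main__":
    # Hypothetical 1 MiB buffer standing in for "guest memory".
    parent_value, child_code = fork_and_modify(bytearray(1024 * 1024))
    print("parent still sees:", parent_value, "child exit code:", child_code)
```

Note that CPython itself touches memory in the child (reference counts, allocator bookkeeping), so real sharing is less than perfect here; the sketch only shows the semantics.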

Re:Leaky Fawcet by Mr+Z · 2010-08-10 16:48 · Score: 4, Interesting

Sometimes that doesn't work out so well. If you have a fragmented heap with gaps between the leaked items that keep getting reused, it can lead to a lot of strange thrashing, since it effectively amplifies your working set size.
I think that may be one of the things that was happening to older Firefoxes (2.x when viewing gmail, in particular)... not only did it leak memory, it leaked memory in a way such that the leak couldn't just stay in swap.
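
A rough way to see the amplification described above is to count how many pages the still-in-use allocations land on when leaked allocations sit between them. This is a toy model, not a measurement of Firefox or of any real allocator: the 4096-byte page, the 64-byte allocation slot, and the function name `resident_pages` are all assumptions picked for illustration.

```python
PAGE_SIZE = 4096                         # assumed page size in bytes
SLOT_SIZE = 64                           # assumed size of each heap allocation
SLOTS_PER_PAGE = PAGE_SIZE // SLOT_SIZE  # 64 slots fit in one page

def resident_pages(live_slots, leaked_between):
    """Count pages that hold at least one live (regularly reused) slot.

    The heap is modelled as a flat array of equal-sized slots in which
    every live slot is followed by `leaked_between` leaked slots. A page
    can only stay swapped out if it holds no live slot at all.
    """
    stride = leaked_between + 1
    positions = (i * stride for i in range(live_slots))
    return len({pos // SLOTS_PER_PAGE for pos in positions})

if __name__ == "__main__":
    for leaked in (0, 3, 63):
        pages = resident_pages(1024, leaked)
        print(f"{leaked} leaked slots per live slot: {pages} hot pages")
```

In this model, 1024 live slots occupy 16 pages when packed together, 64 pages with three leaked slots between each pair, and 1024 pages once every live slot sits alone on a page of leaked data. The live data is the same size in all three cases; only the number of pages that must stay resident changes.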

--
Program Intellivision!
Re:Leaky Fawcet by sjames · 2010-08-10 18:33 · Score: 4, Interesting

I often see uptimes measured in years. It's not at all unusual for a server to need no driver updates for its useful lifetime if you spec the hardware based on stable drivers being available. The software needs updates in that time, but not the drivers.
In other cases, some of the drivers may need an update, but if they're modules and not for something you can't take offline (such as the disk the root filesystem is on), it's no problem to update.
Note that I generally spec RAM so that zero swap is actually required if nothing leaks and no exceptional condition arises.
When disks come in 2TB sizes and server boards have 6 SAS ports on them, why should I sweat 8 GB?
Let's face it, if the swap space thrashes (yes, I know paging and swapping are distinct but it's still called swap space for hysterical raisins) it won't much matter if it is 1:1 or .5:1, performance will tank. However, if it's just leaked pages, it can be useful.
For other situations, it makes even more sense. For example, in HPC, if you have a long running job and then a short but high priority job comes up, you can SIGSTOP the long job and let it page out. Then when the short run is over, SIGCONT it again. Yes, you can add a file at that point, but it's nice if it's already there, especially if a scheduler might make the decision to stop a process on demand. Of course, on other clusters (depending on requirements) I've configured with no swap at all.
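
The stop/continue approach in the HPC example can be sketched in a few lines. This is a minimal illustration, assuming a Unix-like system and a long job that was started as a child process; the helper name `pause_run_resume` is hypothetical, and a real cluster scheduler would do much more (queueing, accounting, handling jobs that span nodes).

```python
import os
import signal
import subprocess
import sys

def pause_run_resume(long_job, short_cmd):
    """Stop `long_job`, run `short_cmd` to completion, then resume the long job.

    `long_job` is a subprocess.Popen handle. While it is stopped it gets no
    CPU time, so the kernel is free to page its memory out to swap if the
    short job needs the RAM. Returns the short job's exit code.
    """
    os.kill(long_job.pid, signal.SIGSTOP)      # freeze the long-running job
    try:
        return subprocess.run(short_cmd).returncode
    finally:
        os.kill(long_job.pid, signal.SIGCONT)  # always let it continue

if __name__ == "__main__":
    # Stand-ins for real workloads: a sleeping "long job" and a trivial "short job".
    long_job = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(5)"])
    rc = pause_run_resume(long_job, [sys.executable, "-c", "print('short job done')"])
    print("short job exit code:", rc)
    long_job.wait()
```

The `finally` clause is there so the long job is resumed even if launching the short job raises an exception.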
And since Linux can do crash dumps and can freeze into swap, it makes sense on laptops and desktops as well.
Finally, it's useful for cases where you have RAID for availability, but don't need SO much availability that a reboot for a disk failure is a problem. In that case, best performance suggests 2 equal sized swaps on 2 drives. If one fails, you might need a reboot, but won't have to wait on a restore from backup and you'll still have enough swap.
Pick your poison, either way there exists a failure case.
And yes, in the old days I went with 2:1, but don't do that anymore because it really is excessive these days.
Re:Kernel shared memory by descubes · 2010-08-10 18:46 · Score: 5, Interesting

Having written VM software myself (HP Integrity VM), I find this fascinating. Congratulations for a very interesting approach.
That being said, I'm sort of curious how well that would work with any amount of I/O happening. If you have some DMA transfer in progress to one of the pages, you can't just snapshot the memory until the DMA completes, can you? Consider a disk transfer from a SAN. With high traffic, you may be talking about seconds, not milliseconds, no?

--
-- Did you try Tao3D? http://tao3d.sourceforge.net