Hardware Virtualization Slower Than Software?
Jim Buzbee writes "Those of you keeping up with the latest virtualization techniques being offered by both Intel and AMD will be interested in a new white paper by VMware that comes to the surprising conclusion that hardware-assisted x86 virtualization oftentimes fails to outperform software-assisted virtualization. My reading of the paper says that this counterintuitive result is often due to the fact that hardware-assisted virtualization relies on expensive traps to catch privileged instructions while software-assisted virtualization uses inexpensive software substitutions. One example given is compilation of a Linux kernel under a virtualized Linux OS. Native wall-clock time: 265 seconds. Software-assisted virtualization: 393 seconds. Hardware-assisted virtualization: 484 seconds. Ouch. It sounds to me like a hybrid approach may be the best answer to the virtualization problem."
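For a sense of scale, the three wall-clock figures quoted in the summary can be turned into relative overheads. The following is only a rough sketch of that arithmetic; the function names are made up for illustration and nothing here comes from the white paper beyond the three timings.

```c
#include <stdio.h>

/* Relative overhead of a virtualized run versus native, in percent. */
double overhead_percent(double native_s, double virtualized_s) {
    return (virtualized_s / native_s - 1.0) * 100.0;
}

/* Prints the slowdown for the kernel-compile timings quoted in the summary.
   (Hypothetical helper; the three constants are the only sourced values.) */
void print_kernel_compile_overheads(void) {
    const double native_s = 265.0, software_s = 393.0, hardware_s = 484.0;
    printf("software-assisted: +%.1f%% over native\n",
           overhead_percent(native_s, software_s));
    printf("hardware-assisted: +%.1f%% over native\n",
           overhead_percent(native_s, hardware_s));
}
```

On those numbers, software-assisted virtualization costs a bit under half again the native time, and hardware-assisted virtualization a bit over four fifths more. That holds for this one workload only.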
See title... VMWare make software virtualisation products. Of course they're going to try and find that software methods are better.
I drink to make other people interesting!
I'd like to think of VMware in a different mould than MS, but I'd still hate to take this info in without some third-party verification.
The real question is what type of test was performed... It would make sense that different applications would function differently in a variety of contexts. How about some variance? I dig VMware, but come on...
I'm not fat, just big boned...
* - I imagine in real life it's not a 1:1 ratio, but for the sake of argument, work with me.
Clones are people two.
The correct conclusion is not that virtualization is better done entirely in software, but that current hardware assists to virtualization are badly designed. As the complete article points out, the hardware features need to be designed to support the software - not in isolation.
It reminds me of an influential paper in the RISC/CISC debate, about 20 years ago. Somebody wrote a C compiler for the VAX that output only a RISC-like subset of the VAX instruction set. The generated code ran faster than the output of the standard VAX compiler, which used the whole (CISC) VAX instruction set. The naive conclusion was that complex instructions are useless. The correct conclusion was that the original VAX compiler was a pile of manure.
The similarity of the two situations is that it's a mistake to draw a general conclusion about the relative merits of two technologies, based on just one example of each. You have to consider the quality of the implementations - how the technology has been used.
I suppose there are certain things hardware virtualisation does better.
The trick is, I'd guess, to find out which works better in which circumstances.
You see that people suspect this white paper because of its origin; they are right in doing so at least because only one type of test has been performed; surely not all computing tasks perform the same way as a kernel compile.
This suggests that VMWare have found the example which supports their claims the best; the question is, of course, whether this is the only such example.
So if we suppose that there are certain types of problems where hardware virtualisation outperforms software virtualisation, hybrid solutions seem to be the right way to go.
P.S. I don't really know what I'm talking about...
Ignore this signature. By order.
When are people going to figure out that "hardware solutions" are really software running on hardware, just like any other solution?
Sure, the instructions may be hardcoded, coming out of ROM, or whatever, but in the end it's instructions that tell the hardware what to do. And those instructions are called "software", no matter how the vendor tries to spin it. And if the solution performs badly, it is because the software is designed badly. Period.
You aren't remembered for doing what is expected of you
It sounds to me like a hybrid approach may be the best answer
As in so many cases before, it has proven to be the optimal solution. What gives? The good thing is that we have all these alternatives: every VM company will evaluate and then optimize, which will lead to better-performing software VMs, and because hardware is slower to catch up, software VMs will probably be better for a while.
I am putting myself to the fullest possible use, which is all I can think that any conscious entity can ever hope to do.
It probably won't work that way at all. This could be more of an additive thing.
For example, say you have a boat powered by 393 horsepower engine and a 484 horsepower engine. If you run them both at the same time, the net power is not going to be 439hp.
Software+hardware won't add in nearly the same way, but I wouldn't be surprised if a hybrid approach was 50% faster than either method alone.
Hardware virtualization may be slower right now, but both the hardware and the software supporting it are new. Give it a few iterations and it will be equal to software virtualization.
It may or may not be faster eventually, but that doesn't matter. What matters is that small changes in the hardware make it possible to stop having to depend on costly, proprietary, and complex software--like that sold by VMware.
Compare the around 2% impact of running an OS built to be virtualization-friendly, like with Linux + Xen, to that of software/hardware solutions to virtualize unfriendly OSes. Massive difference. So, it makes sense to migrate whatever services you're running on Windows to Linux before moving to a virtualized deployment, as you'll save a bundle.
If they can pick the *best* features of software, and the *best* features of hardware VT, then it is possible to create something that is faster than either solution on its own.
There's more to it than just the speed: what the virtualization requires from the OS being virtualized, the security features achieved, the simplicity of the virtualization engine (simple code == statistically fewer bugs) and so on. You also have to keep in mind that this is just the FIRST generation of hardware virtualization (on present x86 platforms). It will develop, and a lot. VMware's virtualization software has been developed and polished for years, so the results are not surprising at all.
Ultimately it's software's fault. Hardware is the bedrock without which the software is irrelevant. The biggest potential gains will always come from hardware since the software obtains its potential from hardware. The software gains may seem the most impressive but they will always be based on hardware. For the end user the biggest gains in the end will come from improving hardware since this will determine what is even possible in software.
What matters is that small changes in the hardware make it possible to stop having to depend on costly, proprietary, and complex software--like that sold by VMware.
I am 100% in favor of cheap and open solutions. But I don't agree that this will soon be the case for virtualization. VMWare and the few other major vendors do a lot more than software virtualization of a CPU (which is all TFA was talking about). To have a complete virtualization solution, you need to also virtualize the rest of the hardware: storage, graphics, input/output, etc. In particular graphics is a serious issue (attaining hardware acceleration in a virtual environment safely), which from last I heard VMWare were working hard on.
Furthermore, virtualization is well complemented by software that can migrate VMs (based on load or failure), and so forth. So, even if hardware CPU virtualization is to be desired - I agree with you on that - that won't suddenly make virtualization as a whole a simple task.
If VMware/QEMU/Bochs/Virtual PC isn't good enough for the speed you need, just dual boot the OSes, or if you want many servers on one machine, both IIS and Apache offer virtual hosts...
portfolio
In the end, the software instructions are actually executed on hardware, and that hardware imposes limits on what they do. In the case of virtualization the problem comes with privilege levels. Intel processors see 4 levels of privilege called Ring 0-3, of which two are used by nearly all OSes, 0 and 3. The kernel and associated code runs in Ring 0, everything else in Ring 3. The ring you are in controls what instructions the processor will allow you to execute, and what memory you can access. So if software in Ring 3 tries to execute a certain instruction, the processor will just not do it; it'll generate a fault.
Virtualization software has to deal with this. When the computer it's virtualizing wants to execute such an instruction, it can't just hand it off to the processor; it has to deal with it itself. It has to translate it to instructions that can be executed and virtualize what happens, hence the name virtualization.
The idea with hardware support like VT is that the processor itself will take a more active hand. Virtual machines will actually be able to execute Ring 0 instructions on the processor, because they won't really be running in the main Ring 0; it'll create a separate isolated privilege space for them.
A simpler analogy would be to think of basic math. Suppose you want to multiply two numbers and now suppose again that you have a processor that only has an add instruction. Well, you'd have to do the multiplication in software, as in you'd have to do an add loop. Now suppose that a new version of that processor adds a multiplication instruction that actually commands a multiplication unit. Now you are doing it in hardware. It is not only less code, but faster because there's a dedicated unit for it.
It's not like companies just whack instructions on their CPUs for the fun of it; they command different parts of the hardware to do different things. SSE, 3DNow, etc. don't just have the processor run little add or multiply loops, they actually kick on separate sections of hardware, designed for SIMD. Hence why they get the results they do.
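The add-loop analogy above can be sketched in a few lines of C. This is a toy illustration only; `mul_by_add` and `mul_native` are made-up names, and the "hardware" version simply relies on the compiler emitting whatever multiply instruction the target CPU has.

```c
/* "Software" multiply: uses nothing but addition, as on the imagined
   add-only processor. Takes b loop iterations. */
unsigned mul_by_add(unsigned a, unsigned b) {
    unsigned result = 0;
    for (unsigned i = 0; i < b; i++) {
        result += a;   /* one add per iteration */
    }
    return result;
}

/* "Hardware" multiply: a single operation handed to the CPU's multiplier. */
unsigned mul_native(unsigned a, unsigned b) {
    return a * b;
}
```

Both return the same value for the same inputs (unsigned arithmetic wraps identically in each), but the loop does work proportional to `b` while the native version is one instruction.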
IBM has been shipping virtualization since before many of these newcomers were even born. What do you think the 'V' in MVS or VM stands for? I wonder how well IBM's expired patents compare to modern virtualization. Of course in this case it helps to own the hardware, instruction set, and operating system.
The living have better things to do than to continue hating the dead.
Software virtualization is modifying the guest OS so it can run in a virtual machine, either before the fact with OS- and version-specific patches, or with run-time translation. Both of these techniques can be problematic. It also requires companies like VMware to hire experts in the kernels of all the OSes they plan to support. Contrast that with IBM's mainframe VM, which didn't require much knowledge of its guest OS internals, just the hardware architecture.
Because if you actually RTFA it shows that the hardware virtualization is faster for some benchmarks (e.g. processing system calls) and slower for others (e.g. performing I/O requests or page-table modifications); if you combine the best features of each you should be able to get a virtual machine that is faster than both.
seems like such a waste of resources, why not just port the software over to the other OS/platform so it can run natively...
Politics is Treachery, Religion is Brainwashing
Perhaps the intended conclusion was that it was feasible to write an efficient compiler using only a small, intelligently chosen with compiler optimization in mind, subset of the instruction set. Perhaps the fact that the original compiler was (as you assert) "a pile of manure" was not unconnected to the fact that it tried to achieve speed by exploiting the entire, eclectic, VAX instruction set (wonder how they worked the famous polynomial instruction in?) instead of sticking to a subset and applying generalised optimization techniques.
PS: If you think RISC lost the war, then remember that modern x86 processors consist of a RISC core with a translator stage to handle all those pesky, legacy CISC instructions.
In a survey of 100 programmers, 111111 thought that duck-typing was a good idea.
The biggest WIN performance-wise VMware is getting is by stripping expensive TRAPS/FAULTS and replacing them with appropriate non-faulting instructions, due to the software VMM's JIT-compiling nature. It's the same feature that allows some Java code to whip pure C, because the VM is by its nature dynamic and optimizes live for certain cases better than static analysis can do.
This type of win will not go away with better HW virtualization, and it gives VMware a better claim to building a more secure virtual environment, as they can logically peer into the code and strip dubious stuff right out.
They have the drivers for Xen for Windows XP and 2003. They just don't release them.
I don't doubt their numbers, they've been creating virtualized systems very effectively for years.
I think that any kind of "full virtualization" is going to be subject to these issues. If you want to see performance improvements then you should modify the guest os.
VMware's BT approach is very effective and their emulated hardware and bios are efficient, but that won't match the performance of a modified OS that KNOWS it's virtualized and cooperates with the hypervisor rather than getting 'faked out' by some emulation.
You really want to look at the other guest OSes, like MVS, and what VM did to manage performance on them. Things like various microcode vm assists and dedicating hardware to guests so no virtual to real hardware translation had to be performed.
A friend of mine works at Intel, and he flat out told me (several months ago) that Vanderpool/Pacifica will be slower than VMWare-only for the 1st generation. However this will change in a few years.
They did a lot more than a kernel compile. But I suppose I shouldn't expect people on slashdot to read the article anyway.
CAPTCHA: pitying, how appropriate.
It's not that people don't look to old mainframe solutions for things, they do; it's that often what was feasible on those wasn't on normal hardware, until recently. There was no reason for chip makers to waste silicon on virtualization hardware on desktops until fairly recently; there just wasn't a big desktop virtualization market. Computers are finally powerful to the point that it's worth doing.
It's no surprise that large, extremely expensive computers get technology before home computers do. You give me $20 million to build something with, I can make it do a lot. You give me $2000, it's going to have to be scaled way back, even with economies of scale.
You see the same thing with 3D graphics. Most, perhaps even all, the features that come to 3D cards were done on high end visualization systems first. It's not that the 3D companies didn't think of them, it's that they couldn't do it. The original Voodoo card wasn't amazing in that it did 3D; it was much more limited than other things on the market. It was amazing in that it did it at a price you could afford for a home system. 3dfx would have loved to have a hardware T&L engine, AA features, procedural textures, etc.; there just wasn't the silicon budget for it. It's only with more developments that this kind of thing has become feasible.
So I really doubt Intel didn't do something like VT because they thought IBM was wrong on the 360; I think rather they didn't do it because it wasn't feasible or marketable on desktop chips.
No, 'good' + 'bad but necessary' = better
Rirelobql xabjf gung EBG-13 vf gur yrnfg frpher rapelcgvba rire, ohg jbhyq lbh jnfgr lbhe gvzr npghnyyl qrpelcgvat vg???
Their measurements may be accurate. The question for me is.. what are they measuring? The slowest things about virtualisation for me are: a) swapping and memory use, because I tend to want LOTS of virtualisation, or none; b) peripheral hardware sharing issues, such as 3D video card acceleration; c) handling many users or workloads, so that each doesn't slow the other to a crawl.
If hardware solutions can do a better job of compressing the memory that's not in use (unlikely) or virtualising 3D video, so that many OSes can run in a window with mixed open source drivers and proprietary drivers, and perform well, then I'm interested. If it can stop users on a shared hosting machine from bothering each other or getting terrible responsiveness when they ssh in and run some graphical app remotely, then I'm interested in hardware solutions. Otherwise, it's same-old same-old I guess.
This won't be the first time software beats hardware.
The original Stacker product was a combination of a hardware card and software. Think of the hardware card as an accelerator for doing the compression/decompression.
The hardware was faster on the oldest machines, but on anything above a 286/12 (I had a 286/20 at the time), or almost any 386, it ran faster without the hardware card. And on every 486, the card was useless.
So, while you may want to "consider the source" of this news, this is only one factor to weigh. As time goes on, I'm sure we'll see more studies, benchmarks, etc.
Remember, there are 3 things that are inevitable in a programmer's life - death, taxes, and benchmarks.
I am planning the network for a large VoIP provider and have been looking at Xen and VMware. I have tested our VoIP applications such as BroadSoft on Xen and ran into some big problems, such as a double page fault that kills the whole box. I have talked with VirtualIron and they have found the same issues with Xen and say they have patches for their implementation, but their beta is full and I can't test for 30 days. I have tried to get a VMware sales guy to call me, left 2 messages and emailed twice with no response, funny since they want about 4X the cost of VirtualIron.
Ok, I see all of this virtualization going on, but I keep thinking about a burning question...
The x86 instruction set and architecture was invented some time ago (286, Intel, 1982), and although it has been added to and improved upon by both AMD and Intel, one has to wonder if it is the correct platform for performing virtualization, or for virtualization in general. I mean, ok, it is a slightly big deal to design a new CPU (and I know, I took all the Ph.D. courses on CPU design), but think about it. We are trying to make a nice old (vintage? classic?) CPU and instruction set good for virtualization. I think now is the time to step back and say "hey, we can do better. Let's get a bunch of good CPU designers and _thinkers_ (call Google?) to design an architecture that works well for virtualization, then port Linux to it." Ok, we can invite Tanenbaum too and port Minix. Maybe call Plan 9 too.
At any rate, the point is if we are going to really use virtualization, let's do it 100% and not half-ass like we always tend to.
Oh, wait up, we should just call IBM. They have been virtualizing for years. Let's get them to design us a good high speed CPU for that.
I just found out the hard way that Xen isn't quite ready to do hardware virtualization either. It does support the VT instruction set, but it doesn't handle disk I/O well at all, to the point where you can get up to 50% performance loss. They say that this will eventually be fixed, but that doesn't change the fact that I spent time looking for the right hardware virtualization solution and it still doesn't perform. Software paravirtualization under Xen is probably still better than VMware though.
So don't be so quick to judge VMware's claims just because they are a for-profit company with an insane EULA.
Apparently, yes, and by a good margin.
There are several documents and articles out there which point out VT's problems and how Pacifica is quite dramatically better. Here's an excerpt from "AMD Pacifica turns the nested tables", part 3 of an informative series of articles:
"This should allow an otherwise identical VMM to do more things in hardware and have lower overhead than VT. AMD appears to have used the added capability wisely, giving them a faster and as far as memory goes, more secure virtualisation platform."
So, it looks like AMD are ahead on hardware virtualization at the moment.
If I read it correctly, this is because Intel's VT actually requires a lot of software intervention, so it's not actually a very strong hardware solution at all.
"The question of whether machines can think is no more interesting than [] whether submarines can swim" - Dijkstra
Well OK. But it could also mean that VMWare doesn't know yet how to properly create a hardware virtualized vm.
Parallels on OS X switches between software and hardware virtualization, and using hardware virtualization it's about 97% the speed all around of native hardware (consider that virtualization on current Yonah CPUs is equal to one core only). Software virt on Parallels is much slower - on par with running Windows Virtual PC on the same box using Windows XP (not Mac Virtual PC).
The instant that somebody starts to mention ROI or TCO, you can be certain that they have no actual facts on their side.
Hardware is hard to upgrade.
not even for demonstration purposes.
int isPrime(int a) {
    for (int i = 2; i < a; i++) {
        if (a % i == 0) return 0;
    }
    return 1;
}
I designed one of the x86 h/w virtualization offerings. It's obvious that outside of device emulation, the biggest overhead of virtualization is the s/w emulation of what amounts to two levels of address translation (especially hairy in multiprocessor systems due to the brain-dead x86 page table semantics that do not require explicit invalidation). So clearly you want nested-paging support in h/w. However, that support is a little more complex than a few microcode changes to trap selected privileged instructions --- and due to schedule pressures, it didn't make it into the current release. Once that's in, expect h/w virtualization to speed up significantly.
Note that this doesn't make all the other stuff in VT/SVM useless; there are lots of places on the x86 where pure s/w virtualization has to go to great lengths of complexity just to get things correct. As a simple example: there's no way on "old" x86 h/w to save & restore segment descriptors (which you need to do on world switch) --- all you get is the selector, and if the guest O/S has overwritten the in-memory copy, you're out of luck. "Fixable" in s/w (obviously; VMWare does it), but just plain grody. So a major advantage of SVM/VT is that it becomes a lot *easier* to write a VMM (opening up the market to more players; this is starting to show in the Macintosh market) --- eventually, it should become faster, too.
On a separate note, over the next years, expect h/w assistance for dealing with device emulation (and not just from the CPU vendors).
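The "two levels of address translation" mentioned above can be pictured with a deliberately simplified model. This is only a sketch under heavy assumptions: real x86 page tables are multi-level trees with permission bits, while here each table is a flat array of page numbers, and all names (`translate`, `translate_nested`, `build_shadow`, `NPAGES`, `UNMAPPED`) are invented for illustration.

```c
#define NPAGES 8
#define UNMAPPED (-1)

/* Look up one page in a flat, single-level table; UNMAPPED if out of range. */
int translate(const int table[NPAGES], int page) {
    if (page < 0 || page >= NPAGES) return UNMAPPED;
    return table[page];
}

/* Two-step walk: guest virtual -> guest physical -> host physical.
   Nested paging would have the hardware do both steps on a TLB miss. */
int translate_nested(const int guest_pt[NPAGES], const int host_pt[NPAGES],
                     int gva_page) {
    int gpa_page = translate(guest_pt, gva_page);
    return translate(host_pt, gpa_page);  /* UNMAPPED (-1) is out of range, so it propagates */
}

/* Shadow table: software precomputes the composition so that a single lookup
   suffices, at the price of rebuilding it whenever the guest edits guest_pt. */
void build_shadow(const int guest_pt[NPAGES], const int host_pt[NPAGES],
                  int shadow[NPAGES]) {
    for (int p = 0; p < NPAGES; p++) {
        shadow[p] = translate_nested(guest_pt, host_pt, p);
    }
}
```

In this toy model the cost of the software approach is the bookkeeping in `build_shadow`: every guest page-table write has to be noticed and reflected in the shadow copy. The real overhead in a VMM involves much more than this sketch shows.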
Ok, I'm sure I'll get blasted for this, but here goes. So, you go out, spend your last dollar to buy a processor that goes 5% faster than the one priced 25% less, then you stick VMware or Xen or whatever on it to take a 50% hit? I don't think so, in my book. Another fascinating take is that you're wasting 25% of your electric power budget so you can virtualize the machine. Again, I see the reason you want to do this (yeah right, run windoze) but I'll just keep running my linux boxes as pure linux. Ok, go, start berating.
I'll let you in on a secret: if you consider all costs, and return on investment, using VMware is a competitive advantage over using Xen.
Xen: free. Linux: free. I don't understand where I would spend any money turning 1 linux server into 2 linux servers with Xen. We don't use Windows on anything but the domain controllers, but Xen doesn't run Windows (nor would I want to virtualize our DCs...)
Do you have any evidence or benchmarks to back that up? Did you read Keith's paper? If there are flaws in the reasoning and testing methodology, please point them out.
My understanding is that our performance data indicates that in apples-to-apples comparisons (same host OS on the same hardware), VMware-with-binary-translation is faster than Parallels-with-VT for normal workloads.
And note that I'm not in VMware's performance group, nor do I work on low-level virtualization stuff, so I'm not too familiar with the specific performance metrics. However, I'll add that logically it doesn't make sense for VMware to put its head in the sand by biasing any of its internal performance comparisons: if we were doing worse, we'd be quite upset about it and would be working really hard to correct any such performance anomaly.
Most virtualization is for servers or Linux desktops, and they don't require more than virtual disks and networks, plus minimal console emulation; all that code already exists in open source form.
VMware's big thing was a JIT-like x86 engine, a complex piece of software that is now not needed anymore. That really is a big deal.
There's no reason that a system using hardware virtualization (which still requires a lot of software anyhow) can't employ the same sort of code modifications used by all-software virtualization. But the all-software approach must scan ALL code before it is executed, to find the trouble spots, while the hardware-software combo can simply wait for a trap, then modify the code that caused the trap.
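A toy model of that wait-for-a-trap-then-patch idea, purely as a sketch: the "guest code" is an array of opcodes, and the cycle costs are invented numbers chosen only to make the effect visible. None of the names or figures here come from VMware, Xen, or the paper.

```c
#include <stddef.h>

/* Toy guest instruction kinds (hypothetical). */
enum op { OP_PLAIN, OP_PRIVILEGED, OP_PATCHED };

/* Made-up cycle costs: a trap is assumed far dearer than a patched call-out. */
enum { COST_PLAIN = 1, COST_TRAP = 1000, COST_PATCHED = 5 };

/* Run the guest code once and return the cycles spent. A privileged op traps
   the first time it is executed and is rewritten in place, so later runs of
   the same code take the cheap path without any up-front scan. */
unsigned long run_trap_and_patch(enum op *code, size_t n) {
    unsigned long cycles = 0;
    for (size_t i = 0; i < n; i++) {
        switch (code[i]) {
        case OP_PLAIN:
            cycles += COST_PLAIN;
            break;
        case OP_PRIVILEGED:
            cycles += COST_TRAP;    /* expensive trap into the VMM */
            code[i] = OP_PATCHED;   /* VMM patches the instruction that trapped */
            break;
        case OP_PATCHED:
            cycles += COST_PATCHED; /* inexpensive software substitution */
            break;
        }
    }
    return cycles;
}
```

Running the same code array twice shows the point: the first pass pays for each trap once, and the second pass pays only the patched cost. Code that never executes is never examined at all.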
Xenu can NOT be defeated!! What? Oh, okay, never mind.
A long time ago, with hardware far far from here, I remember playing games with my brand new 3D card, and wow the games looked nice! The only problem was that the games ran slower than they did in software!!! Lesson learned: don't buy first generation hardware and expect it to always be better/faster/cooler than old optimized software! Either that or don't buy S3 Virge based cards! ;)
OSNEWS repeated the same story about virtualization, which I pointed out in the post only to be suspended from posting any more comments.
That isPrime code is lifted from Sec. 3.2 of the article http://www.vmware.com/pdf/asplos235_adams.pdf in the story, and yes, it is very inefficient... that is why it is titled "How NOT to write an isPrime function"
Donnie, have you been sleepwalking again?
I was using Parallels on Linux on my Core Duo laptop and it was _fast_. I was very impressed. Then I tried vmware-server on the same machine and it ran just as fast if not faster. Later on I discovered vmware-server was actually not even using Intel's VT instructions, it was being done in software. Something to think about.
"But I'm still right here, giving blood and keeping faith. And I'm still right here."
Well, OS-level virtualization should have the best possible performance, leaving all the others behind. It's purely from the theory -- less overhead means better performance.
From the other side, speaking of testing practice, all tests are very error-prone. By tweaking some parameters here and there you can improve performance by, say, 10%, or make it fall behind. The bottom line is: do not ever trust the tests, do your own (and expect to spend a few months doing that).
-- Kir Kolyshkin, OpenVZ project leader.
You just have to read the paper and it spells it out. Other than VMware making a hybrid software/hardware mode, it's not going to get any faster. Every time there is data received or sent there is an expensive context switch for a hardware VM. So databases, web servers, and file servers will waste more cycles virtualized than SETI or Folding programs. Clicking your heels together is not going to change the result. These are first generation virtualization results. How many SSE and MMX revisions have there been?
The good news is now Windows CAN be virtualized in hardware. The bad news is it's slow. This means that no one is going to go through the trouble of reinventing VMware's software virtualization for open source when a hardware VM is so much easier to create. VMware is still relevant if you want a little edge on performance and are willing to pay for it.
With unix easily being paravirtualized you have to wonder when MS is going to release a virtualized edition of Windows of their own. It would probably only need a new HAL (hardware abstraction layer) library.
Let's say you have a boat with a 393 or a 484 horsepower engine. When you install both of them, the boat will sink
i soooo don't care for virtualisation (unless it involves a girl on a bike :P ), but for one promising point: "clustering".
why can't virtualisation help me encode to mp4 faster or compile faster when i have more than one computer in my network? if it can virtualise that os what-not, why not use it for something useful? can't it calculate across all my "mostly" idle computers then?
"there goes another million dollar idea(tm)" *sigh*
I'm surprised VMWare would be saying this. Have you ever tried their products? Testing of their VMWare Server so far is very disappointing, especially with the networking performance. Their support forum is crowded with complaints about abysmal network performance, with no concrete response from VMWare or any solution.
I've got mine up to 1024x768 with the emulated Cirrus Logic, and with the newer versions the VBE emulation should let you get a lot higher:
From the qemu Documentation
If you are using Windows XP as guest OS and if you want to use high resolution modes which the Cirrus Logic BIOS does not support (i.e. >= 1280x1024x16), then you should use the VESA VBE virtual graphic card (option `-std-vga').
here are some tests I conducted myself
parallels workstation 2.1 build 1670 (has vt support)
vmware workstation 5.5.1 build 19175 (hasn't vt support)
starting vm
vmware: 5.38
parallels: 1.88
booting etch install iso
vmware: 14.89
parallels: 6.55
installing etch (not considering interaction)
vmware: 122.56
parallels: 179.18
booting installed sys
vmware: 23.46
parallels: 13.73
apt-get install vim build-essential (no download)
vmware: 33.78
parallels: 16.54
tar -jxf linux-2.6.17.9.tar.bz2
vmware: 55.34
parallels: 30.73
make bzImage
vmware: 356.28
parallels: 557.92
make modules
vmware: 2037.24
parallels: 3358.03
the guest os is kubuntu 6.06 with kernel 686 (smp support)
hardware is a centrino core duo with VT support 1gb of ram
512 on each virtual machine, all at 32bit.
ptrace(), do_brk(), do_mremap() or do_mremap()again? (maybe a new pool)
HW virtualisation is about running an unmodified OS on top of another operating system.
The SW approach (paravirtualisation) is about modifying the guest (and somewhat the host) OS to make it simpler to virtualise, getting better performance by chopping out unnecessary operations in the guest OS and optimising it to use certain host APIs for even better performance gains.
Of course, HW virtualisation speed depends mostly on what hardware can do. AMD is ahead currently (and their boards will even support IOMMU for simpler PCI access control). It's certain that performance will improve with next-gen hardware, especially Intel. In the meantime it would make sense to support both methods for best performance - i.e. use hardware stuff where useful (e.g. for transparent page handling between OS's) and still modify guest OS not to do costly stuff which will be trapped and emulated in VMM (in fact maybe VMM will give performance improvement even with a single OS kernel).
In the long term, we can expect virtualisation to come to desktop - sharing of video, sound and input devices will allow people to simultaneously run linux and windows, somewhat lowering the barrier for linux adopters (i.e. users will be able to launch windows to use their killer app or game without rebooting).