First Look At VMware's vSphere "Cloud OS"
snydeq writes "InfoWorld's Paul Venezia takes VMware's purported 'cloud OS,' vSphere 4, for a test drive. The bottom line: 'VMware vSphere 4.0 touches on almost every aspect of managing a virtual infrastructure, from ESX host provisioning to virtual network management to backup and recovery of virtual machines. Time will tell whether these features are as solid as they need to be in this release, but their presence is a substantial step forward for virtual environments.' Among the features Venezia finds particularly worthwhile is vSphere's Fault Tolerance: 'In a nutshell, this allows you to run the same VM in tandem across two hardware nodes, but with only one instance actually visible to the network. You can think of it as OS-agnostic clustering. Should a hardware failure take out the primary instance, the secondary instance will assume normal operations instantly, without requiring a VMotion.'"
With no delay at all? Somehow I don't believe it - there is always delay, but I wonder if it is "significant" enough to be noticed by an end-user.
FT only supports a single vCPU from my understanding... Not too many people running single CPU VM's, at least in my experience...
You should be running single vCPU machines by default and only scale CPU's if absolutely necessary and if the app is fully SMP aware and functional.
http://virtualize.wordpress.com/
Check out the Kemari and Remus projects, which allow precisely the same in Xen environments. In essence, it's a continual live migration (vmware people, think continual vmotion) that resumes virtual machine execution on the backup node if the origin node dies. Very cool tech. The demonstration involved pulling the plug on one of the nodes. For more information just search, there are code and papers and presentation slides galore.
He didn't say to use a single vCPU for a non-SMP aware app, he said to use a single vCPU for all application loads. For SMP aware apps, adding another virtual CPU is a scaling option. If you have non-SMP aware apps, then you need to find another solution, like migrate to a host with faster cores.
It makes sense. If you have 32 workloads and 16 cores, don't add the overhead of making 64 virtual vCPUs, 32 will use the host resources more efficiently as long as one app doesn't need the power of more than one core. If it does, give it to the guest only when it needs it.
If your apps are not SMP aware, WTF at you using? Multiple CPUs has been the standard for servers for at least a decade in x86 gear.
Yes, but it doesn't work in VMs the same way, at least not in VMWare. On a loaded system you often find a single-vCPU VM will out perform one with more than one vCPU, in fact if you can spread your app over multiple machines you are generally better off running two single CPU VMs instead of one dual-CPU one. This is true no matter how many physical CPUs/cores you have available.
Why is this? Because a single CPU VM can be scheduled when-ever there are time-slices available on any physical CPU/core (though a good hypervisor will try not bounce VMs between cores too often, as this reduces the potential gains from using the core's L1 cache and (on architectures where L2 cache isn't shared) L2 cache too), but a dual vCPU VM will have to wait until the hypervisor can give it timeslices on two CPUs at the same time. If this is the only VM that is actively doing anything on the physical machine (and the host OS is otherwise quiet too) this makes little difference aside from a small overhead on top of the normal hits for not running on bare metal, but as soon as that VM is competing with other processes for CPU resource it can have a massive negative effect on scheduling latency.
How many hardware failures are actually characterized by a complete 100% loss of communication (as you'd get by pulling the plug)?
Don't CPU and Memory failures tend to make the computer somewhat unstable before completely bringing it down? How would vSphere handle (or even notice) that?
Even hard disk failures can take a small amount of time before the OS notices anything is awry (although you're an idiot if you care enough about redundancy to worry about this sort of thing, but don't have your disks in a RAID array)
-- If you try to fail and succeed, which have you done? - Uli's moose
One of the statistics measured by virtualcenter is the lag you're asking about.
The first hit on google images should give you a good idea.
In practice, I don't know... I imagine that the secondary instance will still receive network traffic bound for the cluster, so it'd probably be perceived as a hiccup when the primary one goes down, which is good enough for most services.
Boot Windows, Linux, and ESX over the network for free.
Like everyone else pointed out, it's a VM in lockstep with a 'shadow' VM. This is not just 'continuous VMotion'.
If something happens to the VM, the shadow VM goes live instantly (you don't notice a thing if you're doing something on the VM).
Right after that, the system starts bringing up another shadow VM on another host to regain full FT protection.
This can be network intensive, depending on the VM load, and currently only works with 1 vCPU per VM. Think 1-2 FT VMs per ESX host + shadow VMs.
You'll need recent CPUs that support FT and have an VMware HA / DRS Cluster set up.
So if you've got it, use it wisely. It's very cool.
-- Robi
, but a dual vCPU VM will have to wait until the hypervisor can give it timeslices on two CPUs at the same time.
Then their hypervisor is broken. It should be possible for A dual vCPU machine to have vCPU1 and vCPU2 be two timeslices on the same real cpu if need be.
Which would kill any benefit of running SMP in the VM anyway, if it were possible.
My understanding, which may be out of date, is that this is not considered a good idea as timing issues between threads on the two vCPUs if scheduled one after the other on the same core could potentially cause race conditions. And if not that serious, the threads on the vCPU that gets the first slice of the real core could be paused waiting for locks to be released by threads the guest OS has lined up to runon the other vCPU. This is an explanation that I have seen given as to why VMWare would not allow you to do virtual SMP on a single-core-single-CPU host machine (i.e. emulating SMP in the guest by giving two or more vCPUs alternating time on the only physical core).
Try supporting synchronization of two virtual machines over a network/SAN when you also have to deal with SMP. Gets hard.
“Common sense is not so common.” — Voltaire
Yea, but you also get the overhead of two schedulers (the host and the guest) and two systems moving thread context from core to core, which is an expensive operation. Most VMware systems are pretty heavily oversubsubscribed in terms of cores. Its not uncommon to have 60 guests on a 16 core host. If all 60 guests have 4 virtual CPUs, you do get the advantage that one guest can expand out to consume about a quarter of the total host CPU power, but you also have the cost of guest SMP switching even though you are using an average of 0.25 cores per guest.