First Look At VMware's vSphere "Cloud OS"
snydeq writes "InfoWorld's Paul Venezia takes VMware's purported 'cloud OS,' vSphere 4, for a test drive. The bottom line: 'VMware vSphere 4.0 touches on almost every aspect of managing a virtual infrastructure, from ESX host provisioning to virtual network management to backup and recovery of virtual machines. Time will tell whether these features are as solid as they need to be in this release, but their presence is a substantial step forward for virtual environments.' Among the features Venezia finds particularly worthwhile is vSphere's Fault Tolerance: 'In a nutshell, this allows you to run the same VM in tandem across two hardware nodes, but with only one instance actually visible to the network. You can think of it as OS-agnostic clustering. Should a hardware failure take out the primary instance, the secondary instance will assume normal operations instantly, without requiring a VMotion.'"
With no delay at all? Somehow I don't believe it - there is always delay, but I wonder if it is "significant" enough to be noticed by an end-user.
The only operating system needed for navigating the clouds is piloted by bears that care ... a lot.
Motorcycles, Robots, Space Gossip and More!
FT only supports a single vCPU, from my understanding... Not too many people running single-CPU VMs, at least in my experience...
Check out the Kemari and Remus projects, which allow precisely the same in Xen environments. In essence, it's a continual live migration (VMware people, think continual VMotion) that resumes virtual machine execution on the backup node if the origin node dies. Very cool tech. The demonstration involved pulling the plug on one of the nodes. For more information just search; there are code, papers, and presentation slides galore.
I had an idea at some point of a distributed app, similar to SETI@Home, that people would run on their computer. These computers would form a cloud which would support creating VMs that could run arbitrary code. If one app is currently running your code, and the computer it's on goes down, your code would continue to run on another one. If everyone runs it, it would be a huge pool of computational power. Then you could run crazy things on it. Then, profit! Anyway, is this a step in that direction?
Obviously.
Call me when VMware starts giving blowjobs.
How many hardware failures are actually characterized by a complete 100% loss of communication (as you'd get by pulling the plug)?
Don't CPU and Memory failures tend to make the computer somewhat unstable before completely bringing it down? How would vSphere handle (or even notice) that?
Even hard disk failures can take a small amount of time before the OS notices anything is awry (although you're an idiot if you care enough about redundancy to worry about this sort of thing but don't have your disks in a RAID array).
-- If you try to fail and succeed, which have you done? - Uli's moose
One of the statistics measured by virtualcenter is the lag you're asking about.
The first hit on google images should give you a good idea.
In practice, I don't know... I imagine that the secondary instance will still receive network traffic bound for the cluster, so it'd probably be perceived as a hiccup when the primary one goes down, which is good enough for most services.
Boot Windows, Linux, and ESX over the network for free.
Like everyone else pointed out, it's a VM in lockstep with a 'shadow' VM. This is not just 'continuous VMotion'.
If something happens to the VM, the shadow VM goes live instantly (you don't notice a thing if you're doing something on the VM).
Right after that, the system starts bringing up another shadow VM on another host to regain full FT protection.
This can be network intensive, depending on the VM load, and currently only works with 1 vCPU per VM. Think 1-2 FT VMs per ESX host + shadow VMs.
You'll need recent CPUs that support FT, and you'll need to have a VMware HA / DRS cluster set up.
So if you've got it, use it wisely. It's very cool.
-- Robi
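For anyone who wants to see the lockstep-plus-shadow idea in miniature, here is a toy sketch. It is not VMware's implementation: `ToyVM`, `LockstepPair`, and the arithmetic in `apply` are all made up for illustration. It only shows that a deterministic guest fed the same events ends up in the same state on both hosts, and that a new shadow gets created after a failover.

```python
class ToyVM:
    """Hypothetical stand-in for a single-vCPU guest: a deterministic state machine."""

    def __init__(self):
        self.state = 0
        self.steps = 0

    def apply(self, event):
        # Same starting state + same event sequence => same resulting state.
        self.state = (self.state * 31 + event) % 1_000_003
        self.steps += 1


class LockstepPair:
    """Feeds every input event to both a primary and a shadow copy."""

    def __init__(self):
        self.primary = ToyVM()
        self.shadow = ToyVM()

    def deliver(self, event):
        # Only the primary would be visible on the network;
        # the shadow silently replays the same event.
        self.primary.apply(event)
        self.shadow.apply(event)

    def fail_primary(self):
        # Promote the shadow, then seed a fresh shadow from its state
        # so the pair is protected again.
        self.primary = self.shadow
        self.shadow = ToyVM()
        self.shadow.state = self.primary.state
        self.shadow.steps = self.primary.steps


pair = LockstepPair()
for event in [3, 1, 4, 1, 5]:
    pair.deliver(event)
state_before_failure = pair.primary.state
pair.fail_primary()
# The promoted shadow carries on from exactly where the primary was.
assert pair.primary.state == state_before_failure
```

The real thing has to deal with non-deterministic inputs, timing, and the network cost of shipping events between hosts, none of which this sketch attempts.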
The article mentions an inability (for the "pre-released" version) to PXE boot. If he's talking about booting for installation, then he's 100% wrong. The ESX beta/RC (build 140815) will, indeed, boot and install over a network. It's different from 3.5, so you'll have to adjust your command line and/or kickstart. They use "weasel" instead of "anaconda", and that runs inside the service console. Short answer... "method=" becomes "url=" -- with a properly formatted URL, e.g. url=nfs://server/path/to/esx/. It's a much larger boot environment -- 80MB -- so it takes longer to boot, and from my half dozen installations (I'm only testing on 2 machines), it takes substantially longer to install 4.0 than 3.5. (My 3.5 installs take 2.5 minutes.)
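If you keep your PXE boot lines in a script, the "method=" to "url=" change can be sketched like this. The helper name `to_esx4_boot_args` is hypothetical, and so are the `method=OLD_VALUE` placeholder and the extra `quiet` argument; the only details taken from the post are the key rename and the example URL. Your real boot line will almost certainly need other adjustments too.

```python
def to_esx4_boot_args(args: str, install_url: str) -> str:
    """Swap a 3.5-style 'method=' token for a 4.0-style 'url=' token.

    Hypothetical helper: every other token is passed through untouched,
    and 'url=' is appended if no 'method=' token was present.
    """
    out = []
    replaced = False
    for token in args.split():
        if token.startswith("method="):
            out.append("url=" + install_url)
            replaced = True
        else:
            out.append(token)
    if not replaced:
        out.append("url=" + install_url)
    return " ".join(out)


# 'method=OLD_VALUE' and 'quiet' are placeholders, not real ESX arguments.
new_args = to_esx4_boot_args("method=OLD_VALUE quiet", "nfs://server/path/to/esx/")
print(new_args)
```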
Why is the latter half of this decade dominated by virtualisation tech which either amounts to the kind of monitor you'd see on IBM boxes in the '60s or is as advanced as "get both machines to do the same things at once / get one to copy its memory continuously to the other and then change ARP entries"? I mean, for fuck's sake, this is hardly innovation.
Maybe I'm at the stage in my life where I shout at the kids to get off my lawn, but I've enjoyed various clustering solutions since I first started working with VMS in the mid '80s. I've not seen any new concepts in the past 15 years, but I've seen old ideas receive way more market coverage.
Anyone who doubts it, spend a couple of weekends coding up a Linux module client/server combination: the server journals disk sectors and memory pages as they are written to (latter is trivial with dirty bits on modern hardware), daemon regularly takes snapshot including CPU state and copies to registered client machines. It's not damn rocket science.
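To make the dirty-page part concrete without writing a kernel module, here is a userspace toy in Python. Everything in it (`Source`, `Replica`, the `dirty` set standing in for hardware dirty bits) is an assumption for illustration; CPU state, disk journaling, and the actual network transfer are left out.

```python
class Source:
    """Hypothetical origin machine: tracks which memory pages were written."""

    def __init__(self, num_pages):
        self.pages = [b"\x00"] * num_pages
        self.dirty = set()

    def write(self, page_no, data):
        self.pages[page_no] = data
        self.dirty.add(page_no)  # stands in for the hardware dirty bit

    def checkpoint(self):
        # Collect only the pages written since the last checkpoint,
        # then clear the dirty set for the next interval.
        delta = {n: self.pages[n] for n in self.dirty}
        self.dirty.clear()
        return delta


class Replica:
    """Hypothetical registered client: applies deltas to stay in sync."""

    def __init__(self, num_pages):
        self.pages = [b"\x00"] * num_pages

    def apply(self, delta):
        for page_no, data in delta.items():
            self.pages[page_no] = data


source = Source(num_pages=8)
replica = Replica(num_pages=8)
source.write(2, b"hello")
source.write(5, b"world")
replica.apply(source.checkpoint())  # only pages 2 and 5 are shipped
assert replica.pages == source.pages
```

The replica is only as fresh as the last checkpoint, so anything written between checkpoints is lost on failure unless output is held back until the checkpoint lands.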
Next logical step in the VMware product line which still does not take advantage of hardware acceleration for virtualization. YAWN.
I just can't see, say, a large financial corporation using this for anything but serving web pages.
So what they're saying is that they don't believe in operating systems, but acknowledge that they might exist?
Think of how stupid the average person is, and realize half of them are stupider than that.
"You can think of it as OS-agnostic clustering".
You mean it's not sure if there *is* an OS?
I'm on the FT team at VMware and just wanted to provide some additional information on FT requirements. You can also find out more about FT at: http://www.vmware.com/products/fault-tolerance/
VMware collaborated with AMD and Intel in providing an efficient VMware Fault Tolerance (FT) capability on modern x86 processors. The collaboration required changes in both the performance counter architecture and virtualization hardware assists from processor vendors. These changes could only be included in recent processors from both vendors: 3rd-Generation AMD Opteron(tm) based on the AMD Barcelona, Budapest and Shanghai processor families; and Intel® Xeon® processors based on the Penryn and Nehalem microarchitectures and their successors.
The current set of VMware FT supported processors are:
Intel® Xeon® 3100 Series, Wolfdale (UP)
Intel® Xeon® 3300 Series, Yorkfield
Intel® Xeon® 5200 Series, Wolfdale (DP)
Intel® Xeon® 5400 Series, Harpertown
Intel® Xeon® 7400 Series, Dunnington
Intel® Xeon® 5500 Series, Nehalem
AMD Opteron(tm) 1300 Series, Budapest
AMD Opteron(tm) 2300 Series, Barcelona (65nm, DP) and Shanghai (45nm, DP)
AMD Opteron(tm) 8300 Series, Barcelona (65nm, MP) and Shanghai (45nm, MP)
VMware maintains a KnowledgeBase (KB) article that provides a current list of supported processors; see http://kb.vmware.com/kb/1008027.
You can download a utility, VMware SiteSurvey, at http://www.vmware.com/download/shared_utilities.html to check whether your configuration can run VMware FT.
While we understand the end user's desire to be able to use VMware FT on as many processors as possible, VMware's goal is to guarantee that Fault Tolerance works reliably.
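For a quick check against the list in this post, the series numbers can be dropped into a small lookup table. This is only a convenience sketch: the names `FT_SUPPORTED` and `ft_supported` are made up, the table is a snapshot of the list above, and the KB article and SiteSurvey remain the authoritative sources.

```python
# Snapshot of the processor list from this post, keyed by (product line, series).
# Hypothetical structure; consult the KB article for the current list.
FT_SUPPORTED = {
    ("Intel Xeon", "3100"): "Wolfdale (UP)",
    ("Intel Xeon", "3300"): "Yorkfield",
    ("Intel Xeon", "5200"): "Wolfdale (DP)",
    ("Intel Xeon", "5400"): "Harpertown",
    ("Intel Xeon", "7400"): "Dunnington",
    ("Intel Xeon", "5500"): "Nehalem",
    ("AMD Opteron", "1300"): "Budapest",
    ("AMD Opteron", "2300"): "Barcelona (65nm, DP) and Shanghai (45nm, DP)",
    ("AMD Opteron", "8300"): "Barcelona (65nm, MP) and Shanghai (45nm, MP)",
}


def ft_supported(product_line: str, series: str) -> bool:
    """Return True if the series appears in the snapshot above."""
    return (product_line, series) in FT_SUPPORTED


print(ft_supported("Intel Xeon", "5500"))
print(FT_SUPPORTED[("AMD Opteron", "1300")])
```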