First Look At VMware's vSphere "Cloud OS"

Instantly? by whereizben · 2009-05-22 09:19 · Score: 4, Insightful

With no delay at all? Somehow I don't believe it - there is always delay, but I wonder if it is "significant" enough to be noticed by an end-user.

Re:Instantly? by Inakizombie · 2009-05-22 09:23 · Score: 4, Funny

Sure it's instant! There's just an item in the hardware requirements that states "Quantum processing required."
Re:Instantly? by BSAtHome · 2009-05-22 09:27 · Score: 2, Funny

Quantum processing,... hm, that means it will first happen when you look. That is definitely not a good idea. It should just work and I should not be required to walk down to the cellar to find the damn hardware box I am using. Next I will be required to locate the processor before Heisenberg is satisfied.
Re:Instantly? by Amouth · 2009-05-22 09:52 · Score: 1

yes it is instant - i'm not sure exactly how they are doing it at the moment but basically the boxes work in tandem to sync on a per cycle basis.. meaning not a single cpu cycle is lost..

--
'...if only "Jumping to a Conclusion" was an event in the Olympics.'
Re:Instantly? by ergo98 · 2009-05-22 09:58 · Score: 1

yes it is instant - i'm not sure exactly how they are doing it at the moment but basically the boxes work in tandem to sync on a per cycle basis.. meaning not a single cpu cycle is lost..
There probably are scenarios where there is a delay while it tries to figure out if indeed the other participant is down. However the OP's question -- instant -- was best responded to by the quantum processing response, because "instant" is in the mind of the assessor, and one woman's instant is another man's forever.
While this feature is heavily focused on, it has to be incredibly demanding on communication and computational resources. As you mentioned, it isn't your standard "do the macro-same thing" approach, but instead is literally by CPU cycle, so I have to imagine that it is only appropriate for a very small subset of problems.
Re:Instantly? by lightyear4 · 2009-05-22 10:12 · Score: 5, Informative

Instantly? Of course not. But the time required is equivalent to vmotion/live migration in bog-standard virtualization. How long? "That depends." To throw numbers at you, 30-100ms -- variance largely dependent upon how quickly your network infrastructure can react to MACs changing locations, whether in-flight TCP streams are broken as a result, etc. To help switches cope, people usually send a gratuitous ARP to jumpstart the process.
Re:Instantly? by Arthur+Grumbine · 2009-05-22 10:15 · Score: 1, Funny

and one woman's instant is another man's forever...
I see that you too have been forced to wait for "an instant" while your girlfriend/wife does some "quick" shopping.

--
Now that I think about it, I'm pretty sure everything I just said is completely wrong.
Re:Instantly? by Anonymous Coward · 2009-05-22 10:42 · Score: 1, Interesting

It actually is instant. I cannot elaborate as to how this works since I work at VMW, but there are videos of demos online, and I've seen it work. It's incredible. ---- these opinions are mine and not vmware's. i do not represent their opinions, etc....
Re:Instantly? by Anonymous Coward · 2009-05-22 11:03 · Score: 2, Insightful

Basically the two nodes will be in constant communication with each other. All data sent to the primary node is also sent to the secondary node, and the primary and secondary have a constant link between each other. Both nodes will perform the same computations on the data, but only the primary will reply to the user.
If the secondary node notices that the primary is not responding it will immediately send what the primary was supposed to have been sending back to the user.
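A minimal sketch of the primary/secondary arrangement described above, in Python. The class names (`Node`, `MirroredPair`) and the running-total "computation" are made up for illustration; this is a toy model of the idea, not how VMware FT is implemented.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Node:
    """A replica that applies every input to the same deterministic state."""
    name: str
    total: int = 0
    alive: bool = True

    def handle(self, value: int) -> int:
        # Same inputs in the same order always produce the same reply.
        self.total += value
        return self.total


@dataclass
class MirroredPair:
    primary: Node
    secondary: Node

    def request(self, value: int) -> Tuple[str, int]:
        # Every input is delivered to both nodes, and both compute the reply.
        secondary_reply = self.secondary.handle(value)
        if self.primary.alive:
            # Normal case: only the primary answers the user.
            return self.primary.name, self.primary.handle(value)
        # Primary is silent: the secondary sends the reply it already computed.
        return self.secondary.name, secondary_reply


pair = MirroredPair(Node("primary"), Node("secondary"))
first = pair.request(5)
pair.primary.alive = False  # simulate a hardware failure on the primary
second = pair.request(7)
print(first, second)
```

Because the secondary has seen every input, it answers the second request with the same running total the primary would have produced.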
Re:Instantly? by JSG · 2009-05-22 12:39 · Score: 1

Instant eh? Define instant in this context.
To be honest I have found vMotion to be pretty much "instant". Is this thing *more* instant and if so is it more instant enough to justify its existence over vMotion?
I present my opinions as a non AC - if you are afraid of speaking freely then you work for the wrong firm.
Re:Instantly? by ghetto2ivy · 2009-05-22 16:04 · Score: 1

So in a nutshell, it's like RAID-1 (mirroring) for your box? Interesting. Anyone know if an operation crashes the system, will it BSOD both systems? Or is it only physical/network faults that are tolerated?
Re:Instantly? by TheRealSlimShady · 2009-05-22 19:58 · Score: 1

Yep, if an operation takes out the system then the mirrored system will also die. They are in total lockstep so there is no protection of the application. It does give you near instant protection should the hardware or the hypervisor fail, but not if your app or OS should fail.
Re:Instantly? by Mista2 · 2009-05-22 21:55 · Score: 2, Informative

It keeps a running copy on the failover host, reading from the same storage as the active host. It's as if the server were about to complete VMotion without having just done the final step. Outage time is a small hiccough, less than a second. Current running sessions just carry on. If it's uploading a file to someone, it just carries on. The outage is well within the tolerance of typical TCP sessions.
Re:Instantly? by Thumper_SVX · 2009-05-23 08:54 · Score: 2, Informative

It's close enough. I played with this feature at VMworld last year, and when running SQL transactions along with a ping, we dropped one packet but the SQL transactions didn't miss a beat.
It's impressive enough... the two systems are working in lockstep, such that even memory is duplicated between the two systems. It's an extension on the existing VMotion function in VMware today. However, bear in mind it has some limitations; only one CPU is possible at the moment and you still have the overhead of really two VM's running at once instead of just one. So it's not a solution for ALL of your environment, just part of it.
I'm sure the limitations will be eased over time as they tune the technology... but as a first attempt it IS awesome. Thing is, in my environment the stuff that is needed so critically that it can't take a hardware failure is usually >1 CPU, so this isn't a solution... but I guess if you have some relatively low-load but high-criticality servers then you could find a use for it (web servers seem like a good place to do this).
And the answer is no, I don't think the end user will ever notice so long as your network infrastructure is good enough. Certainly, my users never notice a VMotion event.

Re:FT by MartijnL · 2009-05-22 09:29 · Score: 5, Informative

FT only supports a single vCPU from my understanding... Not too many people running single CPU VM's, at least in my experience...

You should be running single vCPU machines by default and only scale CPU's if absolutely necessary and if the app is fully SMP aware and functional.

--
http://virtualize.wordpress.com/

Re:FT by h4rr4r · 2009-05-22 09:32 · Score: 1, Insightful

If your apps are not SMP aware, WTF are you using?
Multiple CPUs have been the standard for servers for at least a decade in x86 gear.

Xen did it first by lightyear4 · 2009-05-22 09:40 · Score: 2, Informative

Check out the Kemari and Remus projects, which allow precisely the same in Xen environments. In essence, it's a continual live migration (vmware people, think continual vmotion) that resumes virtual machine execution on the backup node if the origin node dies. Very cool tech. The demonstration involved pulling the plug on one of the nodes. For more information just search, there are code and papers and presentation slides galore.

Re:Xen did it first by Anonymous Coward · 2009-05-22 10:17 · Score: 1, Informative

VMware FT is not based on continuous memory snapshotting; it uses deterministic record/replay, recording on the primary and replaying on the secondary at the same time. You can find an overview of this technology at http://www.vmware.com/products/fault-tolerance
Also VMware demonstrated a working prototype as early as in 2007
http://www.vmworld.com/community/conferences/2007/agenda/
W.r.t. Xen, doing a proof of concept is one thing but implementing it and supporting it in production quality with sufficient performance is another thing.
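To illustrate what deterministic record/replay means in general, here is a small Python sketch. The "guest" function and the random inputs standing in for interrupts, clock reads, and I/O are assumptions chosen for illustration; nothing here reflects VMware's actual implementation.

```python
import random


def run(steps, next_input):
    """Deterministic 'guest': the final state depends only on the input sequence."""
    state = 0
    for _ in range(steps):
        state = (state * 31 + next_input()) % 1_000_003
    return state


# Primary: every nondeterministic input is written to a log as it is consumed.
log = []


def recording_input():
    value = random.randrange(256)  # stand-in for an interrupt, clock read, etc.
    log.append(value)
    return value


primary_state = run(1000, recording_input)

# Secondary: consumes the log instead of the real world.
replay = iter(log)
secondary_state = run(1000, lambda: next(replay))

print(primary_state == secondary_state)
```

Only the nondeterministic inputs cross the wire, not memory contents, which is the contrast with snapshot-based approaches that the comment is drawing.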
Re:Xen did it first by ACMENEWSLLC · 2009-05-22 10:19 · Score: 2, Informative

We have both vMotion and XEN.
vMotion is very noticeable. Some things fail when it happens. Zenworks 6.5 is an example.
With Xen, we set up a VNC mirror, i.e. the guest was VNC viewing itself. We were moving a window around and then we moved the guest from Xen server 1 to 2 (we have iSCSI BTW.) There was a noticeable effect that lasted for less than a second, but then we were on XEN #2.
It's nice to see VMWare getting this feature right with vSphere.
Re:Xen did it first by lightyear4 · 2009-05-22 10:23 · Score: 1

Xen live migration does not involve 'continuous memory snapshotting' -- the referenced Kemari utilizes a combination of I/O triggers and observation of shadow page tables (or, ideally, nested page tables if the hardware supports them: AMD's RVI and Intel's EPT). Kemari's equivalent of a lockstep VM gets only hot updates on dirtied pages, not a full memory snapshot. The alternative would of course be a rather inefficient design.
Re:Xen did it first by lightyear4 · 2009-05-22 10:25 · Score: 1

Sounds like a delay on the switch. Add a gratuitous arp using arping in whatever vif-* script you're employing for virtual machine network interfaces and that problem will disappear.
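For readers unfamiliar with what a gratuitous ARP actually contains, the following Python sketch builds one as raw bytes. The MAC and IP values are example placeholders, and the code only constructs the frame; actually sending it (which is what a tool like arping does) needs a raw socket and root privileges and is left out.

```python
import socket
import struct


def gratuitous_arp(mac: str, ip: str) -> bytes:
    """Build a broadcast gratuitous ARP request announcing that `ip` is at `mac`."""
    hw = bytes.fromhex(mac.replace(":", ""))
    addr = socket.inet_aton(ip)
    # Ethernet header: broadcast destination, our MAC as source, EtherType ARP.
    ethernet = b"\xff" * 6 + hw + struct.pack("!H", 0x0806)
    arp = struct.pack(
        "!HHBBH6s4s6s4s",
        1,            # hardware type: Ethernet
        0x0800,       # protocol type: IPv4
        6, 4,         # hardware / protocol address lengths
        1,            # opcode: request
        hw, addr,     # sender MAC and sender IP
        b"\x00" * 6,  # target MAC: left empty
        addr,         # target IP equals sender IP, which makes it "gratuitous"
    )
    return ethernet + arp


# Example values only: a made-up guest MAC and a documentation-range IP.
frame = gratuitous_arp("00:16:3e:12:34:56", "192.0.2.10")
print(len(frame), frame.hex())
```

Switches that see this broadcast arrive on a new port update their MAC tables, which is why sending it right after a migration shortens the blip.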
Re:Xen did it first by lightyear4 · 2009-05-22 10:28 · Score: 1

Such is the state of affairs with open source. I've been using Kemari in production for almost six months now. Some research prototypes are quite production-environment friendly.
Re:Xen did it first by lightyear4 · 2009-05-22 10:33 · Score: 1

Yep, vmotion's explicit arp wins in that regard, whereas, as I suggest, Xen requires tweaks in order to function optimally.
Re:Xen did it first by qnetter · 2009-05-22 11:58 · Score: 2, Informative

Marathon has had it working on Xen for quite a while.

Re:FT by Jaime2 · 2009-05-22 09:46 · Score: 4, Insightful

He didn't say to use a single vCPU for a non-SMP aware app, he said to use a single vCPU for all application loads. For SMP aware apps, adding another virtual CPU is a scaling option. If you have non-SMP aware apps, then you need to find another solution, like migrating to a host with faster cores.

It makes sense. If you have 32 workloads and 16 cores, don't add the overhead of making 64 vCPUs; 32 will use the host resources more efficiently as long as no single app needs the power of more than one core. If one does, give the extra vCPU to that guest only when it needs it.
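The arithmetic behind that example, as a short Python sketch. The function name is made up; the numbers are the ones from the comment.

```python
def oversubscription(workloads: int, vcpus_per_vm: int, cores: int) -> float:
    """Ratio of configured vCPUs to physical cores on one host."""
    return workloads * vcpus_per_vm / cores


one_each = oversubscription(32, 1, 16)  # 32 vCPUs on 16 cores
two_each = oversubscription(32, 2, 16)  # 64 vCPUs on 16 cores
print(one_each, two_each)
```

Doubling the vCPU count doubles the number of virtual CPUs the hypervisor has to juggle per core without adding any physical capacity.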

Re:FT by atomic-penguin · 2009-05-22 09:56 · Score: 1

Multiple CPUs have been the standard for servers for at least a decade in x86 gear.

Seriously, even Windows NT 4 had SMP support in 1997.

I don't know what year Linux first had support for SMP, but the 2.0 kernel supports SMP, apparently even on a 486. Just imagine a Beowulf made of 486 class SMP machines!

--
/^([Ss]ame [Bb]at (time, |channel.)){2}$/

Re:FT by asdf7890 · 2009-05-22 09:56 · Score: 5, Informative

If your apps are not SMP aware, WTF are you using? Multiple CPUs have been the standard for servers for at least a decade in x86 gear.

Yes, but it doesn't work in VMs the same way, at least not in VMWare. On a loaded system you often find a single-vCPU VM will outperform one with more than one vCPU; in fact if you can spread your app over multiple machines you are generally better off running two single CPU VMs instead of one dual-CPU one. This is true no matter how many physical CPUs/cores you have available.

Why is this? Because a single CPU VM can be scheduled whenever there are time-slices available on any physical CPU/core (though a good hypervisor will try not to bounce VMs between cores too often, as this reduces the potential gains from using the core's L1 cache and (on architectures where L2 cache isn't shared) L2 cache too), but a dual vCPU VM will have to wait until the hypervisor can give it timeslices on two CPUs at the same time. If this is the only VM that is actively doing anything on the physical machine (and the host OS is otherwise quiet too) this makes little difference aside from a small overhead on top of the normal hits for not running on bare metal, but as soon as that VM is competing with other processes for CPU resource it can have a massive negative effect on scheduling latency.
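A back-of-the-envelope model of that scheduling argument, in Python. It assumes strict co-scheduling as described in the comment and, as a simplification, that each core is busy independently with the same probability; it is a toy calculation, not a description of any real hypervisor's scheduler.

```python
from math import comb


def p_schedulable(cores: int, busy: float, vcpus: int) -> float:
    """Probability that at least `vcpus` of `cores` are idle in a given time slice,
    assuming each core is independently busy with probability `busy`."""
    idle = 1.0 - busy
    return sum(
        comb(cores, k) * idle**k * busy ** (cores - k)
        for k in range(vcpus, cores + 1)
    )


# A 4-core host where each core is busy 80% of the time.
single = p_schedulable(cores=4, busy=0.8, vcpus=1)
dual = p_schedulable(cores=4, busy=0.8, vcpus=2)
print(round(single, 4), round(dual, 4))
```

Under these assumptions the single-vCPU VM finds a free slot in roughly 59% of time slices, while the dual-vCPU VM finds two simultaneous free slots in only about 18%.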

Re:FT by h4rr4r · 2009-05-22 09:57 · Score: 1

I just am shocked to hear of any real server apps that are not smp aware.

If you have any boxes with one vcpu then it can't take another cpu when it needs it. Great, now you have 16 boxes and all have one cpu, too bad that none of them can soak all 16 when the others are totally idle. What a wonderful way to waste hardware.

This is why guest priorities are useful. Give them all a number of cpus that they actually need and let the priority sort it out. You will find that is what the individual OSes are doing anyway via their schedulers. There are not going to be any hard and fast rules about it, other than if you have 1 vcpu you should have more on that box since the guest overhead is going to kill you on hardware consumption if that is your practice.

Is this a general-purpose Cloud OS? by Laxori666 · 2009-05-22 10:02 · Score: 1

I had an idea at some point of a distributed app, similar to SETI@Home, that people would run on their computer. These computers would form a cloud which would support creating VMs that could run arbitrary code. If one app is currently running your code, and the computer it's on goes down, your code would continue to run on another one. If everyone runs it, it would be a huge pool of computational power. Then you could run crazy things on it. Then, profit! Anyway, is this a step in that direction?

Re:Is this a general-purpose Cloud OS? by the_fat_kid · 2009-05-22 12:38 · Score: 1

and then if we could get these on to people's computers without letting them know it. Maybe with an E-mail or a web page...
I bet we could come up with a network of these robotic slave CPUs....
{insert sky-net reference here}

--
-- Sig under construction...
Re:Is this a general-purpose Cloud OS? by SanityInAnarchy · 2009-05-22 13:47 · Score: 1

This is not a terribly new idea -- it's been around ever since Sun coined the phrase "The network is the computer."
The biggest problem with it is, of course, that I don't trust you, and you don't trust me. Why should I trust your computer to run my VM?
It only works for things like SETI because the data is not private, and the results can be verified, both by having multiple nodes run the same workload (and comparing the results), and by re-running the results yourself if you see something that looks like a hit.
By the way: "Something like SETI" suggests you really should do more research... SETI has migrated to BOINC, which includes projects other than SETI, like protein folding, which might help with things like a cure for cancer. BOINC is already a "general-purpose Cloud OS", in the sense that anything is a "cloud" anything (I'm starting to hate that term as much as "Web Two Point Oh") -- the difference being, of course, that the code is coming from a trusted source, and the results are verifiable like that.
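The "multiple nodes run the same workload and compare" check mentioned above can be sketched in a few lines of Python. The function name and the quorum default are assumptions for illustration, not BOINC's actual validation logic.

```python
from collections import Counter


def accept_result(results, quorum=2):
    """Return the value reported by at least `quorum` nodes, or None if none qualifies."""
    if not results:
        return None
    value, votes = Counter(results).most_common(1)[0]
    return value if votes >= quorum else None


agreed = accept_result([42, 42, 17])  # two nodes agree, one is wrong or lying
disputed = accept_result([1, 2, 3])   # no agreement, so the work unit is rejected
print(agreed, disputed)
```

This only works when results are cheap to compare and the data is not secret, which is the point the comment makes about why it suits SETI-style workloads and not arbitrary VMs.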

--
Don't thank God, thank a doctor!

Re:FT by jimicus · 2009-05-22 10:02 · Score: 1

Seriously, even Windows NT 4 had SMP support in 1997.

I don't know what year Linux first had support for SMP, but the 2.0 kernel supports SMP, apparently even on a 486. Just imagine a Beowulf made of 486 class SMP machines!

Making good use of multiple CPUs requires more than just OS support.

Re:FT by h4rr4r · 2009-05-22 10:05 · Score: 1

, but a dual vCPU VM will have to wait until the hypervisor can give it timeslices on two CPUs at the same time.

Then their hypervisor is broken.
It should be possible for a dual vCPU machine to have vCPU1 and vCPU2 be two timeslices on the same real cpu if need be.

How does it detect a 'failure'? by moosesocks · 2009-05-22 10:09 · Score: 4, Informative

How many hardware failures are actually characterized by a complete 100% loss of communication (as you'd get by pulling the plug)?

Don't CPU and Memory failures tend to make the computer somewhat unstable before completely bringing it down? How would vSphere handle (or even notice) that?

Even hard disk failures can take a small amount of time before the OS notices anything is awry (although you're an idiot if you care enough about redundancy to worry about this sort of thing, but don't have your disks in a RAID array)

--
-- If you try to fail and succeed, which have you done? - Uli's moose

Re:How does it detect a 'failure'? by subreality · 2009-05-22 10:42 · Score: 1

Don't CPU and Memory failures tend to make the computer somewhat unstable before completely bringing it down?
Yes, and this is one of the key dividing lines between true HA mainframes, and every software implementation of HA services. The latter are what 99% of people seeking HA want (for cost reasons), but the former has been great business for IBM and Sun.
Re:How does it detect a 'failure'? by Mista2 · 2009-05-22 22:07 · Score: 2, Interesting

Several brand new servers with VI3 installed 2 weeks ago, left to run to burn in, first production guests moved onto them on Friday, Saturday sees the CPU voltage regulator in one go pop, dead server. It would have been nice to just have the Exchange server keep on rocking until Monday when we could replace the hardware, but no, now I've spent my Saturday morning going into work and fixing it.
However thanks to VM, the High Availability service did restart the guests automatically, but I did have to repair a damaged mailstore. 8(
Re:How does it detect a 'failure'? by Thumper_SVX · 2009-05-23 09:01 · Score: 2, Interesting

This is one reason I run Exchange 2007 with a clustered PHYSICAL mailbox server, and all the CAS and HT roles I run on virtual machines. I don't run database type apps on VMware for exactly these reasons... I am a big VMware supporter, but I also specify for our big apps that we use big SQL and Exchange clusters for HA... not VMware. Yes, it's a bit more expensive that way, but our Exchange cluster now hasn't been "down" in over a year, despite the fact that each node gets patched once a month and rebooted. My users love it :)

Re:FT by h4rr4r · 2009-05-22 10:09 · Score: 1

Basically all server apps are SMP aware. Do you think IIS or Apache only have one worker process?

Perhaps that MSSQL and Postgresql only use one thread?

Don't ask, just look. by RulerOf · 2009-05-22 10:13 · Score: 4, Informative

One of the statistics measured by virtualcenter is the lag you're asking about.

The first hit on google images should give you a good idea.

In practice, I don't know... I imagine that the secondary instance will still receive network traffic bound for the cluster, so it'd probably be perceived as a hiccup when the primary one goes down, which is good enough for most services.

--
Boot Windows, Linux, and ESX over the network for free.

It works as advertized by RobiOne · 2009-05-22 10:16 · Score: 5, Informative

Like everyone else pointed out, it's a VM in lockstep with a 'shadow' VM. This is not just 'continuous VMotion'.

If something happens to the VM, the shadow VM goes live instantly (you don't notice a thing if you're doing something on the VM).

Right after that, the system starts bringing up another shadow VM on another host to regain full FT protection.

This can be network intensive, depending on the VM load, and currently only works with 1 vCPU per VM. Think 1-2 FT VMs per ESX host + shadow VMs.

You'll need recent CPUs that support FT and have a VMware HA / DRS Cluster set up.

So if you've got it, use it wisely. It's very cool.

--
-- Robi

Re:It works as advertized by jo42 · 2009-05-22 16:11 · Score: 1

So, if the software running in the primary VM has a problem causing it to go down pretty hard (think near or BSOD class), and the lockstep mechanism is keeping things synchronized really well to the shadow VM(s), how many microseconds after the shadow VM comes up, does it go the way of the primary VM, as in totally tits up?
Or is the definition of FT (fault tolerance) "when some marketing droid pulls out a network cable or shuts down a server during a demonstration while trying to sell tens of thousands of dollars worth of [pretty useless] software"?
Re:It works as advertized by RobiOne · 2009-05-22 17:07 · Score: 1

Please familiarize yourself with the difference between hardware fault tolerance and software fault tolerance.

--
-- Robi
Re:It works as advertized by TheLink · 2009-05-22 23:39 · Score: 1

If the primary VM BSODs due to a software problem, the odds are the shadow VM would too.
--
- Too many replies beneath your current threshold
Re:It works as advertized by jo42 · 2009-05-23 00:39 · Score: 1

So, if the hardware is failing, corrupting memory or data on external storage, and the lockstep mechanism is keeping things synchronized really well to the shadow VM(s), how many microseconds after the shadow VM comes up, does it go the way of the primary VM, as in totally tits up?

Re:FT by asdf7890 · 2009-05-22 10:24 · Score: 2, Informative

, but a dual vCPU VM will have to wait until the hypervisor can give it timeslices on two CPUs at the same time.

Then their hypervisor is broken. It should be possible for a dual vCPU machine to have vCPU1 and vCPU2 be two timeslices on the same real cpu if need be.

Which would kill any benefit of running SMP in the VM anyway, if it were possible.

My understanding, which may be out of date, is that this is not considered a good idea, as timing issues between threads on the two vCPUs, if scheduled one after the other on the same core, could potentially cause race conditions. And even if it is not that serious, the threads on the vCPU that gets the first slice of the real core could be paused waiting for locks to be released by threads the guest OS has lined up to run on the other vCPU. This is an explanation that I have seen given as to why VMWare would not allow you to do virtual SMP on a single-core-single-CPU host machine (i.e. emulating SMP in the guest by giving two or more vCPUs alternating time on the only physical core).
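The lock-waiting problem described above can be shown with a tiny tick-based simulation in Python. The scenario (vCPU0 spinning on a lock that a thread on vCPU1 must hold for a fixed amount of run time), the slice length, and the function name are all assumptions for illustration.

```python
def simulate(hold_ticks: int, cores: int, slice_ticks: int = 10):
    """Return (wall-clock ticks until the lock is released, ticks vCPU0 spent spinning)."""
    remaining, clock, spun = hold_ticks, 0, 0
    while remaining > 0:
        if cores >= 2:
            # Holder and waiter run side by side on separate cores.
            remaining -= 1
            spun += 1
        else:
            # One core: slices alternate, and the waiter (vCPU0) goes first.
            on_core = (clock // slice_ticks) % 2
            if on_core == 0:
                spun += 1       # waiter burns its whole slice spinning
            else:
                remaining -= 1  # holder finally makes progress
        clock += 1
    return clock, spun


two_cores = simulate(hold_ticks=15, cores=2)
one_core = simulate(hold_ticks=15, cores=1)
print(two_cores, one_core)
```

With the holder needing 15 ticks, two real cores release the lock after 15 ticks, while alternating both vCPUs on one core takes 35 ticks, 20 of which are pure spinning.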

Small correction... by Cramer · 2009-05-22 10:33 · Score: 1

The article mentions an inability (for the "pre-released" version) to PXE boot. If he's talking about booting for installation, then he's 100% wrong. The ESX beta/RC (build 140815) will, indeed, boot and install over a network. It's different from 3.5 so you'll have to adjust your commandline and/or kickstart. They use "weasel" instead of "anaconda" and that runs inside the service console. Short answer... "method=" becomes "url=" -- with a properly formatted URL, e.g. url=nfs://server/path/to/esx/. It's a much larger boot environment -- 80MB -- so it takes longer to boot, and from my half dozen installations (I'm only testing on 2 machines), it takes substantially longer to install 4.0 than 3.5. (My 3.5 installs take 2.5 mins.)

Re:Small correction... by RulerOf · 2009-05-23 07:27 · Score: 1

The ESX beta/RC (build 140815) will, indeed, boot and install over a network
Has anyone done a PXE boot of the ESX OS itself yet, though?

AFAIK, the only "diskless" ESX deployments rely on flash storage.

--
Boot Windows, Linux, and ESX over the network for free.
Re:Small correction... by Thumper_SVX · 2009-05-23 09:04 · Score: 1

Yes. It's quite trivial, actually and I seem to recall there's a VMware whitepaper on it.
OK, it's ESXi, not ESX... but the difference is small enough to make no odds. Oh, and on the flip side of that, I do find it easier to have ESXi on an internal flash in case my PXE server is down. I would host it virtually and on HA if it weren't for the fact that I have that whole "chicken and egg" problem :D
Re:Small correction... by RulerOf · 2009-05-23 09:39 · Score: 1

Chicken and egg indeed! On that note, perhaps you would put the PXE service on your SAN, no?

It's of sincere interest to me because we're turning some whitebox servers whose raid controllers aren't on the ESX HCL into hosts. I read that VMWare is moving ESXi in as their premier hypervisor to replace ESX, so this kind of setup would be interesting to explore, though I imagine that a flash based local datastore would be more... robust.

--
Boot Windows, Linux, and ESX over the network for free.
Re:Small correction... by Cramer · 2009-05-23 10:24 · Score: 1

ESX isn't designed to be run "diskless". It has to have somewhere to put its VMFS -- which in 4.0 also contains swap and a few other things.
(That doesn't mean one cannot bend it into a shape that will run diskless.)
Re:Small correction... by RulerOf · 2009-05-23 10:37 · Score: 1

It has to have somewhere to put its VMFS
Well, not in the sense of having no datastore, I simply mean without rotating hard disks present in the server. Flash storage accomplishes that, but you could use gPXE to connect it to an iSCSI target and remotely access its VMFS datastore from there.... if it were possible to do so with ESX, of course. You might not want to swap to it, and it'd really demand another NIC and so on, but that's what I really meant.

--
Boot Windows, Linux, and ESX over the network for free.
Re:Small correction... by Thumper_SVX · 2009-05-24 06:09 · Score: 1

You know, that's an excellent idea if your SAN is capable... not all are. Our current production SAN is an HP EVA 4200... if we had a 4400 we could do it, but with the older 4200 it doesn't even have a direct network connection of its own; instead it has a dedicated storage server for managing the SAN (actually a DL380 running Windows Storage Server).
The ESXi HCL is a lot tighter than ESX, but I've had few problems so long as I stick with the "common" solutions. I buy almost exclusively HP servers for virtualization since (a) they're damned nice systems for a decent price and (b) they ship a customised ESXi CD that has all of the hardware drivers and agents for their entire line. The fact that their hardware drivers often support a VERY long list of legacy hardware in a single driver really helps, too. Dell has similar solutions, but I've had difficulty with two identically named controllers requiring completely different drivers because of different chipsets. Really annoying.
Another nice thing with the flash based boot is that a USB key is cheap... and the HP servers have internal USB slots just for that purpose.
My general feeling on virtualization though is that when you're rolling out a VM solution, then you're far better served buying hardware from a big name... that's on the HCL. The reason for that is simply that it's far more "supportable", and if something goes wrong you can really just go at most two places to get that support; the hardware vendor and VMware. Yes, it costs a little more, but I'd much rather spend the extra grand or so to have a box that I can make a phonecall and have a piece of hardware replaced for three years (standard warranty on HP servers) without fuss or hassle. It means I can focus on providing my solutions to my end users instead of fighting hardware issues. YMMV... but it makes me feel better :)
Re:Small correction... by RulerOf · 2009-05-24 07:06 · Score: 1

Accidental AC... That's a first for me.

--
Boot Windows, Linux, and ESX over the network for free.
Re:Small correction... by Thumper_SVX · 2009-05-25 02:27 · Score: 1

LOL... it happens to us all eventually ;) Next thing you know it'll be adult diapers and yelling at "those darned kids" :)
Seriously... this should help; http://www.chriswolf.com/?p=182
And I wish you all the best with your solution. Yes, I agree HP is often annoying, but their support for our solution has been great and easy to work with. Dell, well a few years ago I wouldn't touch a Dell server with a ten foot pole but I've had better luck recently with our European offices that insist on Dell (at least until our Corporate folks decide to come down on them... it's coming...). Dell have really upped their game in the last 18 months and are really a contender now in the server space. However, I still find HP to be better because a single image with the latest drivers will often boot effectively unmodified even on 3-4 year old hardware. That's quite handy in an environment where our systems are bought with 5 year warranties and are kept for every minute of those five years! :)

Re:FT by OrangeTide · 2009-05-22 13:04 · Score: 2, Insightful

Try supporting synchronization of two virtual machines over a network/SAN when you also have to deal with SMP. Gets hard.

--
“Common sense is not so common.” — Voltaire

Re:FT by Jaime2 · 2009-05-22 14:44 · Score: 3, Insightful

Yea, but you also get the overhead of two schedulers (the host and the guest) and two systems moving thread context from core to core, which is an expensive operation. Most VMware systems are pretty heavily oversubscribed in terms of cores. It's not uncommon to have 60 guests on a 16 core host. If all 60 guests have 4 virtual CPUs, you do get the advantage that one guest can expand out to consume about a quarter of the total host CPU power, but you also have the cost of guest SMP switching even though you are using an average of 0.25 cores per guest.

Re:FT by jimicus · 2009-05-22 20:01 · Score: 1

It wasn't them that I was thinking of specifically.

There are plenty of applications of databases (particularly MIS-type things) which don't really lend themselves to multithreading that well in the first place.

(BTW, Postgres only uses one thread per query and changing that, the last time I checked, was part of some significant work which is only just taking place right now)

Re:FT by Mista2 · 2009-05-22 22:00 · Score: 1

SQL server is a great example. Most of the time on dedicated hardware you might have one SQL server with several databases and maybe even SQL instances. These will happily share the multiple CPUs to run separate apps concurrently, but most queries in SQL execute in a single thread.

With VMs, as you don't have to pay $$$ for more hardware, only software licences, we have seen more customers simply provision another single CPU SQL server in VM for general duty work. These would generally have no tuning in the applications or queries for multithreading, so they work fine in single CPU. Disk IO tends to be the biggest bottleneck.

OS Agnostic? by Phoghat · 2009-05-23 06:23 · Score: 1

So what they're saying is that they don't believe in operating systems, but acknowledge that they might exist?

--
Think of how stupid the average person is, and realize half of them are stupider than that.

Re:FT by Thumper_SVX · 2009-05-23 08:56 · Score: 1

And yet most processing workloads in apps I work with are single-threaded and not terribly scalable. The reality is that >1 CPU is exponentially more difficult to code for (at least so the excuses I hear from developers go).

However, if your application can sit behind a load balancer and runs a single thread, why NOT have n*single CPU servers? Of course, that sort of takes away from the instant failover advantage. Ho hum :P

Re:FT by Thumper_SVX · 2009-05-23 08:57 · Score: 1

You obviously don't deal with many vertical apps. Most of them are not SMP aware, or tend to sit there hogging a single CPU/Core pretty much all the time. At least, this is true in the Windows world... UNIX is quite different, but there's a dearth of the applications my users want to run.

No, just 'cos I do it for a living doesn't mean I have to like it :)

FT CPU requirements by dyao · 2009-06-01 11:14 · Score: 1

I'm on the FT team at VMware and just wanted to provide some additional information on FT requirements. You can also find out more about FT at: http://www.vmware.com/products/fault-tolerance/

VMware collaborated with AMD and Intel in providing an efficient VMware Fault Tolerance (FT) capability on modern x86 processors. The collaboration required changes in both the performance counter architecture and virtualization hardware assists from processor vendors. These changes could only be included in recent processors from both vendors: 3rd-Generation AMD Opteron(tm) based on the AMD Barcelona, Budapest and Shanghai processor families; and Intel® Xeon® processors based on the Penryn and Nehalem microarchitectures and their successors.

The current set of VMware FT supported processors are:
Intel® Xeon® 3100 Series, Wolfdale (UP)
Intel® Xeon® 3300 Series, Yorkfield
Intel® Xeon® 5200 Series, Wolfdale (DP)
Intel® Xeon® 5400 Series, Harpertown
Intel® Xeon® 7400 Series, Dunnington
Intel® Xeon® 5500 Series, Nehalem
AMD Opteron(tm) 1300 Series, Budapest
AMD Opteron(tm) 2300 Series, Barcelona (65nm, DP) and Shanghai (45nm, DP)
AMD Opteron(tm) 8300 Series, Barcelona (65nm, MP) and Shanghai (45nm, MP)

VMware maintains a KnowledgeBase (KB) article that provides a current list of supported processors, see http://kb.vmware.com/kb/1008027.

You can download a utility, VMware SiteSurvey at http://www.vmware.com/download/shared_utilities.html to check if your configuration can run VMware FT.

While we understand the end user's desire to be able to use VMware FT on as many processors as possible, VMware's goal is to guarantee that Fault Tolerance works reliably.
