Red Hat & AMD Demo Live VM Migration Across CPU Vendors

Bravo! by Cornwallis · 2008-11-07 04:16 · Score: 4, Funny

I love to see things like this that give me a greater freedom to migrate off the major players.

Re:Bravo! by greenhuey · 2008-11-07 04:24 · Score: 0

Give me liberty or give me death!

--
I added the word "nerds" to http://wordandlink.com/, you should add a word too.
Re:Bravo! by 2names · 2008-11-07 04:45 · Score: 3, Informative

We have certainly come a long way when a Cornwallis supports freedom of the people. :)

--
"I'm just here to regulate funkiness."
Re:Bravo! by harry666t · 2008-11-07 04:59 · Score: 0

OK, sure

<aims a gun at greenhuey's head>

You know, it certainly doesn't bother me that I don't have the source code for this gun.
Re:Bravo! by Anonymous Coward · 2008-11-07 08:37 · Score: 1, Informative

With Obama at the helm, you may not have guns to protect your liberty, so death is more likely. ;)

This is still unreleased test demo's by Beached · 2008-11-07 04:21 · Score: 4, Insightful

The real beauty of this will come when the system automatically moves VMs to machines in case of hardware problems or when a system is underutilized. It would let you power down servers during non-peak times and save oodles of cash.

--
---- aut viam inveniam aut faciam

Re:This is still unreleased test demo's by Hercynium · 2008-11-07 04:31 · Score: 3, Insightful

Well, that kinda *is* the purpose of live VM migration... it's already being done, just not between systems with different processor types.

--
I'm done with sigs. Sigs are lame.
Re:This is still unreleased test demo's by voidptr · 2008-11-07 05:00 · Score: 3, Interesting

This is like blowing the engine in a Ford and electing to put a Chevy engine in to replace it.
While still driving down the highway at 60 mph.

--
This .sig for unofficial government use only. Official use subject to $500 fine.
Re:This is still unreleased test demo's by Comatose51 · 2008-11-07 05:11 · Score: 4, Interesting

You mean like VMware's VMotion, HA, and DRS functionalities?

--
EvilCON - Made Famous by /.
Re:This is still unreleased test demo's by TheRaven64 · 2008-11-07 05:57 · Score: 4, Insightful

They don't seem to have released many details of this. Migrating between x86-with-SSE and x86-without-SSE, for example, is pretty simple - you just need the OS or hypervisor to trap the illegal instruction exception and emulate. Migrating from x86 to x86-64 is pretty easy too - you just don't get any advantages from the 64-bit chip. Going the other way is really hard, and would need the hypervisor to trap the enter-64-bit-mode instruction and emulate everything until the mode was exited (difficult, slow, and probably pointless).
I read TFA when it first came out and couldn't work out exactly what they were claiming was novel. Migrating between very-slightly-different flavours of x86 is not really that hard. Migrating between ARM and x86 would be incredibly hard - Xen can actually do this with the P2E work (not sure if it ever made it in to trunk), which migrated a VM from real hardware in to QEMU but, again, that's not an ideal solution unless the emulator has traps that userspace can use - for example a Java VM might get a signal after migration, flush its code caches, and re-JIT as x86 code instead of ARM.

--
I am TheRaven on Soylent News
Re:This is still unreleased test demo's by JEB_eWEEK · 2008-11-07 06:05 · Score: 3, Insightful

Yes, except without requiring identical hardware.
Re:This is still unreleased test demo's by drachenstern · 2008-11-07 06:11 · Score: 1

Er yeah, but by a proprietary vendoooorrrr, eh... I see what you did there ;)
I think the goal is to eventually open-source the concepts, and sell the wrappers. And the support, always sell the support...
I have to say tho, that I thought the whole point of CPU ISA was to be able to do just this sort of thing. If you're not writing code that absolutely depends on the underlying CPU hardware (why would you, isn't that the point of the kernel) then you should be able to move to any other platform... Okay okay, so there's the whole 32-bit -> 64-bit snafu, but that's because we're talking paradigm shifts.
What I'm curious about is the Xeon -> Itanium2 shift... And naturally the reverse as well =D

--
2^3 * 31 * 647
Re:This is still unreleased test demo's by AJWM · 2008-11-07 06:35 · Score: 1

Since Itanium2 will run x86 code sort of natively, going Xeon->Itanium shouldn't be that hard. Migrating a VM that's running IA-64 code to a Xeon could be a little tougher.

--
-- Alastair
Re:This is still unreleased test demo's by nabsltd · 2008-11-07 07:11 · Score: 3, Informative

VMware doesn't require "identical" hardware to do live migration, either.
It does have to be similar enough, which at this point pretty much means just the same processor manufacturer. As long as the processor supports the hardware virtualization, then VMware will allow you to set up a cluster that will allow live migration with no issues.
Re:This is still unreleased test demo's by sirsnork · 2008-11-07 07:17 · Score: 2, Interesting

Between different vendors is actually quite hard. Live migration requires saving the CPU state exactly, including all registers. Going to a different vendor's CPU means all this saved state may not match up, and then you have to do something so the VM won't just crash. This is actually becoming _harder_ as more and more virtualization technology is being put into the CPU silicon (Intel VT, AMD-V etc). Each new series has a few more features to make virtualization simpler, and you have to make sure that what was available to the VM on one CPU is identical to what's available on the new CPU without destroying performance (which is what will happen if you start emulating).
That said, VMware is very, very, VERY careful with the tech it introduces; to give you an example, round-robin network teaming is still "experimental". I'm fairly sure they have played with this internally already and not shipped it, either because it would make support harder or because, with the integrated virtualization features changing on every new CPU, they would need to release a new version for each new CPU release to keep it working.
Make no mistake, this is big news for KVM and well done to them, but if they can make it work reliably so can anyone else, and that includes VMware.

--

Normal people worry me!
Re:This is still unreleased test demo's by virtualboy · 2008-11-07 07:34 · Score: 1

Save yourself some money and check out Virtual Iron. It does not require identical hardware.
Re:This is still unreleased test demo's by ampman · 2008-11-07 10:16 · Score: 1

As posted before, it seems to have been done a long time ago; see: http://www.byte.com/art/9407/sec6/art1.htm
Re:This is still unreleased test demo's by Chris+Snook · 2008-11-07 10:27 · Score: 1

The really cool thing is the hardware support for masking CPUID calls from guests, so you don't have to emulate them in the hypervisor, which the VMware people in this thread have pointed out adds measurable overhead on some workloads. This lets you present a generic x86_64 CPU to the guest, which will run most non-HPC enterprise apps just fine. SSE2 is a mandatory part of the x86_64 instruction set, so all x86_64 processors will be able to get decently optimized math that can be live-migrated between different sub-architectures and processor revisions. You incur a slight performance hit on the older hosts, but this feature makes it easier to migrate away from them, which makes migrating to the new AMD processors much more attractive to people with large virt farms.

--
There's no failure quite as dissatisfying as a complete and total solution to the wrong problem.
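The CPUID masking described above comes down to clearing feature bits before the guest sees them. A minimal sketch of that arithmetic follows, assuming the standard CPUID leaf 1 bit positions for MMX, SSE, and SSE2 (EDX) and SSE3, SSSE3, SSE4.1, and SSE4.2 (ECX). The helper names `pack`, `unpack`, and `mask_leaf1` are made up for illustration; this is not how the hardware feature or any hypervisor actually implements it.

```python
# CPUID leaf 1 feature bit positions (EDX and ECX registers).
EDX_BITS = {"mmx": 23, "sse": 25, "sse2": 26}
ECX_BITS = {"sse3": 0, "ssse3": 9, "sse4.1": 19, "sse4.2": 20}


def pack(names, layout):
    """Build a register value with one bit set per named feature."""
    value = 0
    for name in names:
        value |= 1 << layout[name]
    return value


def unpack(value, layout):
    """Return the set of named features whose bit is set in value."""
    return {name for name, bit in layout.items() if (value >> bit) & 1}


def mask_leaf1(host_edx, host_ecx, allow_edx, allow_ecx):
    """Hypothetical helper: hide every host feature bit not in the allow masks."""
    return host_edx & allow_edx, host_ecx & allow_ecx


# Illustrative host: supports more than the generic baseline.
host_edx = pack({"mmx", "sse", "sse2"}, EDX_BITS)
host_ecx = pack({"sse3", "ssse3", "sse4.1"}, ECX_BITS)

# Generic baseline for this sketch: keep MMX/SSE/SSE2, hide the ECX extensions.
allow_edx = pack({"mmx", "sse", "sse2"}, EDX_BITS)
allow_ecx = 0

guest_edx, guest_ecx = mask_leaf1(host_edx, host_ecx, allow_edx, allow_ecx)
```

With these inputs the guest still sees SSE2, which every x86_64 processor has, while the newer extensions that differ between hosts are hidden.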
Re:This is still unreleased test demo's by cheater512 · 2008-11-07 13:19 · Score: 1

Not AMD to Intel and vice versa; that's the real breakthrough.
Re:This is still unreleased test demo's by Anonymous Coward · 2008-11-07 15:56 · Score: 0

Itanium2 won't run x86 code natively, or "sort of natively" as you put it, at least on those chips made after 2006. And given the cost of Itanium2 systems it's not economically viable to do what you suggest here.
And, you'll never get IA64 native code running effectively in an emulator on a Xeon. Attempting to shove VLIW code bundles through a heavily pipelined scalar chip would, in all likelihood, get you the absolute top job in the world as a compiler designer if you pulled it off. ;)
Re:This is still unreleased test demo's by Anonymous Coward · 2008-11-07 19:49 · Score: 0

Itanium 2 does the x86 emulation in software so it would be pretty pointless to migrate a x86 VM to an I2 server, more so when you consider the price difference between the 2 platforms.
Re:This is still unreleased test demo's by JamesTRexx · 2008-11-07 22:01 · Score: 1

just the same processor manufacturer

And even that's not true. We had to buy an older server type because even if we bought a newer server with -in this case- AMD cpus it wouldn't mix with the others.
The CPU has to be functionally the same, otherwise you'll have to resort to cpu masking.

--
home
Re:This is still unreleased test demo's by nabsltd · 2008-11-08 04:31 · Score: 1

Google for "Enhanced VMotion Compatibility" and you'll see that you don't have to do any manual work to allow every processor that supports hardware virtualization to participate in your cluster.
The only caveat is that you can't have any VMs running when you enable this feature. This is not a big deal if you enable it when you first create the cluster, and only a problem if you bring a new host with running VMs into the cluster. But a small amount of planning deals with that, too.
Re:This is still unreleased test demo's by LarsG · 2008-11-08 08:45 · Score: 1

This is actually becoming _harder_ as more and more virtualization technology is being put into the CPU silicon (Intel VT, AMD-V etc). Each new series has a few more features to make virtualization simpler, and you have to deal with making sure what was available to the VM on one CPU is identical to what's available on the new

I must admit that I'm not quite up to date on the details, but aren't the VT/AMD-V changes only visible to the hypervisor (ring -1)? It might make moving VM state harder (the hypervisor has to handle migrating AMD-V state to the equivalent VT state), but it should be invisible to the guest OS running inside the VM.
I would suspect that Guest OS visible changes (ring 0-3) would be harder to handle (like migrating from a CPU that has SSE2 to one that doesn't). The hypervisor would either have to trap and emulate the missing instructions, or trap cpuid and tell the guest OS that the CPU only supports a least common denominator set of extensions.

--
If J.K.R wrote Windows: Puteulanus fenestra mortalis!

Umm... by frodo+from+middle+ea · 2008-11-07 04:28 · Score: 2, Interesting

All the interesting controversy aside, cross vendor migration is [obviously] a good thing for customers because it avoids platform lock-in

Well, almost all VM products barring VirtualPC do indeed support running the same VM image across various vendor platforms; in fact, that is the whole point of a VM, isn't it?

The fact to highlight is that the migration was done of a live VM without disrupting the VM's operations.

--
for the last time people, I am "frodo from middle eaRTH", not "middle eaST".

Re:Umm... by MBGMorden · 2008-11-07 04:59 · Score: 1

It's not a matter of it RUNNING on multiple platforms. The issue here is live migration. Moving a running VM from one machine to another without skipping a beat. On most other setups you'd have to shut the VM down and then restart it on the other machine for it to work correctly.

--
"People who think they know everything are very annoying to those of us who do."-Mark Twain
Re:Umm... by TheRaven64 · 2008-11-07 06:01 · Score: 2, Informative

On most other setups you'd have to shut the VM down and then restart it on the other machine for it to work correctly

Do you? I first saw Xen demo live migration in 2005, and I don't think it was new then. Their demo had a Quake server being thrown around a cluster without clients noticing. Downtime was well under 100ms. You can read the paper for more information.
They were claiming that you can move between processor types, but they didn't specify how much different they could be. If it's just a matter of SSE or 3DNow! support disappearing then that's not a hard problem - just trap-and-emulate any of the old instructions. Relaunching programs that use these will cause the new values of CPUID to be picked up.

--
I am TheRaven on Soylent News
Re:Umm... by MBGMorden · 2008-11-07 06:36 · Score: 1

VMotion has been around for quite a while. The specialty here is between different processor types, and it's apparently not as trivial as you state. For one, there are different extensions and such between various processor types. Sure, everything can be compiled for i386 and run on anything, but we're talking about arbitrary code that can be running on these VMs. There's a whole lot that can be different beyond their commonality, and if you resort to trapping and emulating all those instructions then you end up with as much an emulator as a hypervisor at that point, and you've basically defeated the purpose.

--
"People who think they know everything are very annoying to those of us who do."-Mark Twain
Re:Umm... by nabsltd · 2008-11-07 07:17 · Score: 2, Informative

And, when you think about it, any instruction that you would have to trap if the VM used to be running on a different processor must be trapped at all times.
This is because you have no way of knowing which processor type the VM was first started on. When this happened, it's likely the OS did some hardware checking and figured out which instructions it could (and could not) use. Moving the VM isn't going to change what the OS believes is the processor, and that's the problem.
Overall, VMware's Enhanced VMotion Compatibility method of lying to the OS about the capabilities of the processor seems to be the easiest way of doing this. But they only do it within one CPU manufacturer, because otherwise you'd end up with a very low-featured virtual processor.
Re:Umm... by online-shopper · 2008-11-07 09:24 · Score: 1

Isn't this like what transmeta did, except in software?
Re:Umm... by ultranova · 2008-11-07 20:08 · Score: 1

This is because you have no way of knowing which processor type the VM was first started on.

Why ? Is there any particular reason you can't send this information along with the VM itself ?

--
Forget magic. Any technology distinguishable from divine power is insufficiently advanced.
Re:Umm... by nabsltd · 2008-11-08 04:26 · Score: 1

Not usefully, and even so, you're still likely to take needless performance hits.
If the VM started running on a Xeon Harpertown and moved to an AMD Santa Rosa, then to a Core 2 Quad Yorkfield, which features should be enabled/disabled in each move?
Re:Umm... by ultranova · 2008-11-08 07:02 · Score: 1

If the VM started running on a Xeon Harpertown and moved to an AMD Santa Rosa, then to a Core 2 Quad Yorkfield, which features should be enabled/disabled in each move?

After each move, compare the current host's feature set with the original host's feature set, and emulate everything that is missing.
Your original claim was: "This is because you have no way of knowing which processor type the VM was first started on." That is clearly untrue, since you can pass the CPUID information of the original processor right along with the VM, and thus let the receiver know what CPU the OS thinks it is running on, thus negating your conclusion: "And, when you think about it, any instruction that you would have to trap if the VM used to be running on a different processor must be trapped at all times."
Of course you take a performance hit from emulation, and would be better off to pass VM's to as compatible - preferably identical - CPU's as possible; however, sometimes the cost of the overhead of emulation might be less than the cost of rebooting the VM, or keeping an obsolete type of machine working, and in that case it would be nice to be able to migrate the VM to another type of processor. Passing the CPU type information along with it lets the receiver know what special instructions - if any - need to be emulated, thus only incurring the overhead when it is actually necessary.
This, of course, all depends on whether you meant what I think you meant.

--
Forget magic. Any technology distinguishable from divine power is insufficiently advanced.
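The comparison proposed above can be sketched as a set difference between the feature set the guest booted with and the feature set of whichever host it lands on. This is only an illustration under stated assumptions: `VMState`, `features_to_emulate`, the host names, and the feature lists are all hypothetical, and it ignores the objection raised elsewhere in the thread that two CPUs can implement the same instruction with different side effects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VMState:
    """Hypothetical migration record: the features the guest saw at boot travel with it."""
    name: str
    boot_features: frozenset


def features_to_emulate(vm, host_features):
    """Features the guest believes it has but the current host lacks."""
    return vm.boot_features - frozenset(host_features)


# Illustrative feature sets, not real CPU data.
vm = VMState("guest1", frozenset({"sse", "sse2", "sse3", "ssse3", "sse4.1"}))
host_b = {"sse", "sse2", "sse3", "sse4a"}
host_c = {"sse", "sse2", "sse3", "ssse3", "sse4.1", "sse4.2"}

missing_on_b = features_to_emulate(vm, host_b)  # must be trapped and emulated on host_b
missing_on_c = features_to_emulate(vm, host_c)  # nothing to emulate on host_c
```

Because the comparison is always against the boot-time set rather than the previous hop, the emulation overhead only applies on hosts that actually lack something.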
Re:Umm... by LarsG · 2008-11-08 08:57 · Score: 1

Relaunching programs that use these will cause the new values of CPUID to be picked up.
I suspect one could end up with devil in details problems if the guest OS suddenly saw different CPUID values. While it might work fine, the expectation has always been that CPUID won't change after boot-up so you could end up with all sorts of snafus.
What you could do is have the hypervisor trap CPUID and report a least common denominator set of capabilities for the CPUs in the cluster. Or have CPUID report more capabilities than the weakest/oldest CPUs but have the hypervisor trap and emulate those instructions when running on those CPUs.

--
If J.K.R wrote Windows: Puteulanus fenestra mortalis!
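The least common denominator idea above amounts to intersecting the feature sets of every host in the cluster and reporting only that intersection to the guest. A minimal sketch follows; the function names, host names, and feature lists are invented for illustration and do not come from any real hypervisor.

```python
def cluster_baseline(hosts):
    """Return the features every host supports (the least common denominator)."""
    feature_sets = [set(features) for features in hosts.values()]
    if not feature_sets:
        return frozenset()
    return frozenset(set.intersection(*feature_sets))


def cpuid_seen_by_guest(host_features, baseline):
    """What a trapped CPUID would report on this host: only baseline features."""
    return frozenset(host_features) & baseline


# Illustrative cluster, not real CPU data.
hosts = {
    "intel-a": {"sse", "sse2", "sse3", "ssse3", "sse4.1"},
    "amd-b": {"sse", "sse2", "sse3", "3dnow", "sse4a"},
    "amd-c": {"sse", "sse2", "sse3", "sse4a"},
}

baseline = cluster_baseline(hosts)
guest_views = {name: cpuid_seen_by_guest(feats, baseline) for name, feats in hosts.items()}
```

Every host reports the same set to the guest, so the CPUID values never change after boot, at the cost of hiding anything the weakest host lacks.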
Re:Umm... by nabsltd · 2008-11-08 15:58 · Score: 1

Passing the CPUID information will give you the identity of the original processor, but not the actual capabilities. You could also pass the whole set of extended CPUID information to know which flavors of SSE are (or aren't) supported, etc., but that still isn't the true capabilities and details of the CPU.
The reason you can't emulate efficiently is because the matrix of "VM CPU" to "physical CPU" would be large...very large. And, it would include things that would require the hypervisor to trap an instruction because even though the "physical CPU" and the "VM CPU" both support the instruction, they do so in a slightly different way, with different side effects.
The masking technique that VMware uses doesn't require the VM kernel to know anything about exactly what instructions are or are not supported. The CPUID the VM sees doesn't show capabilities that the actual processor might have, so the OS doesn't try to use those instructions (and, if it does, they don't work and weird things happen, just like on a physical machine).
Heck, there are some programs right now that run fine in a VM, but if you live migrate them to another system that has only a slightly different CPU (speed, cache size, etc.), the program either performs very poorly or does strange things. Usually, these are programs that aren't very portable anyway (using delays based on calculated CPU speed, picking algorithms based on cache size, etc.), but that's an example of how things can go to hell even with a small change in the live migration.

however, sometimes the cost of the overhead of emulation might be less than the cost of rebooting the VM, or keeping an obsolete type of machine working, and in that case it would be nice to be able to migrate the VM to another type of processor.
Really, this just isn't a scenario that would happen. VMs are commodities that can run generally anywhere. Keeping a truly obsolete host running is always going to be far more expensive than just getting some more modern hardware to replace it. Also, any service that needs to be running 24/7/365 isn't going to be a single machine, it's going to be on something truly high availability (a clustered solution, etc.).
You'd stop the VMs on the obsolete server and then restart them on a server that is compatible with modern hardware virtualization. It would quite literally take a few minutes if the VM disk is on a shared resource (which is required for live migrations, anyway). Then, you would have the VM running in a modern cluster of all the same brand of CPU.
As a side note, one of the huge advantages of a VM is the ability to reboot it very, very quickly. Physical servers often have to do all sorts of hardware checks that are not done by the VM. Plus, virtual hardware on the VM doesn't need any artificial delays that physical hardware might need. So, basically, it's only the OS shutdown and startup you have to deal with, and that's generally not too bad.

Xen 3.3 supports this already by stabe · 2008-11-07 04:29 · Score: 3, Informative

Xen supports this feature since Xen 3.3, it is called CPUID: http://www.nabble.com/Xen-3.3-News:-3.3.0-release-available!-td19106008.html No real breakthrough here...

Re:Xen 3.3 supports this already by Vendetta · 2008-11-07 04:40 · Score: 1

Xen supports this feature since Xen 3.3, it is called CPUID: http://www.nabble.com/Xen-3.3-News:-3.3.0-release-available!-td19106008.html No real breakthrough here...
Looks to me like Xen supports migration between different CPU models, not entirely different CPU manufacturers. So yes, there is a breakthrough here.
Re:Xen 3.3 supports this already by stabe · 2008-11-07 04:59 · Score: 2, Informative

Yes, it does: http://lists.xensource.com/archives/html/xen-devel/2008-06/msg00430.html

Still x86 only by boner · 2008-11-07 04:33 · Score: 3, Insightful

Real magic would have been demonstrating a move between ANY processor architecture - Power, SPARC, x86_64 etc..

Between x86 processors is nice, but not unexpected.

Re:Still x86 only by Hercynium · 2008-11-07 04:37 · Score: 1

No problem! Just run x86 linux under qemu on all physical platforms, then run your applications under x86 linux inside a kvm inside qemu with migration between the qemu instances on each physical system!

--
I'm done with sigs. Sigs are lame.
Re:Still x86 only by corsec67 · 2008-11-07 04:39 · Score: 1

That is true, but wouldn't you run into a major performance hit when running x86 software on other processors, assuming it didn't just blow up?
Seems like this would work between processors with a very similar ISA.
If they could run stuff compiled for one processor on another processor with a different ISA at near full speed,... that would change more than just virtualization. Run Wine on a PowerPC, emulate old consoles easily on a Pandora, etc..

--
If I have nothing to hide, don't search me
Re:Still x86 only by NormalVisual · 2008-11-07 05:51 · Score: 1

That is true, but wouldn't you run into a major performance hit when running x86 software on other processors, assuming it didn't just blow up?

Most definitely. At that point, you're emulating, not virtualizing.

--
Please stand clear of the doors, por favor mantenganse alejado de las puertas
Re:Still x86 only by TheRaven64 · 2008-11-07 06:04 · Score: 2, Interesting

Depends. Modern emulators can run at around 50% of the host platform speed. If your guest is paravirtualised then all of the privileged instructions will be run in the hypervisor. If you're running a JIT in the guest then you can poke it to flush its code caches and start emitting native code for the new architecture, but even if you aren't then migrating the VM from the 200MHz ARM chip in your cell phone to the quad-core 4GHz x86 chip connected to your TV might be interesting.

--
I am TheRaven on Soylent News
Re:Still x86 only by Atti+K. · 2008-11-07 07:22 · Score: 1

Putting aside the huge performance penalty, I wonder if qemu can emulate the cpu virtualization support needed for kvm...
Ok, yes, I know, whooosh ;)

--
.sig: No such file or directory
Re:Still x86 only by TheLink · 2008-11-07 07:32 · Score: 1

That's doable with emulation but you will take a performance hit. I don't think there's a good way to do it without a lot of emulation.

I don't see a practical reason for cross platform "live" moves.

Switching within a platform class is likely to be far far more useful.

With cross architecture switching, it's going to be a lot harder to use the strengths of the CPUs.

Say you're on x86 and using SSE, then you switch to SPARC, what are you going to do then?

Or you're on UltraSPARC T2 and using the eight encryption engines, then you switch to x86, what do you do then?
--
- Too many replies beneath your current threshold
Re:Still x86 only by Anonymous Coward · 2008-11-07 08:58 · Score: 0

Try http://www.byte.com/art/9407/sec6/art1.htm but no one was interested then, why now?
Re:Still x86 only by cduffy · 2008-11-07 12:41 · Score: 1

Actually, qemu recently merged in live migration support -- which kvm then adopted in place of its homegrown solution -- so not nearly as much nesting as you suggest is actually needed.
Re:Still x86 only by Hercynium · 2008-11-07 16:48 · Score: 1

but then the joke isn't funny! :)

--
I'm done with sigs. Sigs are lame.

vm migration security? by Anonymous Coward · 2008-11-07 04:33 · Score: 0

but is it secure yet?

This was in all likelihood faked. by Anonymous Coward · 2008-11-07 04:39 · Score: 5, Funny

Open source is for morons.

Only Apple has the engineering know-how and skills to pull off something like this. The fact that they have not done so to date is a clear indication that it is impossible.

check the graphs... by alta · 2008-11-07 04:41 · Score: 5, Interesting

Go to 4:05 in the video. On the far left, you can see from the blue Intel line that the guest is running there, then they migrate, and the blue line goes to the idle point, and the orange line starts taking the load. But NOTICE, the AMD line is consistently higher than the Intel line was. I'm no Intel fanboy... or AMD. I have both Intel and AMD servers in my racks. I just thought it was interesting, and I'm surprised they let the video go out like that.

--
Do not meddle in the affairs of sysadmins, for they are subtle, and quick to anger.

Re:check the graphs... by Loibisch · 2008-11-07 05:15 · Score: 1

Hehe, I checked the same thing. :)
To be fair, the performance of playing a HD video is pretty much determined by your graphics card. It's not really the best CPU benchmark you could imagine. :)
Re:check the graphs... by nschubach · 2008-11-07 05:17 · Score: 1

I can't watch the video right now, so I'm assuming the graph is processor utilization?
Could it possibly be because the AMD processor is running some kind of instruction translation, communication layer, or something like that?

--
Every time I start to have faith in humanity, I ruin it by driving to work between 7 and 8 am.
Re:check the graphs... by Anonymous Coward · 2008-11-07 05:30 · Score: 1, Interesting

(1) It didn't seem clear to me how many VM's each box was running. Could very well be that the Shanghai box was already doing quite a bit before the migration.
(2) There's a reason Shanghai isn't available yet.
(3) There's a reason this live migration stuff isn't available yet. Could very well be that the migration (at the moment) causes additional overhead.
I'm not trying to justify AMD here per se. It's just there's no where near enough information to make any real conclusions what so ever. This may not say anything bad about AMD which AMD would have wanted to cover up.
Re:check the graphs... by michrech · 2008-11-07 05:48 · Score: 2, Informative

It didn't seem that interesting to me. If you watch the video, the Intel and Barcelona machines showed no VMs running (0% load). When the Shanghai server took over the load, *of course* its load line will rise -- it's the only server running a VM at that point!
There are no shenanigans going on here, and I don't think this says anything about the chips as you imply, either.

--
bork bork bork!
Re:check the graphs... by Luke_22 · 2008-11-07 05:55 · Score: 1

Go to 4:05 in the video. On the far left, you can see from the blue intel line that the guest is running there, then they migrate, and the blue line goes to the idle point, and the orange line starts taking the load. But NOTICE, the AMD line is consistantly higher than the intel line was.

Look more closely: when the switch happens, one load replaces the other and they're equal.
Then the AMD load keeps increasing a little bit, even after the switch is complete.
I guess it could just be the OS doing something else. It was Windows, after all ;)

--
"I was gratified to be able to answer promptly, and I did. I said I didn't know." -- Mark Twain
Re:check the graphs... by Ecuador · 2008-11-07 05:56 · Score: 1

Well, duh, they can run their Core 2 @ 4.5GHz on stock air cooling, silly!
Shanghai can still be faster clock for clock as they promised ;)
Seriously now, a CPU % utilization of a VM running WMP is no indication of anything.

--
Violence is the last refuge of the incompetent. Polar Scope Align for iOS
Re:check the graphs... by wanderingknight · 2008-11-07 06:50 · Score: 2, Interesting

GPUs have nothing to do with video decoding; it's handled 100% by the CPU. At least until we get software that can reliably take advantage of the relatively recent introduction of h264 decoding on some high-end GPUs.
Re:check the graphs... by Anonymous Coward · 2008-11-07 06:55 · Score: 0

Doesn't mean much. Maybe the Intel was more powerful, or maybe the VM needs more cpu when "booting" the migrated OS.
Re:check the graphs... by Loibisch · 2008-11-07 07:22 · Score: 1

Sure they don't.
Hardware H264 encoding is available and working.
Also go mess around with different video displaying options (overlay, x11...or for windows the various VMR revisions) and watch the CPU load go up and down.
It's not _all_ the CPU, so it's bullshit as a CPU benchmark, especially on guaranteed-to-be-different systems.
Re:check the graphs... by Anonymous Coward · 2008-11-07 11:41 · Score: 1, Informative

True, it is higher but the guy mentions each server is running several VMs (each of which could be doing stuff), not just the one. Also the scale of time isn't visible from the start of migration until finish. Not sure it shows anything really but well spotted.
Re:check the graphs... by Anonymous Coward · 2008-11-07 12:43 · Score: 0

Bzzt, wrong. High-end GPUs nowadays have H.264 offloading; it all depends on if the video driver and operating system have the necessary tie-ins to support such offloading. nVidia has PureVideo, and AMD/ATI has Avivo.
Re:check the graphs... by alta · 2008-11-07 12:57 · Score: 1

What I'm saying is that the load is higher on the Shanghai machine with a VM than it was on the Intel with a VM.

--
Do not meddle in the affairs of sysadmins, for they are subtle, and quick to anger.
Re:check the graphs... by wanderingknight · 2008-11-07 15:30 · Score: 1

So... if you do the benchmarks on PCs that don't have a GPU that can decode H264, is it still a bullshit benchmark?
Re:check the graphs... by Anonymous Coward · 2008-11-07 20:10 · Score: 0

But if you look real closely, the shadow is straight-on, instead of off-to-the-side. And the line running behind the other servers disappears behind this one.
Re:check the graphs... by Anonymous Coward · 2008-11-08 09:07 · Score: 0

Hardware H264 encoding is available and working.
I assume you mean decoding? I've been waiting for a long time for accelerated h.264 decoding under Linux. Hardware supports it, but as far as software goes I have not found anything yet. So, URL please.

what about endianness? by Anonymous Coward · 2008-11-07 04:45 · Score: 0

can this, theoretically, be done with a mixture of big-endian and little-endian architectures?

Re:what about endianness? by xouumalperxe · 2008-11-07 05:02 · Score: 1

This was done between different vendors, not altogether different architectures. That would demand emulation beneath the virtualization, on at least one machine -- not likely to happen any time soon.

Pfff... by Turiacus · 2008-11-07 04:46 · Score: 0, Redundant

This is completely trivial. You simply have to mark the VM with the architecture of its code. Then each host contains both a virtualization layer (à la vmware) and a multi-platform emulator (à la qemu). If the VM matches the architecture the host is running on, you use the virtualization layer, if it doesn't, you use the emulator.

As for moving between AMD64 and Intel 64 (for example), the VM has to emulate the few instructions that differ and virtualize the rest.

Of course, cross-architecture migration is not that useful since you have an emulation penalty. It is much simpler (and cheaper) to do everything on x64.

Re:Pfff... by Anonymous Coward · 2008-11-07 04:52 · Score: 3, Insightful

so easy that you did it yourself three years ago, right?
Re:Pfff... by Anonymous Coward · 2008-11-07 06:03 · Score: 0

It is much simpler (and cheaper) to do everything on x64.
If you're starting from scratch, yes, a homogeneous system is easier to manage. Unfortunately, in the real world most businesses have an eclectic mixture of systems accumulated through acquisitions and mergers and years of running without centralized IT planning.
Re:Pfff... by Anonymous Coward · 2008-11-07 10:18 · Score: 0

4yrs ago, if you must know....

Stability issues are justified by mnmn · 2008-11-07 04:48 · Score: 4, Interesting

Declaration: VMware support engineering here, but speaking strictly on my own behalf.

The stability issues are justified if you consider all types of VMs. Windows 2003, default RHEL5 kernels etc use more than the basic set of assembler instructions (disk IO code uses MMX, SSE etc).

We can compile a kernel for strictly 486 CPUs and demonstrate migrations between AMD and Intel using extensive CPU masking: http://kb.vmware.com/kb/1993

We've also known that mismatched CPU stepping makes the VMs unstable. This is because instructions suddenly run faster or slower compared to the front side bus, not all of Linux and Microsoft code has been tested against that. You can happily try it and a lot of our customers successfully do. Some get BSODs and kernel oops. This is not our fault.

If you virtualize the instructions more (bochs?) you can of course move the VM anywhere including a Linksys router's MIPS chip. At the cost of speed of course.

Lastly, why would we want to keep customers stuck to one CPU vendor? We're software vendors.

--
"Give orange me give eat orange me eat orange give me eat orange give me you." -Nim Chimpsky

Re:Stability issues are justified by Anthony+Liguori · 2008-11-07 04:59 · Score: 4, Interesting

Declaration: VMware support engineering here, but speaking strictly on my own behalf.
The stability issues are justified if you consider all types of VMs. Windows 2003, default RHEL5 kernels etc use more than the basic set of assembler instructions (disk IO code uses MMX, SSE etc).
By default, KVM goes to great lengths to mask out CPUID features that aren't supported across common platforms. You have to opt in to those features since they limit a machine's migrate-ability.
However, I won't say this is always safe. In reality, you really don't want to live migrate between anything but identical platforms (including identical processor revisions).
x86 OSes often rely on the TSC for time keeping. If you migrate between different steppings of the same processor even, the TSC calibration that the OS has done is wrong and your time keeping will start to fail. You'll either get really bad drift or potentially see time go backwards (causing a deadlock).
If you're doing a one time migration, it probably won't matter but if you plan on migrating very rapidly (for load balancing or something), I would take a very conservative approach to platform compatibility.
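To put rough numbers on the TSC problem described above, here is a minimal sketch. It assumes a guest that calibrated its tick-to-seconds conversion once at boot and a hypervisor that passes the raw TSC through without any offset or scaling; the frequencies and counter values are invented for illustration.

```python
# Toy model of a guest that converts raw TSC ticks to seconds using a
# frequency it measured once at boot. All numbers are hypothetical.

def guest_seconds(tsc_delta: int, calibrated_hz: int) -> float:
    """Elapsed time as the guest computes it from a tick delta."""
    return tsc_delta / calibrated_hz

OLD_HOST_HZ = 2_400_000_000   # assumed TSC rate where the guest calibrated
NEW_HOST_HZ = 2_600_000_000   # assumed TSC rate on the migration target

# Drift: one real hour passes on the new host, but the guest still
# divides by the old host's frequency.
real_seconds = 3600
ticks = NEW_HOST_HZ * real_seconds
perceived = guest_seconds(ticks, OLD_HOST_HZ)
drift = perceived - real_seconds
print(f"guest sees {perceived:.0f}s for {real_seconds}s real, drift {drift:+.0f}s")

# Time going backwards: the target host's counter may simply be lower
# than the last value the guest read on the source host.
last_tsc_on_old_host = 9_000_000_000_000
first_tsc_on_new_host = 1_000_000_000
backwards_delta = first_tsc_on_new_host - last_tsc_on_old_host
print(f"tick delta across migration: {backwards_delta}")
```

A negative delta is the case that can hang code waiting for a deadline computed from the old counter.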
Re:Stability issues are justified by NonSequor · 2008-11-07 05:41 · Score: 1

Is there any reason you couldn't keep a list of processor dependent memory locations and regenerate them for the current machine as part of the migration?

--
My only political goal is to see to it that no political party achieves its goals.
Re:Stability issues are justified by Malc · 2008-11-07 05:43 · Score: 1

VMWare have more stability worries than this on their plate. I've just upgraded Fusion on the Mac to version 2 and it's still very unstable. First use the guest OS locked up, forcing me to reboot the host so I could try again, only to find that, like with Fusion 1.1, the Mac hangs on shutdown. *sigh*
Re:Stability issues are justified by kscguru · 2008-11-07 06:09 · Score: 5, Informative

Yet Another VMware engineer here.
The new Intel/AMD CPU features that allow masking of CPUID bits while running virtualized also make processors recent enough that most of the interesting features are present - MMX, SSE up to ~3. The "common subset" ends up looking like an early Core2 or a Barcelona (minus the VT/SVM feature bits, of course) - Intel and AMD run about a generation behind on adding each other's instructions. Run on anything older than the latest processors, and you have to trap-and-emulate every CPUID instruction. Enough code still uses CPUID as a serializing instruction that this has noticeable overhead.
So there are two strategies. Pass directly through the CPUID bits (and on the newest processors, apply a mask), or remember a baseline value, trap-and-emulate every CPUID and always return that value. Sounds like KVM has picked the latter approach for a default; VMware's default is to expose the actual processor features and accept a mask as an optional override, which skews towards exposing more features at the expense of some compatibility. Equally valid choices, IMHO.
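A rough sketch of those two strategies, for readers who want something concrete. The feature names below are illustrative labels rather than real CPUID bit positions, the two host feature sets are made up, and the function names are hypothetical rather than anything from KVM or VMware.

```python
from enum import IntFlag, auto
from functools import reduce
from operator import and_

class Feature(IntFlag):
    # Illustrative labels only; not actual CPUID bit layouts.
    MMX = auto()
    SSE = auto()
    SSE2 = auto()
    SSE3 = auto()
    SSSE3 = auto()
    THREE_DNOW = auto()

def common_baseline(hosts):
    """Features every host in the pool supports (bitwise AND of all)."""
    return reduce(and_, hosts)

def cpuid_passthrough(host_features, mask=None):
    """Strategy 1: expose the real features, optionally ANDed with a mask."""
    return host_features if mask is None else host_features & mask

def cpuid_emulated(baseline):
    """Strategy 2: trap every CPUID and always answer with the baseline."""
    return baseline

# Hypothetical hosts in one migration pool.
intel_host = (Feature.MMX | Feature.SSE | Feature.SSE2
              | Feature.SSE3 | Feature.SSSE3)
amd_host = (Feature.MMX | Feature.SSE | Feature.SSE2
            | Feature.SSE3 | Feature.THREE_DNOW)

baseline = common_baseline([intel_host, amd_host])
print("baseline:", baseline)
print("unmasked views differ:",
      cpuid_passthrough(intel_host) != cpuid_passthrough(amd_host))
print("masked views match:",
      cpuid_passthrough(intel_host, baseline)
      == cpuid_passthrough(amd_host, baseline))
```

Either way the guest ends up seeing the same answer on both hosts; the difference is whether the hardware applies the mask or the hypervisor traps and answers itself.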
The Worst Case Scenario when not doing a trap-and-emulate of every CPUID is an app that does CPUID, reads the vendor string, then decides based on the vendor string which other CPUID leafs to read. (Like the 0x80000000 leafs, which are vendor-specific and would come back as gibberish if you get the processor wrong). If the app migrates during the dozen or so instructions between the first CPUID and the following ones, instant corruption. Good enough for a pretty demo, destined to make a guest kernel die a few times a year if actually used in production. And I'm 95% sure this is what the OP demo is doing - living dangerously by hoping mismatched CPUID results never get noticed.
I agree with Anthony Liguori here - on a production machine, an Intel/AMD migration is way too much of a stupid risk. All you have to do is reboot the VM, it's much safer.
(As a side note to everyone reading, the reason Linux timekeeping is such a problem is that TSC issue. Intel long ago stated TSC was NOT supposed to be used as a timesource. Linux kernel folks ignored the warning, made non-virtualizable assumptions, and today are in a world of hurt for timekeeping in a VM. And only now, many years later, are patching the kernel to detect hypervisors to work around the problem.)

--
A witty [sig] proves nothing. --Voltaire
Re:Stability issues are justified by Chirs · 2008-11-07 06:39 · Score: 3, Interesting

The TSC is an optional clock source. You can use other things (ACPI, HPET) but the problem is that they're relatively expensive to access.
The kernel people have been complaining literally for multiple years that x86 needs a system-wide clocksource that is cheap to access (and presumably hypervisor-friendly). So far AMD and Intel haven't bothered to provide one.
Re:Stability issues are justified by Anonymous Coward · 2008-11-07 07:06 · Score: 0

Declaration: VMware support engineering here, but speaking strictly on my own behalf. The stability issues are justified if you consider all types of VMs. Windows 2003, default RHEL5 kernels etc use more than the basic set of assembler instructions (disk IO code uses MMX, SSE etc). We can compile a kernel for strictly 486 CPUs and demonstrate migrations between AMD and Intel using extensive CPU masking: http://kb.vmware.com/kb/1993 We've also known that mismatched CPU stepping makes the VMs unstable. This is because instructions suddenly run faster or slower compared to the front side bus, not all of Linux and Microsoft code has been tested against that. You can happily try it and a lot of our customers successfully do. Some get BSODs and kernel oops. This is not our fault. If you virtualize the instructions more (bochs?) you can of course move the VM anywhere including a Linksys router's MIPS chip. At the cost of speed of course. Lastly, why would we want to keep customers stuck to one CPU vendor? We're software vendors.
The question is why would your company say it's impossible when it isn't?
Re:Stability issues are justified by TheLink · 2008-11-07 07:19 · Score: 2, Interesting

Yes you're not supposed to use TSC.

BUT there is no good alternative that's:
1) Cheap
2) Fast
3) Available on most platforms

I find it quite amazing actually that the CPU manufacturers add all those features, and yet after so many years there is still no good standard way to "get time", despite lots of programs needing to do it.
--
- Too many replies beneath your current threshold
Re:Stability issues are justified by virtualboy · 2008-11-07 07:28 · Score: 1

Virtual Iron engineer here. You also have to worry about programs that check the CPU and use functions specific to that CPU. When you then move to another CPU that doesn't have those functions, the OS may stay up and running but the application may crash. We have done this in house, but we don't recommend that customers LiveMigrate between Intel and AMD.
Re:Stability issues are justified by Anthony+Liguori · 2008-11-07 07:34 · Score: 2, Informative

The new Intel/AMD CPU features that allow masking of CPUID bits while running virtualized also make processors recent enough that most of the interesting features are present - MMX, SSE up to ~3. The "common subset" ends up looking like an early Core2 or a Barcelona (minus the VT/SVM feature bits, of course) - Intel and AMD run about a generation behind on adding each other's instructions. Run on anything older than the latest processors, and you have to trap-and-emulate every CPUID instruction. Enough code still uses CPUID as a serializing instruction that this has noticeable overhead.
Modern OSes do not use CPUID for serialization. We trap CPUID unconditionally in KVM and have not observed a performance problem because of it. Older OSes did this but I'm not aware of a modern one.
My understanding of the reason for the recent CPUID "masking" support is because if you are not using VT/SVM (Xen PV or VMware JIT), there is no way to trap CPUID when it's executed from userspace. AMD just happened to have this feature so when Intel announced "FlexMigration", they were able to just document it. I don't think it's really all that useful though.
(As a side note to everyone reading, the reason Linux timekeeping is such a problem is that TSC issue. Intel long ago stated TSC was NOT supposed to be used as a timesource. Linux kernel folks ignored the warning, made non-virtualizable assumptions, and today are in a world of hurt for timekeeping in a VM. And only now, many years later, are patching the kernel to detect hypervisors to work around the problem.)
The TSC is often used as a secondary time source, even outside of Linux, but yes, Linux is the major problem. But Windows is not without its own faults wrt time keeping. Dealing with missed timer ticks for Windows guests is a never ending source of joy. Virtualization isn't the only source of problems here. Certain hardware platforms have had overzealous SMM routines and the result was really bad time drift when running Windows.
Re:Stability issues are justified by Anthony+Liguori · 2008-11-07 07:36 · Score: 1

Is there any reason you couldn't keep a list of processor dependent memory locations and regenerate them for the current machine as part of the migration?
The halting problem?
Re:Stability issues are justified by Chris+Snook · 2008-11-07 09:16 · Score: 1

Rebooting isn't always an option. If you've got 10 guests running on a host, and you have the luxury of rebooting 9 of them, you still need to migrate one of them. Sure, you can keep separate pools of hosts with different processor revisions and migrate between them most of the time, but what happens when it's time to retire your rack full of netburst-era Xeon boxes, running several hundred guests? You're correct that CPUID trapping introduces overhead on older CPUs, but this demo was run on new CPUs, in part to show off how they make it easier to migrate to.
TSC timekeeping is essential for SMP scalability. When your hypervisor only supports 4-cpu scalability, you may not notice this effect, but for those of us running on bare metal or other hypervisors that allow us to use more CPUs, the effect becomes quite pronounced when running enterprise transactional workloads. The Linux kernel has gone to great lengths to use the most efficient timekeeping mechanism that can be used safely. The only patches I've seen lately on this topic have been to *enable* TSC timekeeping when running under VMware, since Linux distrusts the TSC by default, and has trouble verifying it in a virtualized environment.

--
There's no failure quite as dissatisfying as a complete and total solution to the wrong problem.
Re:Stability issues are justified by Anonymous Coward · 2008-11-07 10:07 · Score: 0

Can you clarify why even changing a stepping makes it unstable? After all, the OS would work fine on those processors without any code modifications if it were running natively (in that it handles the instruction speed difference). Heck, computers even deal with CPU scaling. Can you clarify this please?
Re:Stability issues are justified by Anonymous Coward · 2008-11-07 10:10 · Score: 0

Yet Another VMware engineer here.
Shouldn't both of you be working?
Re:Stability issues are justified by Anonymous Coward · 2008-11-07 13:26 · Score: 0

You are making quite a few assumptions about how KVM works based on how your VMware product works. KVM uses SVM and VT-x and intercepts every cpuid instruction, whether it originates from the guest kernel or from a guest user address space. Furthermore, since KVM relies heavily on qemu's infrastructure, which always claims to a guest that it is an Intel Pentium II machine in 32-bit mode and an AMD64 in 64-bit mode, it has a lot of those cross vendor kinks already worked out. The guest OSes *and* applications see the same CPUID values, independent of the platform they are running on, and they base their instruction set selection on that. This value does not change during the runtime of the VM. It is independent of migration. As long as both CPUs support the same baseline instruction set you can migrate.
VMware however does not intercept all cpuids. It can't because binary translation only applies to privileged code (the kernel). VMware doesn't translate user programs and therefore cannot intercept all cpuids. This leads to the inconsistencies in applications you describe. Both Intel and AMD introduced a capability to mask some of the cpuid values to support VMware's enhanced migration but this is a far cry from completely spoofing cpuid's like kvm does.
That said, there are definitely dragons here but most of those are the same for migration between CPUs from the same vendor which is supported by VMware. Floating points are an interesting one. Timing is another hard one (someone already alluded to VMware's migration challenges between the same CPU with different steppings). However, as VMware showed, with a reasonable amount of engineering you can do the validation and make it work. Why isn't the same true for cross vendor CPU migration? I can see some obvious issues around sysenter in compatibility mode, some of the bit set instructions and FPU approximation series but none of those seem insurmountable technical challenges.
It is interesting that you suggest a reboot because it is much safer. I don't see that advice for extended migration, which runs into many of the same issues. But this discussion is missing the bigger picture. Live migration of VMs leads to platform lock-in because you can no longer switch platforms for a running VM. Isn't this something we should all work to avoid? IMHO a discussion about how to resolve the challenges in hardware or software is what we need to have.
Re:Stability issues are justified by Anonymous Coward · 2008-11-07 13:29 · Score: 0

That's a problem with the host OS. Try using a real one next time.
Re:Stability issues are justified by ultranova · 2008-11-07 21:03 · Score: 1

Is there any reason you couldn't keep a list of processor dependent memory locations and regenerate them for the current machine as part of the migration?

The halting problem?

The parent is not talking about a general algorithm which could determine whether any other algorithm halts, he's talking about translating the state of an algorithm from one specific approximation of a Turing machine to another. These problems have nothing to do with one another.
Besides, the halting problem only applies to an actual theoretical Turing machine, which has an infinite storage available. Solving the halting problem for a finite-storage Turing machine approximation is trivial: simply trace the algorithm and compare each state it reaches with each previous state. If the states are identical, then the algorithm will never terminate, for it has entered an infinite loop. If a machine has, for example, only 2^32 bytes of storage available, then it has at most 2^(8*2^32) different possible states (one per combination of its 8*2^32 bits), and is thus guaranteed to either exit or repeat a state after a finite number (at most 2^(8*2^32)) of state transformations. A true infinite-storage Turing machine has infinite number of possible states, so it is not possible to use this approach with it.
Finally, even in a true Turing machine, the halting problem does not state that you can't reason about a particular algorithm; it simply states that you can't write an algorithm which automatically proves for an arbitrary algorithm and input that it'll halt.
I wonder if the abuse of the halting problem will ever end ?
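
The state-tracing argument above can be sketched in a few lines. This is only a toy, assuming a deterministic machine whose whole state is one hashable value and whose step function returns None to signal a halt; the two example machines are invented and have nothing to do with a real CPU.

```python
def halts(step, start):
    """Decide halting for a deterministic finite-state machine.

    `step` maps a state to the next state, or to None when the machine
    halts. Because the machine is deterministic, revisiting any state
    means it is stuck in a loop forever.
    """
    seen = set()
    state = start
    while state is not None:
        if state in seen:
            return False      # repeated state: infinite loop
        seen.add(state)
        state = step(state)
    return True               # reached the halt signal

# Two hypothetical 8-bit "machines".
def countdown(s):
    return None if s == 0 else s - 1      # always reaches 0 and halts

def spin(s):
    return (s + 2) % 256                  # cycles forever, never halts

print(halts(countdown, 200))
print(halts(spin, 1))
```

The `seen` set is the catch: it can grow to the full state count of the machine, which is why this is a proof of decidability and not something you would run against real memory.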

--
Forget magic. Any technology distinguishable from divine power is insufficiently advanced.
Re:Stability issues are justified by Anonymous Coward · 2008-11-07 23:19 · Score: 0

Lastly, why would we want to keep customers stuck to one CPU vendor? We're software vendors.
Of course you don't want to do that. What you (as in VMware) DO want to do is make sure people use your products rather than your competitors'.
So when competitor X shows technology Y that VMware doesn't have (and, one assumes, isn't working on), OF COURSE you have to claim that technology Y doesn't work - at least, that it doesn't work reliably in practice.
That's not because you dislike technology Y, though, it's because you don't want your customers to switch to competitor X over technology Y.
In other words, it's FUD. Understandable FUD from your perspective, of course, but still.
Re:Stability issues are justified by dkf · 2008-11-07 23:42 · Score: 1

Solving the halting problem for a finite-storage Turing machine approximation is trivial: simply trace the algorithm and compare each state it reaches with each previous state. If the states are identical, then the algorithm will never terminate, for it has entered an infinite loop.
This is the sort of "trivial" that is only used by pure mathematicians. In practice, it's much harder than that, especially when modeling any program with non-determinism in it (almost all of them these days).
The problem is two-fold. Firstly, working out when two states are the same is a much harder challenge than it first appears: e.g. does it matter that the system clock has advanced in the meantime? Secondly, the state space grows massively fast and storing all the states of even a small program tends to lead to a memory structure that is very inefficient (you get a fully random access memory pattern across data that is too large even to fit on a substantial modern cluster...) You can use the solution to the first problem to help deal with the second, but state comparison and compression algorithms are tricky to write in a way that isn't very specific to the program being analyzed.
I say this as someone who has (a few years ago now) written deadlock checkers and temporal logic model checkers for real programs. Yes, the problem is actually parallelizable, but it's still very hard because of that dirty great store of all visited states; BDDs can help, but are horribly dependent on the order of variables and I never knew of an algorithm for determining an optimal ordering...

--
"Little does he know, but there is no 'I' in 'Idiot'!"
Re:Stability issues are justified by kscguru · 2008-11-08 06:25 · Score: 1

VMware however does not intercept all cpuids. It can't because binary translation only applies to privileged code (the kernel). VMware doesn't translate user programs and therefore cannot intercept all cpuids. This leads to the inconsistencies in applications you describe. Both Intel and AMD introduced a capability to mask some of the cpuid values to support VMware's enhanced migration but this is a far cry from completely spoofing cpuid's like kvm does.
And here I thought VMware employees were experts on how VMware software works!
You've actually run afoul of an extremely common misconception. VMware has been using VT (the same thing KVM uses) since 2005; the VMware hypervisors can run in either a binary translation mode, a VT/SVM mode, or a paravirtualized mode for Linux kernels 2.6.23 and above (or Ubuntu, who accepted the patches earlier), and do in fact switch modes depending on which guest OS, vMotion options, and other settings are configured. Configuring for a baseline CPUID value is ultimately an engineering choice: BT can only run with a passed-through CPUID, whereas VT/SVM can run either passed-through or emulated. Since the trapping overheads of most pre-EPT/NPT VT/SVM implementations are higher than the binary translation overheads, it's more efficient to run in BT mode (but VT mode is still very much supported). Thus, VMware defaults to not spoofing CPUID for a small performance win. For KVM, VT is the only option and adding the additional CPUID is a much lower marginal cost, so it makes engineering sense to always spoof. And both VMware and KVM folks are looking forward to the EPT/NPT future where VT overheads finally become lower than binary translation overheads.
Why isn't the same true for cross vendor CPU migration? I can see some obvious issues around sysenter in compatibility mode, some of the bit set instructions and FPU approximation series but none of those seem insurmountable technical challenges.
It's not insurmountable. See this VMware customer, who tweaked the vmotion compatibility settings enough to get Intel/AMD VMotions working two months ago. There's a world of difference between somebody doing this for fun / in RedHat's research lab and somebody calling this stable enough to use for production servers, however. Does VMware support it in that the software can be made to do it? Yes. Does VMware support it in that tech support will answer the phone if you break something trying this? No.

--
A witty [sig] proves nothing. --Voltaire
Re:Stability issues are justified by ultranova · 2008-11-08 06:37 · Score: 1

This is the sort of "trivial" that is only used by pure mathematicians. In practice, it's much harder than that, especially when modeling any program with non-determinism in it (almost all of them these days).

A Turing machine is completely deterministic - that is, for the given input and algorithm, it will always return the same result, or never return. A non-deterministic program is not a Turing machine, and as such the halting problem doesn't apply.
Of course it might still be impossible to determine whether such a program returns or not - in fact it likely is, since that's what "non-deterministic" strongly implies, but that has nothing to do with "the" halting problem.

The problem is two-fold. Firstly, working out when two states are the same is a much harder challenge than it first appears: e.g. does it matter that the system clock has advanced in the meantime?

Depends on the program, obviously. The simple method I outlined tracks all internal storage, including any and all clock counters, as part of state.

Secondly, the state space grows massively fast and storing all the states of even a small program tends to lead to a memory structure that is very inefficient (you get a fully random access memory pattern across data that is too large even to fit on a substantial modern cluster...)

Of course. I never said it was a practical solution. I simply pointed out that the halting problem is only unsolvable given infinite storage (translating into infinitely many different states) for the target machine.

I say this as someone who has (a few years ago now) written deadlock checkers and temporal logic model checkers for real programs.

From what I've understood, a deadlock can only occur if, for any pair of locks, they can be acquired in either order, and not otherwise. This, then, seems to suggest that they are simple to avoid: just put all the locks in your program into an ordered list and make sure that no thread can ever attempt to get a "lesser" lock than the highest-numbered one it currently has. Maybe make a custom locking function which keeps track of the highest lock number, and terminates the thread (or just returns an error) and complains loudly into the log if locking a lower-numbered lock is attempted ?
You could also base the checking algorithm on this: just make sure there are never in any code path any inconsistencies in the locking order, and it should work fine.

--
Forget magic. Any technology distinguishable from divine power is insufficiently advanced.

I doubt VMWare is scared...yet by Anonymous Coward · 2008-11-07 04:48 · Score: 0

Not to diss the achievement, which is cool. It does require newer processors with the special VM extensions. So it may commoditize future CPUs. Also KVM requires QEMU, and many running VMWARE depend on a tested solution that is delivered complete from the vendor. And VMWARE is probably looking at the source and will have it or something similar in future builds.

Um by Colin+Smith · 2008-11-07 04:49 · Score: 1, Insightful

The VM software vendor becomes "the major player".

As The Who so insightfully sang in "Won't Get Fooled Again": "Meet the new boss. Same as the old boss."

--
Deleted

Re:Um by Korin43 · 2008-11-07 07:04 · Score: 4, Insightful

Except they're doing it with KVM, which is open source..
Re:Um by Bearhouse · 2008-11-07 08:41 · Score: 1

Damn. Please someone re-write that Wiki entry to make it more friendly for our non-tech friends. It starts...
"Kernel-based Virtual Machine (KVM) is a Linux kernel virtualization infrastructure. KVM currently supports native virtualization using Intel VT or AMD-V. Limited support for paravirtualization is also available for Linux guests and Windows in the form of a paravirtual network driver[1], a balloon driver to affect operation of the guest virtual memory manager[2], and CPU optimization for Linux guests. KVM is currently implemented as a loadable kernel module although future versions will likely use a system call interface and be integrated directly into the kernel[3].
Architecture ports are currently being developed for s390[4], PowerPC[5], and IA64. The first version of KVM was included in Linux 2.6.20 (February 2007)[6]. KVM has also been ported to FreeBSD as a loadable kernel module[7]."
Re:Um by abdulla · 2008-11-07 12:07 · Score: 3, Insightful

Why should it be dumbed down? I don't go reading Biology articles and expect to know everything. That's why there are links to other articles explaining each bit in more detail.
Re:Um by Anonymous Coward · 2008-11-07 23:09 · Score: 0

which is open source..
Doesn't mean it can't be a major player, though. Case in point: wouldn't you agree that Apache is a major player in the http server market?

Wasn't this always possible? by tlhIngan · 2008-11-07 04:49 · Score: 1

The point of virtualization is to isolate the hardware from the software - I fail to see how this is unique other than it being done "live" (which just means the VM is suspended, and the state of everything moved to the new machine and the VM resumed). Nor how it can be impossible - while the x86 has many extensions, it's still a well-specified architecture with specific behaviors.

The real trick is if an application is using features not present on the other architecture - e.g., an AMD virtual machine migrating to an Intel one while running applications use 3DNow instructions (which don't exist on Intel CPUs). Or perhaps an old 16-bit application running on a 32-bit VM under a 32-bit OS migrating to a 64-bit VM (since you can't do real mode or other legacy things in x64 mode) and continuing without a hitch... (Maybe it's a VM running MS-DOS, say?)

Re:Wasn't this always possible? by Anonymous Coward · 2008-11-07 05:32 · Score: 0

Or perhaps an old 16-bit application running on a 32-bit VM under a 32-bit OS migrating to a 64-bit VM (since you can't do real mode or other legacy things in x64 mode) and continuing without a hitch... (Maybe it's a VM running MS-DOS, say?)
This one, everybody has already solved. Or you couldn't boot past BIOS - the last refuge of real-mode code in today's computers.
Re:Wasn't this always possible? by thePowerOfGrayskull · 2008-11-07 05:33 · Score: 1

The point of virtualization is to isolate the hardware from the software - I fail to see how this is unique other than it being done "live" (which just means the VM is suspended, and the state of everything moved to the new machine and the VM resumed).
Erm... actually, if you watch the video, you will see that the "live" migration is actually live - the VM is not suspended, it is kept running and active through the migration.
Re:Wasn't this always possible? by Anonymous Coward · 2008-11-07 05:55 · Score: 0

I would say "the point" of virtualization is subjective, and that may be your purpose but other people have different purposes. My point is to run multiple operating systems from the same dev machine, reducing hardware cost. Some people use it for the sake of redundancy and infrastructure management on mirrored hardware. VMWare Fusion users use it to have the best of both worlds hand in hand. I would say "the point" of virtualization is that it is very useful, for many reasons, and that with pros come cons.
Completely isolating the hardware from the software has downsides too; most notably speed. There has been architecture emulation for quite some time with completely isolated infrastructure. The answer to speed concerns was to improve hardware support for virtualization, which both dominant manufacturers have implemented. Now VMWare and other systems can issue instructions to the processor specific to managing allocated zones of hardware and then they can pass the instructions directly through to the processor rather than interpreting them with a virtual processor (effectively a Hardware Hypervisor). You could probably effectively move from inferior hardware to superior hardware as either isolated or integrated so long as you proxy some of the messages and are aware of the expectations of the guest OS, but it's a lot of work to accomplish and it's very dependent on the new machine being able to, while proxying, perform at least as well as the VM required of the old hardware.
Re:Wasn't this always possible? by Ephemeriis · 2008-11-07 06:09 · Score: 1

I fail to see how this is unique other than it being done "live" (which just means the VM is suspended, and the state of everything moved to the new machine and the VM resumed).
You just completely missed the point. The VM was not suspended, moved, and resumed. It was moved live. The VM never stopped doing its thing. It was up, running, and servicing requests the whole time.
...which isn't terribly amazing. I know VMWare can do that now. The big deal is apparently that it moved from one CPU vendor to another. I didn't realize this was so tricky... I kind of figured that x86 was x86 regardless of vendor. Obviously, I was wrong.

--
"Work is the curse of the drinking classes." -Oscar Wilde
Re:Wasn't this always possible? by TheRaven64 · 2008-11-07 06:09 · Score: 2, Informative

Actually, it is suspended, but only for a fraction of a second. First you copy the entire contents of memory to the new machine and mark it as read-only. Each page fault caused by this is used to mark pages that are still dirty. Then you copy these. You keep repeating this process until the set of dirty pages is very small. Then you suspend the VM, copy the dirty pages, and start the VM on the new machine. Userspace programs will just notice that they went an unusually long time without their scheduling quantum. With Xen, at least, the kernel is responsible for bringing up and shutting down all CPUs except the first one, so the kernel will notice the migration (in a paravirtualised kernel - with HVM it won't) and restart the other (virtual) CPUs.
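
A rough sketch of the iterative pre-copy loop described above, written as a toy simulation rather than anything Xen actually runs. The function name `precopy_migrate`, the `threshold` and `max_rounds` parameters, and the model in which the guest dirties a known set of pages per round are all assumptions made for illustration.

```python
def precopy_migrate(num_pages, dirty_per_round, threshold=4, max_rounds=30):
    """Toy model of pre-copy live migration (illustrative only).

    dirty_per_round: list of sets; element i holds the page numbers the
    hypothetical guest writes to while round i is being copied.
    Returns (rounds, pages_copied_live, pages_copied_while_paused).
    """
    to_copy = set(range(num_pages))  # first round: every page
    copied_live = 0
    rounds = 0
    # Keep copying while the VM runs until the dirty set is small.
    while len(to_copy) > threshold and rounds < max_rounds:
        copied_live += len(to_copy)
        # Pages written during this round have to be sent again.
        if rounds < len(dirty_per_round):
            to_copy = set(dirty_per_round[rounds])
        else:
            to_copy = set()
        rounds += 1
    # Stop-and-copy: the guest is paused only for the small remainder.
    copied_paused = len(to_copy)
    return rounds, copied_live, copied_paused


# Hypothetical workload whose dirty set shrinks each round.
result = precopy_migrate(100, [set(range(20)), set(range(6)), {1, 2}])
```

The pause is proportional to the last value only, which is why a guest that keeps dirtying lots of memory never converges and has to be cut off by something like `max_rounds`.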

--
I am TheRaven on Soylent News
Re:Wasn't this always possible? by BitZtream · 2008-11-07 08:48 · Score: 1

The virtual machine was paused, just not for very long. At some point you have to transfer the contents of the VM's RAM between the servers running it and swap which hardware owns the virtual disk. When that moment occurs, the virtual machine is paused for a brief period of time while the final bits of memory and ownership of the disks are transferred to the new host.
This pause is mitigated by transferring as much of the running VM's RAM to the new host as possible beforehand, then, when the move actually occurs, copying those last changed bits over. On certain servers, where RAM is changing constantly and there is a lot of it, you will see a very obvious pause in the virtual machine during migration. With smaller amounts of RAM, not so much.
Don't be impressed by the apparent 'lack of pause during transfer', because it was there; they just made sure the test was done in such a way that they could demonstrate it without you noticing it. That's the advantage of setting up your own demos: you can hide all the bad parts pretty easily.

--
Persistent Volume manager for Kubernetes - https://github.com/dwimsey/openshift-pvmanager

Doesn't surprise me by guruevi · 2008-11-07 04:52 · Score: 1

After all, all x86 are the same. MMX extensions get emulated on AMD, Linux distros run on both processors without recompiling, the kernel handles calls and most likely an Apache server is not going to call the special media extensions. It would be interesting to see this happen in an environment that has been optimized and is using certain incompatible extensions (like 3DNow!), e.g. a computing cluster.

If you abstract enough and emulate a processor you should even be able to move between architectures but the overhead of emulation wouldn't make it very cost effective.

--
Custom electronics and digital signage for your business: www.evcircuits.com

Re:Doesn't surprise me by jamesh · 2008-11-07 15:37 · Score: 1

MMX extensions get emulated on AMD
Yes, but the kernel has to detect this, and once it does, it assumes that it doesn't have to keep re-detecting it (after all, why would it change?). And what if you migrated right in the middle of that detection?
With PV kernels that understand virtualisation it's probably not a big deal because the kernel can just say "don't migrate while I'm doing this", but for full virtualisation, where the kernel doesn't know it's virtualised, it's a bit harder.
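
To make the "detect once, assume forever" problem concrete, here is a minimal sketch. `CachedCpuInfo` and the `probe` callable are hypothetical stand-ins for whatever a kernel does at boot; this is not how any real kernel is structured, only an illustration of a cached answer going stale after a move.

```python
class CachedCpuInfo:
    """Hypothetical one-shot CPU feature detection with a cache."""

    def __init__(self, probe):
        self._probe = probe      # callable returning the current flag set
        self._cached = None      # filled in on first use, never refreshed

    def has(self, flag):
        if self._cached is None:
            self._cached = frozenset(self._probe())
        return flag in self._cached


# Pretend host; swapping the flag set stands in for a migration.
host = {"flags": {"mmx", "3dnow"}}
info = CachedCpuInfo(lambda: host["flags"])

before = info.has("3dnow")           # detected on the first host
host["flags"] = {"mmx", "sse2"}      # "migrate" to a CPU without 3DNow!
after = info.has("3dnow")            # still answered from the stale cache
```

After the swap the object still reports a feature the new CPU lacks and misses one it has, which is the situation a guest that is unaware of the migration ends up in.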

Not quite a break through by Anthony+Liguori · 2008-11-07 05:01 · Score: 2, Insightful

FWIW, KVM live migration has been capable of this for a long time now.

KVM actually supported live migration of Windows guests long before Xen did. If you haven't given KVM a try, you should!

Xen does migration, but not Live... by LinuxGeek · 2008-11-07 05:04 · Score: 4, Informative

This is a demo of a Live migration, no shutdown or reboot involved. Xen does not support the live migration of a running VM between an AMD and Intel server. Watch the video, they are running a video in the VM that keeps playing during the migration. Very impressive stuff.

--

Kindness is the language which the deaf can hear and the blind can see. - Mark Twain

Re:Xen does migration, but not Live... by Anonymous Coward · 2008-11-07 05:49 · Score: 0

There is no difference between live migration and save/restore in this regard.
So yes, you can do the same demo now using Xen 3.3.
Re:Xen does migration, but not Live... by Anonymous Coward · 2008-11-07 06:49 · Score: 0

Maybe Xen doesn't officially support cross-platform live migration, but I did live migration using Xen over two years ago (I think on Fedora Core 5!) back and forth between a 32 bit Intel CPU and an Opteron. And it was live, very live. I preserved several ssh sessions, and several cpu-intensive tasks.

Creds anyway by noundi · 2008-11-07 05:09 · Score: 2, Insightful

It's worth noting that VMware has been a huge contributor to the Linux community, giving corps a very good reason (€$£) to migrate, thus including important pawns in the future of Linux. I for one believe that VMware was wrong, but that it's an honest mistake. There's no use in poking at VMware for this one; hopefully they'll help lift the technology even higher along with their competitors.

You've lost this round VMware, but the match isn't over yet!

--
I am the lawn!

AMD by wzinc · 2008-11-07 05:18 · Score: 1

I don't know if this will help AMD sell more procs. I like AMD, but Intel's stuff is by far faster these days. Still, Intel's procs are nightmarishly expensive compared to AMD, and the difference in price/performance seems disproportionate to me.

AMD ftw... by Anonymous Coward · 2008-11-07 06:43 · Score: 0

Shows that AMD is the better company. Intel just buys and kills everything in its way with its evil black market and under the counter deals...

OpenVZ has been able to do it for like 2 years now by dowdle · 2008-11-07 08:10 · Score: 1

Let me clarify before people jump down my throat... OpenVZ (www.openvz.org) is OS Virtualization (aka containers) and NOT machine / hardware virtualization... so it can only run Linux on Linux... but it has been able to do live migrations from one processor family to another since they initially added checkpointing. OpenVZ is fairly CPU agnostic and it has been ported to a number of CPU families. In fact the project leader recently ported it to ARM (Gumstix Overo). See: http://community.livejournal.com/openvz/24651.html

--
Scott Dowdle
www.MontanaLinux.Org

I see a much harder problem... by Osvaldo+Doederlein · 2008-11-07 08:20 · Score: 1

This migration won't work for systems that employ advanced JIT code generation, such as Java. Modern production JVMs, like Sun's and IBM's, will create native code on the fly - and they will produce code that's ultra tuned for the specific processor that is running. This means using the best instructions available (like SSEx), and also fine-tune various behaviors, e.g. GC can be tuned for the L1/L2 cache sizes, and locking can be tuned to factors like number of CPUs/cores/hardware threads - so for example, if it's running on a uniprocessor/single-core machine, the JVM will simply not emit memory barrier instructions for memory model consistency.

And it's not only Java, we have an increasingly large number of JIT compilers that may employ similar tricks: Microsoft .NET (CLR); Flash 9+ (Tamarin) for ActionScript; Mozilla TraceMonkey and Google V8 for JavaScript; new LLVM-based runtimes for other languages... the list is only growing. Even for traditional static-compiled languages, some apps can have multiple shared libs compiled for different CPU levels, and choose the best lib at startup.

The only way I see around this problem is making ALL these runtimes and applications migration-aware. Each process should be notified before the migration, initiate some pre-migration task, and after the migration, be notified again to resume work and if necessary perform some post-migration step. Specifically for Java, the pre-migration step would need to "park" all threads in OSR safepoints, then free all JIT-generated code; and the post-migration step would retune/configure the JVM for the new CPU, then unpark the threads, which would resume execution in interpreted mode until the JIT compiler recreates all native code for the new CPU. Fortunately this is relatively simple to do in JVMs, because all the necessary plumbing already exists (safepoints, on-stack replacement... required for advanced GC and dynamic optimizations). And once new JVMs are enhanced with this feature, thousands of Java apps become magically migration-aware. Could be harder though for other runtimes.
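
The notification scheme proposed above could look roughly like the toy model below. Everything in it is hypothetical: `ToyRuntime`, `pre_migrate` and `post_migrate` are names invented for the sketch and do not correspond to any hook that a real JVM or hypervisor exposes.

```python
class ToyRuntime:
    """Hypothetical JIT runtime that reacts to migration notifications."""

    def __init__(self, cpu_features):
        self.cpu_features = frozenset(cpu_features)
        self.code_cache = {}   # method name -> features the code was built for
        self.parked = False

    def jit_compile(self, method):
        # Stand-in for generating native code tuned to the current CPU.
        self.code_cache[method] = self.cpu_features

    def run(self, method):
        if self.parked:
            raise RuntimeError("threads are parked for migration")
        return "native" if method in self.code_cache else "interpreted"

    def pre_migrate(self):
        # Park threads at a safepoint, then drop all CPU-specific code.
        self.parked = True
        self.code_cache.clear()

    def post_migrate(self, new_cpu_features):
        # Retune for the new CPU; threads resume in interpreted mode.
        self.cpu_features = frozenset(new_cpu_features)
        self.parked = False


rt = ToyRuntime({"sse2", "ssse3"})
rt.jit_compile("hot_loop")
mode_before = rt.run("hot_loop")
rt.pre_migrate()
rt.post_migrate({"sse2", "3dnow"})
mode_after = rt.run("hot_loop")
```

The method falls back to the interpreter right after the move and only becomes native again once it is recompiled against the new feature set.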

Still, very hot technology, just not as easy as we can imagine to get right and compatible with all applications.

Re:I see a much harder problem... by BitZtream · 2008-11-07 08:39 · Score: 1

The apps already are migration aware, as are the OSes, that's why you reboot them.

--
Persistent Volume manager for Kubernetes - https://github.com/dwimsey/openshift-pvmanager

Once again... by emptycorp · 2008-11-07 08:45 · Score: 1

AMD is first to a technological breakthrough and all Intel can do is copy the technology and overclock it to do better on benchmarks.

AMD - First to create lower clock speeds with same or better performance to Intel's higher speeds.

AMD - First (and only) to TRUE dual and quad core technology (Intel does not use logical cores).

AMD - First to 64-bit.

Of course other smaller chip makers have done these sorts of things first, but they don't compare to the Intel/AMD dominance and consumer marketplace.

Re:Once again... by Anonymous Coward · 2008-11-07 15:44 · Score: 0

Nobody cares who did what when and who copied who. We care about which is cheaper, more reliable, faster and provides best battery life.
Nobody cares about migrating running VMs across different processors from different vendors. There is simply little to no reason to care/do such things.
Either it works because your VM software sucks and emulates a naive environment without taking advantage of advanced processor features unique to each processor, or it necessarily emulates them, slowing the system down. Either way, WTF cares?
VMs are not 'cool'; they are a waste of system resources and exist merely as a band-aid to work around underlying operating systems not being engineered to provide the level of isolation required by their users.

Re:Stability issues are (not) justified by Anonymous Coward · 2008-11-07 09:26 · Score: 0

I've seen this done before by masking the CPU flags so that the VM only sees the lowest common denominator of features of a group of CPUs across which it can migrate.
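
As a minimal sketch of that masking idea: take the intersection of the flag sets of every host in the pool and expose only that to the guest. The host names and the `common_feature_mask` helper are made up for the example; the flag names follow the style of /proc/cpuinfo but the sets are abbreviated, not real CPU inventories.

```python
def common_feature_mask(hosts):
    """Return the CPU flags present on every host in the pool."""
    flag_sets = [set(flags) for flags in hosts.values()]
    if not flag_sets:
        return set()
    return set.intersection(*flag_sets)


# Hypothetical two-host pool with abbreviated flag lists.
pool = {
    "intel-host": {"fpu", "mmx", "sse", "sse2", "ssse3", "vmx"},
    "amd-host": {"fpu", "mmx", "sse", "sse2", "3dnow", "svm"},
}
guest_flags = common_feature_mask(pool)
```

The guest gives up the vendor-specific extensions on both sides, which is the price for being able to land on either host.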

VMWare have been unable to make this feature stable in their product while others like the commercial products based on KVM and also Citrix Xen have managed to get this working to a level where they are confident enough to do live demos (rather than slideware). There was a video of this linked over on the 360is blog last week.

AG

Intel will be fixing this problem soon by Anonymous Coward · 2008-11-07 11:50 · Score: 0

Look for Intel to provide "Intel Genuine Advantage" that makes it impossible to migrate a VM, under any circumstances, with any degree of success.

A BitchSlappin by Anonymous Coward · 2008-11-07 15:31 · Score: 0

I'm not saying who, but this is a big old badazz Bitch slap to a large proprietary software vendor famous for locking customers in, and feeding them anything it wishes, including animated paperclips and other stupid junk. Suddenly their protected world with its high walls is looking like an open beach with the tide rolling over walls made of sand.

What about apps, not servers? by JamesTRexx · 2008-11-07 22:44 · Score: 1

With all the talk about virtualization in the last couple of years, I'm a bit surprised that I haven't seen major talks about live migration capabilities at application level.
I'm not talking about cluster capable apps, but being able to run an app on one server, and then migrate it.
Even the capability of FreeBSD jails, Solaris containers, OpenVZ, etc. to migrate live would come closer to live apps migration.

There's always a good reason to virtualize at OS level, but ultimately it only comes down to being able to run the application that you need.

--
home

Whats New Here? by keean · 2008-11-07 23:01 · Score: 1

My company have been successfully migrating VMs from 32bit Intel to 64bit AMD to 64bit Intel for years. We use Linux VServers and OpenVZ. This shared kernel approach to virtualisation has much lower overhead than VMWare, Xen or KVM. We can even run different distros inside the VMs; the only limitation is that all the VMs see the same Linux kernel version. So whilst we haven't done this with a hypervisor style VM, for what we want (migrating server images between physical hosts, backing up server images) Linux VServers/OpenVZ is a much better choice anyway.

Mod parent down by Anonymous Coward · 2008-11-10 03:46 · Score: 0

nt

RAID-VM by Janek+Kozicki · 2008-11-11 01:32 · Score: 1

Redundant Array of I...Intercommunicating D....Devices of Virtual Machines.

It is obvious that the next step is to set up several servers with Virtual Machines on them. Run the same VM in parallel on one or more of them. And if one of the servers goes down - the end user will not notice this, because his virtual machine was mirrored on all those other servers. Just like hotswap in RAID HDDs we will have this capability with Virtual Machines. It's just a matter of time.

And if someone is stupid enough to try to patent this - he can't because it's blatantly obvious, and I doubt that I am the first person to present this idea.

--
# #\ @ ? Colonize Mars #

Slashdot Mirror

134 comments