Slashdot Mirror


Red Hat & AMD Demo Live VM Migration Across CPU Vendors

An anonymous reader notes an Inquirer story reporting on something of a breakthrough in virtual machine management — a demonstration (not yet a product) of migrating a running virtual machine across CPUs from different vendors (video here). "Red Hat and AMD have just done the so called impossible, and demonstrated VM live migration across CPU architectures. Not only that, they have demonstrated it across CPU vendors, potentially commoditizing server processors. This is quite a feat. Only a few months ago during VMworld, Intel and VMware claimed that this was impossible. Judging by an initial response, VMware is quite irked by this KVM accomplishment and they are pointing to stability concerns. This sound like scaremongering to me ... All the interesting controversy aside, cross-vendor migration is [obviously] a good thing for customers because it avoids platform lock-in."

11 of 134 comments (clear)

  1. Bravo! by Cornwallis · · Score: 4, Funny

    I love to see things like this that give me a greater freedom to migrate off the major players.

  2. This is still unreleased test demo's by Beached · · Score: 4, Insightful

    The real beauty of this will come when the system automatically moves VMs to machines in case of hardware problems or when a system is underutilized. It would let you power down servers during non-peak times and save oodles of cash.

    --
    ---- aut viam inveniam aut faciam
    1. Re:This is still unreleased test demo's by Comatose51 · · Score: 4, Interesting

      You mean like VMware's VMotion, HA, and DRS functionalities?

      --
      EvilCON - Made Famous by /.
    2. Re:This is still unreleased test demo's by TheRaven64 · · Score: 4, Insightful

      They don't seem to have released many details of this. Migrating between x86-with-SSE and x86-without-SSE, for example, is pretty simple - you just need the OS or hypervisor to trap the illegal instruction exception and emulate. Migrating from x86 to x86-64 is pretty easy too - you just don't get any advantages from the 64-bit chip. Going the other way is really hard, and would need the hypervisor to trap the enter-64-bit-mode instruction and emulate everything until the mode was exited (difficult, slow, and probably pointless).

      I read TFA when it first came out and couldn't work out exactly what they were claiming was novel. Migrating between very-slightly-different flavours of x86 is not really that hard. Migrating between ARM and x86 would be incredibly hard - Xen can actually do this with the P2E work (not sure if it ever made it in to trunk), which migrated a VM from real hardware in to QEMU but, again, that's not an ideal solution unless the emulator has traps that userspace can use - for example a Java VM might get a signal after migration, flush its code caches, and re-JIT as x86 code instead of ARM.

      --
      I am TheRaven on Soylent News
  3. This was in all likelyhood faked. by Anonymous Coward · · Score: 5, Funny

    Open source is for morons.

    Only Apple has the engineering know-how and skills to pull of something like this. The fact that they have not done so to date is a clear indication that it is impossible.

  4. check the graphs... by alta · · Score: 5, Interesting

    Go to 4:05 in the video. On the far left, you can see from the blue intel line that the guest is running there, then they migrate, and the blue line goes to the idle point, and the orange line starts taking the load. But NOTICE, the AMD line is consistantly higher than the intel line was. I'm no intel fanboy... or AMD. I have both intel and amd servers in my racks. I just thought it was interesting, and I'm surprised they let the video go out like that.

    --
    Do not meddle in the affairs of sysadmins, for they are subtle, and quick to anger.
  5. Stability issues are justified by mnmn · · Score: 4, Interesting

    Declaration: VMware support engineering here, but speaking strictly on my own behalf.

    The stability issues are justified if you consider all types of VMs. Windows 2003, default RHEL5 kernels etc use more than the basic set of assembler instructions (disk IO code uses MMX, SSE etc).

    We can compile a kernel for strictly 486 CPUs and demonstrate migrations between AMD and Intel using extensive CPU masking: http://kb.vmware.com/kb/1993

    We've also known that mismatched CPU stepping makes the VMs unstable. This is because instructions suddenly run faster or slower compared to the front side bus, not all of Linux and Microsoft code has been tested against that. You can happily try it and a lot of our customers succesfully do. Some get BSODs and kernel oops. This is not our fault.

    If you virtualize the instructions more (bochs?) you can of course move the VM anywhere including a Linksys router's MIPS chip. At the cost of speed of course.

    Lastly, why would we want to keep customers stuck to one CPU vendor? We've software vendors.

    --
    "Give orange me give eat orange me eat orange give me eat orange give me you." -Nim Chimpsky
    1. Re:Stability issues are justified by Anthony+Liguori · · Score: 4, Interesting

      Declaration: VMware support engineering here, but speaking strictly on my own behalf.

      The stability issues are justified if you consider all types of VMs. Windows 2003, default RHEL5 kernels etc use more than the basic set of assembler instructions (disk IO code uses MMX, SSE etc).

      KVM goes to great lengths to by default, mask out CPUID features that aren't supported across common platforms. You have to opt-in to those features since they limit a machine's migrate-ability.

      However, I won't say this is always safe. In reality, you really don't want to live migrate between anything but identical platforms (including identical processor revisions).

      x86 OSes often rely on the TSC for time keeping. If you migrate between different steppings of the same processor even, the TSC calibration that the OS has done is wrong and your time keeping will start to fail. You'll either get really bad drift or potentially see time go backwards (causing a deadlock).

      If you're doing a one time migration, it probably won't matter but if you plan on migrating very rapidly (for load balancing or something), I would take a very conservative approach to platform compatibility.

    2. Re:Stability issues are justified by kscguru · · Score: 5, Informative
      Yet Another VMware engineer here.

      The new Intel/AMD CPU features that allow masking of CPUID bits while running virtualized also make processors recent enough that most of the interesting features are present - MMX, SSE up to ~3. The "common subset" ends up looking like an early Core2 or a Barcelona (minus the VT/SVM feature bits, of course) - Intel and AMD run about a generation behind on adding each other's instructions. Run on anything older than the latest processors, and you have to trap-and-emulate every CPUID instruction. Enough code still uses CPUID as a serializing instruction that this has noticeable overhead.

      So there are two strategies. Pass directly through the CPUID bits (and on the newest processors, apply a mask), or remember a baseline value, trap-and-emulate every CPUID and always return that value. Sounds like KVM has picked the latter approach for a default; VMware's default is to expose the actual processor features and accept a mask as an optional override, which skews towards exposing more features at the expense of some compatibility. Equally valid choices, IMHO.

      The Worst Case Scenario when not doing a trap-and-emulate of every CPUID is an app that does CPUID, reads the vendor string, then decides based on the vendor string which other CPUID leafs to read. (Like the 0x80000000 leafs, which are vendor-specific and would come back as gibberish if you get the processor wrong). If the app migrates during the dozen or so instructions between the first CPUID and the following ones, instant corruption. Good enough for a pretty demo, destined to make a guest kernel die a few times a year if actually used in production. And I'm 95% sure this is what the OP demo is doing - living dangerously by hoping mismatched CPUID results never get noticed.

      I agree with Anthony Liguori here - on a production machine, an Intel/AMD migration is way too much of a stupid risk. All you have to do is reboot the VM, it's much safer.

      (As a side note to everyone reading, the reason Linux timekeeping is such a problem is that TSC issue. Intel long ago stated TSC was NOT supposed to be used as a timesource. Linux kernel folks ignored the warning, made non-virtualizable assumptions, and today are in a world of hurt for timekeeping in a VM. And only now, many years later, are patching the kernel to detect hypervisors to work around the problem.)

      --

      A witty [sig] proves nothing. --Voltaire

  6. Xen does migration, but not Live... by LinuxGeek · · Score: 4, Informative

    This is a demo of a Live migration, no shutdown or reboot involved. Xen does not support the live migration of a running VM between an AMD and Intel server. Watch the video, they are running a video in the VM that keeps playing during the migration. Very impressive stuff.

    --

    Kindness is the language which the deaf can hear and the blind can see. - Mark Twain
  7. Re:Um by Korin43 · · Score: 4, Insightful

    Except they're doing it with KVM, which is open source..