Slashdot Mirror


Ask Slashdot: How do Software MMUs Work?

Rob_D_Clark asks: "How does a program (like VMware) implement memory management on top of Linux (or Unix in general)? For example: in VMware, the guest OS is going to expect to have a 32-bit address space, into which the memory you allocate to the guest OS is mapped. Also, the guest OS is going to expect hardware registers for different devices, etc., to be mapped in at certain addresses. How does a program trap reads/writes to these addresses and deal with them appropriately?"

12 of 92 comments

  1. Re: Good question!; incomplete VM in X86? by Anonymous Coward · · Score: 2

    It's been >5 years since I last studied this, but I thought that one could set a flag that allows even debug registers to be "protected", in effect causing a fault upon *any* access attempt by ring 3 ("user") code.

    My current favorite example comes from Linux: The kernel allows user processes to read the current value of the CPU clock counter (using the instruction "rdtsc", or "read time stamp counter"). That instruction can be made to fault in user mode by an appropriate flag setting (the TSD bit in CR4).

    I would expect Intel to be fairly good at VM technology after hearing some of the complaints about the '386. (The obvious one is the lack of ring 0 write-protect page faults.)

  2. Funky Walking by Anonymous Coward · · Score: 2
    It might be useful to do some research on Rational Purify (formerly of Pure-Atria, formerly Pure Software). Purify lets the programmer/debugger trap on reads of uninitialized memory and provides fast watchpoints (100s of times faster than gdb's watchpoints). So clearly Purify is doing exactly what you want (while it isn't really clear if VMware is).

    I'd dig around in deja-news if I were you.

  3. Re:Ass-talking by Anonymous Coward · · Score: 2

    No, it's simpler than that. Read the linux-kernel archives and see how the UltraSparc guys discussed working around the bugs in the UltraSparc CPUs:

    (1) you mark all the pages that you want to trap instructions in as non-executable

    (2) when code attempts to execute in one of those pages, you get a fault

    (3) you trap the fault, and then (and only then) scan the page and modify instructions as necessary

    (4) you then mark the page executable and not writable, and let it run

    (5) if the page is modified, you then clear the executable bit, because you may have to re-scan it.
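
    A minimal sketch of steps (1) through (5), written as a toy Python model rather than real page-table code. Everything here is made up for illustration: the Page class, the one-byte opcodes PRIVILEGED and TRAP_STUB, and the two exception types that stand in for hardware faults.

```python
from dataclasses import dataclass

PRIVILEGED = 0xF4  # hypothetical opcode the monitor must intercept
TRAP_STUB = 0xCC   # hypothetical replacement opcode that enters the monitor


class ExecFault(Exception):
    """Stands in for the fault raised when fetching from a non-executable page."""


class WriteFault(Exception):
    """Stands in for the fault raised when storing to a read-only page."""


@dataclass
class Page:
    data: bytearray
    executable: bool = False  # step 1: pages start out non-executable
    writable: bool = True
    scans: int = 0            # how many times the page has been scanned


def scan_and_patch(page):
    # Toy ISA: every instruction is one byte, so a plain byte scan is safe here.
    for i, byte in enumerate(page.data):
        if byte == PRIVILEGED:
            page.data[i] = TRAP_STUB
    page.scans += 1


def fetch(page, offset):
    if not page.executable:
        raise ExecFault
    return page.data[offset]


def store(page, offset, value):
    if not page.writable:
        raise WriteFault
    page.data[offset] = value


def monitored_fetch(page, offset):
    try:
        return fetch(page, offset)
    except ExecFault:                # step 2: execution attempt faults
        scan_and_patch(page)         # step 3: scan only now
        page.executable = True       # step 4: executable, not writable
        page.writable = False
        return fetch(page, offset)


def monitored_store(page, offset, value):
    try:
        store(page, offset, value)
    except WriteFault:
        page.writable = True         # step 5: a write clears the executable
        page.executable = False      # bit, so the next fetch re-scans
        store(page, offset, value)
```

    The point of the design is that a page is scanned lazily, on its first execution, and again only after something has written to it.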

  4. Here's how it works. by Anonymous Coward · · Score: 4


    Okay, imagine that the memory of your computer is like a vast attic, full of flies. Each of the flies is either asleep or awake, and they change state frequently. They live, work, and play in groups of eight, called "bytes". Now, when the computer gets hungry, it opens up its mouth much like a blue whale and sucks in a great big gulp of air from the attic. It filters the flies out of the air with its giant long strandy teeth and gobbles up the flies -- gobble gulp!

    So.

    The whale has no eyes, and in the whale's tummy there is a man without his greatcoat. That guy is called the "kernel", or "Colonel", and he looks and talks exactly like Colonel Klink on Hogan's Heroes. He has a goofy, bumbling sidekick named Sergeant Shultz, otherwise known as the "Memory Management Unit". What Sgt. Shultz does is, well . . . okay. Let's start over. Colonel Klink is in charge of sorting through these flies and putting them together in the right order before the whale (the computer, remember) digests them. This way, the whale won't get a tummyache and feel funny. Col. Klink has to decide which flies to send when, but he needs to have them organized in the right way so he knows which flies are which. If two batches of flies crash into each other, the computer will get very frowny and sad. Col. Klink doesn't like that, because when that happens the General comes and yells at him in German, and Col. Klink doesn't speak German, he just speaks English with a funny accent. So Sgt. Shultz has the very important job of ensuring that the flies don't get mixed up before Col. Klink gets to look at them.


    In the arrangement that you're talking about above, things are more complex, because Col. Klink and Sgt. Shultz have to coexist with Col. Hogan and Richard Dawson, who are doing the same thing at the same time. (A little imagination will suffice to guess which OS is which). Hilarity ensues! But everything runs smoothly again at the end of the episode.


    Hope this helps.

    1. Re:Here's how it works. by nrrd · · Score: 4

      One of the best descriptions of file paging is here. It gives a fairly solid, humorous description. Long live the Thing King!

      --
      "Eye halve a spelling chequer, It came with my pea sea, It plainly marques four my revue, Miss steaks eye kin knot sea"
  5. Ass-talking by Anonymous Coward · · Score: 4
    Since everyone else is talking out of their ass on this one, I might as well too.

    The general consensus in comp.arch is that vmware is doing some dynamic recompilation, but is otherwise allowing the hosted operating system to execute natively, and thus use the hardware mmu for the majority of the work.

    As has already been mentioned, the IA32 instruction set architecture (ISA) is not completely self-virtualizable, i.e. you can't trap accesses to all cpu state information. But, you can scan through the text of your process and search for those specific opcodes that are not virtualizable. Substitute a call to your own handler for those opcodes and voila! we are now effectively fully virtualizable and the performance hit is minimal, especially if you can save your changes so that you don't have to scan and recompile each page of text more than once. And once you are fully virtualized, as long as you properly trap the right operations and do the right thing, you can let the hardware do 99% of the work for you.

    Clearly vmware does more than this with its various virtualized devices, but fundamentally this is probably what is going on.

    1. Re:Ass-talking by synaptik · · Score: 2

      This is where the processor comes to the rescue. To be sure, they can't expect every occurrence of a certain sequence to be a particular instruction. It's quite possible to have that sequence be a combination of the end of one instruction and the beginning of another.

      I'm no OS developer, but if I were trying to do this, I'd try scanning for these strings, and then placing a hardware execution breakpoint at the beginning of them. If it's not actually code, the breakpoint won't get hit. If it is code, then when it does get hit the VMware software could just look at the instruction pointer register, to ascertain whether they "hit" in the middle of, or the beginning of, an instruction. If the latter, they simulate that "offending" instruction.

      But like he/you said, I'm talking out of my arse.

      :)

      --synaptik

      --
      HSJ$$*&#^!#+++ATH0
      NO CARRIER
  6. Good question! by synaptik · · Score: 2

    Wow, I'm glad Rob Clark thought to ask this on Ask Slashdot. I was wondering this myself.

    Although, I would like to add a rider to his question:

    With Intel processors, some hardware registers can't be trapped. For example, any privilege level can read DR7 to find out if a debugger is resident. Writes to this can obviously be trapped, but AFAIK there is no way to get the processor to trap on reads.

    I am sure there are other examples like this, as well. This seems to indicate that it is impossible to virtualize every aspect of the machine.

    (Although, I suppose you could put the processor into single-step mode, and look at each instruction before it executes, looking for these types of instructions, but that would slow things WAAAYYYY down.)



    --synaptik

    --
    HSJ$$*&#^!#+++ATH0
    NO CARRIER
  7. Re:Virtual Memory by Rob_D_Clark · · Score: 2

    What I am really trying to figure out is how to trap writes/reads to certain addresses without having to interpret the machine instructions... the best hack I have thought of is to trap SIGSEGV, and have your signal handler try to figure out what was going on.

    For example, you can mark pages of memory as not being accessible (the PROT_NONE flag for mmap). This will cause a SIGSEGV if the program tries to read/write that address.
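
    A toy model of that idea in Python. It does not call mmap or install a real signal handler; instead a ProtectionFault exception plays the role of SIGSEGV, and pages with a device mapped onto them play the role of PROT_NONE pages. GuestMemory, FakeSerialPort, and the register layout are hypothetical names chosen for the sketch.

```python
PAGE_SIZE = 4096


class ProtectionFault(Exception):
    """Stands in for SIGSEGV; records the faulting address and access type."""

    def __init__(self, addr, is_write, value=None):
        super().__init__(f"fault at {addr:#x}")
        self.addr, self.is_write, self.value = addr, is_write, value


class FakeSerialPort:
    """Hypothetical device: offset 0 is a data register, offset 5 a status register."""

    def __init__(self):
        self.sent = []

    def read(self, offset):
        return 0x20 if offset == 5 else 0  # 0x20 means "ready" in this toy device

    def write(self, offset, value):
        if offset == 0:
            self.sent.append(value)


class GuestMemory:
    def __init__(self, size):
        self.ram = bytearray(size)
        self.devices = {}  # page number -> device; these pages act as PROT_NONE

    def map_device(self, base, device):
        self.devices[base // PAGE_SIZE] = device

    def _access(self, addr, is_write, value=None):
        # The "hardware" path: plain RAM unless the page is protected.
        if addr // PAGE_SIZE in self.devices:
            raise ProtectionFault(addr, is_write, value)
        if is_write:
            self.ram[addr] = value
        return self.ram[addr]

    def _on_fault(self, fault):
        # The "signal handler": work out which device was hit and emulate it.
        page, offset = divmod(fault.addr, PAGE_SIZE)
        device = self.devices[page]
        if fault.is_write:
            device.write(offset, fault.value)
            return None
        return device.read(offset)

    def read(self, addr):
        try:
            return self._access(addr, False)
        except ProtectionFault as fault:
            return self._on_fault(fault)

    def write(self, addr, value):
        try:
            self._access(addr, True, value)
        except ProtectionFault as fault:
            self._on_fault(fault)
```

    Note where the model cheats: the fault object carries the value being written. A real SIGSEGV handler is handed the faulting address (si_addr in siginfo_t) but not the value or the access width, so it would still have to decode the faulting instruction, which is exactly the part the question hopes to avoid.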

    Another idea I just thought of as I was writing this post... you could use a kernel module to create a /proc file, then have your VM mmap that file to use as its memory. Then the module could deal with simulating memory space. (This is assuming you can prevent the mmap file from being cached.) This would be even better if you could bind a user-space program to a file. (I vaguely remember reading that GNU Hurd has the ability to do this.)

    --
    --Rob
  8. Re:Speculation: Virtualize the x86 by redhog · · Score: 2

    It may do the work, and does, at least for vmware. There is a thing called single-step in an 80x86, which lets you single-step a program, with an interrupt after every instruction, so you may easily look ahead and see if the next one would compromise your emulation... This functionality is used by vmware, while they say this functionality is not available to debuggers running inside a guest-os...

    --
    --The knowledge that you are an idiot, is what distinguishes you from one.
  9. Virtual Memory by MasterD · · Score: 5

    It all has to do with virtual memory (not the common misuse of the term to mean swap). Basically, there is a mapping between _real_ memory addresses and the addresses programs use to access data.

    In a kernel, this is done (usually) using a mix of hardware and software. If a program tries to access a piece of memory, the hardware looks at the Translation Lookaside Buffer (TLB) to translate the address. If the address exists in the buffer, it does the translation and all is good. If it does not exist, a trap is raised to the kernel. It is the kernel's responsibility to look at the virtual memory tables, allocate the memory, copy it if it was copy-on-write, and most importantly update the TLB so next time it does not have to set up the translation.
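
    Here is a small Python sketch of that lookup path, assuming a software-refilled TLB as described above. (On x86 the hardware walks the page tables itself on a TLB miss and only traps to the kernel on a page fault, so treat this as the general idea, not the x86 mechanism.) SoftMMU, PageFault, and the 4 KB page size are choices made for the example.

```python
PAGE_SIZE = 4096


class PageFault(Exception):
    """Raised when a virtual page has no mapping at all."""


class SoftMMU:
    def __init__(self, page_table):
        self.page_table = dict(page_table)  # virtual page number -> physical frame
        self.tlb = {}                       # small cache of recent translations
        self.tlb_misses = 0

    def translate(self, vaddr):
        """Turn a virtual address into a physical one."""
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        frame = self.tlb.get(vpn)
        if frame is None:                   # TLB miss: "trap to the kernel"
            frame = self._handle_miss(vpn)
        return frame * PAGE_SIZE + offset

    def _handle_miss(self, vpn):
        # The kernel's job: consult the page table and refill the TLB.
        self.tlb_misses += 1
        if vpn not in self.page_table:
            raise PageFault(f"no mapping for virtual page {vpn}")
        frame = self.page_table[vpn]
        self.tlb[vpn] = frame               # next access to this page hits the TLB
        return frame
```

    With a page table of {0: 5, 1: 9}, the first access to virtual page 1 counts as a miss and fills the TLB; a second access to the same page is answered from the TLB without touching the page table.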

    So in the VM case, this is sorta conjecture. The VM can allocate a slew of memory on the host OS. (As far as the client OS is concerned, this is physical RAM.) Then it can make a TLB and all memory accesses will go through it first. This way it can stop Windows from pissing all over OS/2 running on the VM. But Linux will stop the VM from pissing on anything else on the host OS.

    As for kernel traps, the user-level program's data needs to be copied over to kernel space for the kernel to access it.

    I hope this begins to answer your question.

  10. technical references by rgrimm · · Score: 5
    The basic technology behind VMware (I suppose -- all three authors are cofounders of VMware) is described in research papers at http://www-flash.stanford.edu:80/Disco/ , if you want to find out the details.

    Robert