Ask Slashdot: How do Software MMUs Work?

Re:Here's how it works. by Anonymous Coward · 1999-05-13 09:57 · Score: 0

Funny, my dad told me that's how an 'internal combustion engine' works.

Re:Virtual Memory by Anonymous Coward · 1999-05-13 09:59 · Score: 0

No... the TLB is really just a hardware cache for the page table (it keeps things from being insanely slow). VMware loads a kernel module, and of course it allocates/deallocates pages of its own. I'm not quite sure how the RegisterMonitor etc. code works, though. As another poster pointed out, some x86 state is accessible even from privilege level 3 (user mode), e.g. the flags, the low half of the machine status word, and some debug registers(?). I'm pretty sure this is done with some VME (Virtual Mode Extensions) trick; the details are locked away in Appendix H, though.

Instructions I found in vmmon by Anonymous Coward · 1999-05-13 10:18 · Score: 0

Does anyone know what the purpose of the x86 byte sequence:
e8 fc ff ff ff
would be? I see it all over the place in vmmon, and a jump to the fc (= cld) doesn't make much sense, especially since ff ff isn't reported as a valid instruction by objdump... or is it?

Re:Instructions I found in vmmon by Anonymous Coward · 1999-05-13 11:18 · Score: 0

I know. That means it's a call (not a jump, oops) to the instruction starting at the fc, since the displacement counts back from the start of the next instruction: the last ff is at offset -1, the second-to-last ff at -2, and so on, which puts -4 on the fc.

Re:Instructions I found in vmmon by Anonymous Coward · 1999-05-13 18:42 · Score: 0

But wouldn't the ff ff raise an illegal operation exception?
Re:Instructions I found in vmmon by Anonymous Coward · 1999-05-14 01:05 · Score: 0

It's the equivalent of:

here: JMP FAR next
next: ...

Where "here" is the address of the JMP instruction, and "next" is the (32-bit) address of the instruction immediately following the JMP.
Re:Instructions I found in vmmon by Anonymous Coward · 1999-05-14 22:39 · Score: 0

OK, never mind. I'm an idiot. That's not exactly what I meant to point out, but it's really irrelevant to the discussion. This whole thread deserves a big fat "-1".

Re:Instructions I found in vmmon by Anonymous Coward · 1999-05-13 12:23 · Score: 1

It's flushing the on-chip cache.

Re:Instructions I found in vmmon by synaptik · 1999-05-13 11:00 · Score: 1

Remember, i386 is little endian:

"fc ff ff ff" is really "0xFFFFFFFC"

or -4.

--synaptik

--
HSJ$$*&#^!#+++ATH0
NO CARRIER

Re:Ass-talking by Anonymous Coward · 1999-05-13 10:21 · Score: 0

Talk about talking out of your ass! So VMware also has a perfect disassembler to figure out where the actual instructions of interest are in the text region?

Re:Ass-talking by Anonymous Coward · 1999-05-13 10:50 · Score: 0

Well, how about this for some funky walking:

If you can differentiate between data and text (by address? I don't know what the various OSes' virtual address spaces look like), then you can avoid messing with data.
Must x86 instructions be page-aligned? If so, we only need to "interpret" starting on page boundaries. Things become a little more complex than a simple search and replace, but not much, and the effort can be spread out over run time, say only when we take a page fault, so interpreting 4K at a time won't have too much impact on subjective speed.

Re:Ass-talking by Anonymous Coward · 1999-05-13 11:09 · Score: 0

1 or 2? Looks more like 2 to me.

If instructions must be page-aligned, or more precisely can't straddle page boundaries, then the interpreter would always see the MOV EAX and be able to recognize that the next 4 bytes are part of the MOV instruction and not a brand new instruction.

Now if instructions can straddle page boundaries, then pulling off the interpreter is a lot more work.

Re:Ass-talking by Anonymous Coward · 1999-05-13 11:22 · Score: 0

That's not true. Just because an instruction doesn't straddle a page boundary, doesn't mean that a valid instruction has to start on one. Furthermore, there's plenty of code which jumps into the middle of one instruction because it contains another instruction needed, etc.

Re:Here's how it works. by Anonymous Coward · 1999-05-13 11:34 · Score: 0

You know, nothing simplifies a problem like a good Hogan's Heroes analogy. Just one question for the Hogan's Heroes fans out there: when Schultz says he knows nothing, do you have to reboot the whole POW camp? That's one episode I sure missed.

Re:Speculation: Virtualize the x86 by Anonymous Coward · 1999-05-13 12:15 · Score: 0

The problem is that some ring 3 instructions betray the fact that the program is in ring 3, so it won't act like an OS.

Re:Speculation: Virtualize the x86 by Anonymous Coward · 1999-05-13 12:18 · Score: 0

The problem is that some ring 3 instructions betray the fact that the program is in ring 3, so it won't act like an OS.

Emulate those instructions too.

Re:Speculation: Virtualize the x86 by Anonymous Coward · 1999-05-13 13:55 · Score: 0

Unfortunately, those instructions don't cause traps. For example:
MOV EAX, CS
AND EAX, 3

If EAX contains anything other than 0, you're not in ring 0.

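Another possibility is to attempt to load the GDT, LDT or IDT register with a different value and then verify that the value is there. The load itself is privileged, so it traps and can be emulated, but *reading* those registers back (SGDT, SLDT, SIDT) does not fault (aside from possible problems with the destination operand), so the guest sees the real value instead of the one it loaded.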

Re:Speculation: Virtualize the x86 by Anonymous Coward · 1999-05-13 15:27 · Score: 0

Ideally, one would want the processor to do all the work. If one can't coax the processor into throwing an exception on those opcodes (and I'm not 100% sure there aren't tricks one could use), then before code is executed, scan it for instructions which *might* compromise the emulation and replace them with short code which will cause a GPF or other exception. Then emulate the original opcode when you reach the replaced one (which caused your GPF handler to be called).

Basically, the program scans the code to be executed using a table which contains the size of each opcode (so you know where the next one to scan starts). When you find an opcode which won't trap but which has the potential to ruin the emulator's illusion (such as your MOV EAX, CS above), save the address of the instruction and the "context" of the instruction (here, MOV EAX, CS) and replace the offending instruction with an instruction which will trap. Do this check (and, if required, the replacement) for each instruction of the code.

When the code is run, the instructions which were replaced will now throw exceptions. When an exception is thrown, check the address against the table of replaced addresses. If it's a match, emulate the instruction that was replaced (remember, the "context" was saved). If it isn't a match, emulate the instruction at the address of the exception.

Anonymous Coder

Virtual Real and Virtual Virtual Memory by Anonymous Coward · 1999-05-13 17:36 · Score: 0

For the guest machine's real memory, you just map it into your virtual memory and run the guest with translation on, even though the guest thinks it's off. For memory addresses that do not allow virtual address translation, you'd better have some other hardware mechanism to trap access by the guest machine. Usually page zero is protected, and you don't really give the guest privileged access. You run the guest unprivileged and emulate the privileged instructions if the guest is in the virtual privileged state.

When the guest is running with its virtual address translation on, you have to scan the guest's virtual address translation tables to map its virtual addresses to the real addresses where you keep the guest's real memory pages.

So if the guest has a page table entry (virtual -> real)
0x100000 -> 0x20000

and you have the guest's real page mapped as
0x20000 -> 0x50000

then you run with that mapping while the guest has address translation off, and change the entry to
0x100000 -> 0x50000
when the guest has address translation on.

That's basically it if I remember right.

Well, duh. That's pretty simple. by Anonymous Coward · 1999-05-13 19:05 · Score: 0

I was in the shower, and it just occurred to me how to clean up virtualization on the x86 without too much slowdown. When we're in a basic block (no jumps out of the middle, and no one jumps into the middle of it, though that part's not relevant here), we know the instruction flow until its end, so we can easily scan it. So just have an elaborate automaton which scans through, finds the next unsafe instruction, and sets a hardware debug breakpoint to go off right before that instruction executes. Memory accesses, of course, would still be controlled by the MMU. Now just put in the way-complicated x86 handlers, and walla! VMware!

Re:Well, duh. That's pretty simple. by Anonymous Coward · 1999-05-14 01:14 · Score: 0

If this post was serious (I can't tell), note that it is harder to identify basic blocks in machine language than it is in C. The start of every instruction is a potential entry point. But you can speculate that a block is basic, and rescind that speculation if a later jump shows that you are wrong.

Your suggestion also runs into the halting problem.
Re:Well, duh. That's pretty simple. by Anonymous Coward · 1999-05-14 02:37 · Score: 0

If this post was serious (I can't tell), note that it is harder to identify basic blocks in machine language than it is in C. The start of every instruction is a potential entry point. But you can speculate that a block is basic, and rescind that speculation if a later jump shows that you are wrong.

Your suggestion also runs into the halting problem.

Well, in theory you run into the halting problem. In practice, most code is written in C, so there are probably many patterns that identify the start of a function (say, push %ebp; mov %esp, %ebp). Maybe compilers generate dense code (i.e., no data structures between the machine code of two functions), padded with NOPs, in which case you can identify function code with close to 100% probability; you can also follow all the jumps and calls. Indirect calls are more problematic.

An alternative is to fill all the code pages with "int xxx" ("int $03" is one byte) and then progressively replace them with the actual code, but you have to be sure that there is no data.

I don't think that vmware uses this approach, because discriminating between data and code seems a bit unsafe.
Re:Well, duh. That's pretty simple. by Anonymous Coward · 1999-05-14 22:26 · Score: 0

Exactly. Except I'm saying we only work with one basic block at a time, the one we're currently in. Trap before a jump out of the basic block. Then when we jump somewhere, figure out what basic block that place is in, etc.

This could use some optimization, like if it jumps back into an already-explored and known clean basic block (e.g. for a copying loop), just forget about monitoring that jump.

Re:Virtual Memory by Anonymous Coward · 1999-05-13 21:06 · Score: 0

Yes, the Hurd can bind user space programs to files or directories (more general: to inodes).

These are called translators, and are crucial to the Hurd. Device files are implemented this way, as well as network capabilities and mounting of file systems. Nothing of this is in the kernel. It is all running in user space.

I suppose this functionality makes the Hurd a dream for kernel hackers. A user can run a sub-Hurd, too, and attach debuggers to crucial servers (like the exec or root fs server) from the parent Hurd.

Marcus.Brinkmann@ruhr-uni-bochum.de

Re: Good question!; incomplete VM in X86? by Anonymous Coward · 1999-05-13 21:24 · Score: 0

But the VM doesn't have to be perfect; it just has to be good enough to run Windows. And Linux can be modified to not use any of the tricky instructions.

Re:Funky Walking by Anonymous Coward · 1999-05-13 22:08 · Score: 0

But the problems here are self-modifying code, and knowing what is a dangerous instruction and what is not. Something may really just be data, or it may be bytes from adjacent instructions (x86 instructions can be from 1 to 15 bytes long, I believe).

From computability theory, we know that there is pretty much no faster way to determine what a program will do than running it somehow. As such, we need to do this translation at run time, since an OS running on bare hardware could do a lot of things not considered "good form" in "well-written" user code. I'm convinced that the approach of setting a breakpoint at the end of the basic block (or at the first dangerous instruction, if that comes sooner) and then re-scanning forward when we get there is what's done! See the later post about this ("well duh").

Good, but a little harder on x86 by Anonymous Coward · 1999-05-13 22:15 · Score: 0

That's pretty clever, and sounds about right. The only problem is that x86 has variable-length instructions with no alignment restrictions. So you can only really do the scanning for the current basic block, and _possibly_ explore out from there (until you hit some sort of a register jump).

Re:walla?! by Anonymous Coward · 1999-05-13 22:57 · Score: 0

No... I mean like Wallawalla, Washington!

Re:Here's how it works. by Anonymous Coward · 1999-05-13 22:58 · Score: 0

I thought they just found a buffer overrun exploit by giving him a lot of food.

Re:Remember by Anonymous Coward · 1999-05-14 00:18 · Score: 0

What about EMM386.EXE under MS-DOS? It has to check that it's not running as a client, since then it won't be able to enter protected mode and swap EMS to XMS.

Re:Ass-talking by Anonymous Coward · 1999-05-14 00:22 · Score: 0

Oh, certainly. I had heard they were adding new registers in KNI. Can someone with a PIII confirm whether SSE gives you more general-purpose registers?

Self modifiying code? by Anonymous Coward · 1999-05-14 01:04 · Score: 0

I know no one is stupid enough to write self-modifying code any more (I occasionally used it to save a few clocks in fast loops on the 68000), but it seems to me that self-modifying code would make a program incompatible with VMware?

I think that later 68k chips forbade self-modifying code, but that the Intel cache handler could deal with it?

Re:Self modifiying code? by Anonymous Coward · 1999-05-14 02:18 · Score: 0

Stupid enough to write efficient code, you mean?

Modern OSs generally mark program text segments as execute-only. Good for security, good for debugging, good for being able to do evil things to code you're loading. Bad for efficiency.
Re:Self modifiying code? by Anonymous Coward · 1999-05-14 03:49 · Score: 0

Stupid enough to write efficient code, you mean?
Modern OSs generally mark program text segments as execute-only. Good for security, good for debugging, good for being able to do evil things to code you're loading. Bad for efficiency.
Why should self-modifying code be good for efficiency?
You would kill the pipeline if you modify code too intensively, the decoded instruction cache, ...

Re:Reading hardware registers by Anonymous Coward · 1999-05-14 01:31 · Score: 0

The problem is not so much a direct inspection, but that a "save/modify/restore" will restore the registers incorrectly. Starting in "guest privileged" state:
save state - saves the *real* state (non-privileged) rather than the virtual state (privileged)
modify - for whatever reason
restore state - sets the guest virtual state to the saved value (non-privileged), which is wrong
perform privileged operation - traps, at which time the privileged instruction emulator sees the wrong virtual state, and incorrectly gives the guest a privilege violation.

Re:Ass-talking by Anonymous Coward · 1999-05-14 02:43 · Score: 0

But how do you know what to breakpoint? There's only 4 hardware breakpoints on x86, I think.

Yes, but you can replace the code with "int $03" (breakpoint) which is designed for that (it takes 1 byte).

Distinguishing code from data by Anonymous Coward · 1999-05-14 03:38 · Score: 0

(Expanding on your use of INT 03 to "discover" the code.) You can use the page permission bits (read, write, execute) to figure out whether a page has been used (so far) as all code, all data, or a combination of the two. Initialize the bits to no access, and record what you see from the exceptions which occur.

Re:Speculation: Virtualize the x86 by Anonymous Coward · 1999-05-14 04:26 · Score: 0

At the heart of it was an extra couple of lines in the interrupt handler that caught an interrupt either after every instruction was executed *OR* (and this is the bit that's important here) caught an interrupt we'd placed.

What we'd done was implement a feature which, given an address, would copy that instruction out, store it somewhere and then replace it with an interrupt. When that interrupt was trapped, the program counter would be wound back one and the interrupt replaced by the correct instruction.

I like. That is a nice solution to the problem of a processor which can't interrupt after each instruction.

It wouldn't be very hard for the VM, on loading the program, to scan for 'dodgy' instructions. These could be replaced by an interrupt and pushed onto the end of a linked list. Should this interrupt be trapped, the program counter would be wound back one and the interrupt replaced by the head of the linked list.

That's the basic idea. But since program execution isn't linear, one probably shouldn't use a straight linked list. A table containing each address and the corresponding 'dodgy' instruction would be used instead. (It could be done as a linked-list/array hybrid.) Sort the table by address, so a binary search can locate the offending instruction quickly.

A VMware clone wouldn't be hard to make. It would, however, be tedious to make. (Of course, some of us *like* that kind of stuff!) :)

Anonymous Coder

Re:Speculation: Virtualize the x86 by Anonymous Coward · 1999-05-14 21:34 · Score: 0

Well, as long as the OS is well behaved and you can tell what is and isn't code that might work.

However, IMHO, it is impossible to fully virtualize a CPU without complete hardware support, and Intel doesn't have it. For example, what if the first thing the OS does after loading is a checksum on itself? Any modified instructions are going to show up. Another possibility is to check the timing of some operation. Sure, you can screw around to defeat that, but if the OS is intent on finding out whether it is in charge, it will be an ongoing battle to keep it fooled.

Re:Speculation: Virtualize the x86 by Anonymous Coward · 1999-05-14 22:20 · Score: 0

You don't let it do a checksum on itself. You let it do a checksum on a copy of itself that wasn't modified.

Here's what QNX used to do by Anonymous Coward · 1999-05-15 18:49 · Score: 0

The QNX OS used to run Windows 3.1 in real mode by modifying the keyboard interrupt and having it switch between the two OSes.

This was QNX2 and you had to live with stuff like small memory models and no paging but it was fast and small.

Yes, engines work that way, too! by Anonymous Coward · 1999-05-13 10:02 · Score: 1

Funny, my dad told me that's how an 'internal combustion engine' works.

My dad explained "marginal utility" and "network externalities" the same way. In fact, he used the same explanation for sex.

Hmmm . . .

I hope your entirely humorless post . . . by Anonymous Coward · 1999-05-13 10:10 · Score: 1

. . . is a parody of the humorless mental dwarves who moderate all the life out of Slashdot.

Lighten up.

Re:Ass-talking by Anonymous Coward · 1999-05-13 10:28 · Score: 1

You don't need a disassembler, just a pattern matcher. Scan through each text page looking for unvirtualizable binary opcodes and over-write them with some other opcode(s) you know will cause a trap, make sure to pick some trap that is unique enough that you can determine what opcode you replaced.

I believe the main problem is that opcode to ask the cpu what mode (user or supervisor) it is in is not virtualizable, so it isn't like you will need to be recomputing things like offsets into the stack and complicated stuff like that, just a binary search and replace.

Re:Ass-talking by Anonymous Coward · 1999-05-13 10:31 · Score: 1

But how do you know what to breakpoint? There's only 4 hardware breakpoints on x86, I think.

Speculation: Virtualize the x86 by Anonymous Coward · 1999-05-13 11:53 · Score: 1

In a nutshell, this is how I would do it:

My program (a VMware work-alike) would run in ring 0. ALL other programs would run in ring 3. I'd set up a separate x86 VM for each instance of a hosted operating system. My work-alike program would emulate any instructions which can't be run in ring 3. The work-alike would have to emulate a lot of the functionality of the x86.

When a program tries to use an instruction that can't be used in ring 3 a GPF will occur which will switch control to my work-alike. My work-alike will look at the stack to get the address of the offending instruction and emulate its functionality, then return control back to the program. The program would have no idea it was just interrupted and its request emulated as long as it gets what it wants.

It'll be much more involved than the above and would require a lot of work to write, but that's basically how it could be done.

Anonymous Coder

Re:Speculation: Virtualize the x86 by redhog · 1999-05-13 16:56 · Score: 2

It may do the work, and does, at least for VMware. There is a feature called single-step on the 80x86, which lets you single-step a program with an interrupt after every instruction, so you can easily look ahead and see if the next instruction would compromise your emulation... This functionality is used by VMware, and they say it is not available to debuggers running inside a guest OS...

--
--The knowledge that you are an idiot, is what distinguishes you from one.
Re:Speculation: Virtualize the x86 by Muttley: · 1999-05-13 16:08 · Score: 1

As a project at University we had to implement a debugger in Minix such that executing

debug

would run the program using an execdbg (like execle but slightly modified) call and run the program *through* the debugger. At the heart of it was an extra couple of lines in the interrupt handler that caught an interrupt either after every instruction was executed *OR* (and this is the bit that's important here) caught an interrupt we'd placed.

What we'd done was implement a feature which, given an address, would copy that instruction out, store it somewhere and then replace it with an interrupt. When that interrupt was trapped, the program counter would be wound back one and the interrupt replaced by the correct instruction. Execution would be resumed on some input from the user.

It wouldn't be very hard for the VM, on loading the program, to scan for 'dodgy' instructions. These could be replaced by an interrupt and pushed onto the end of a linked list. Should this interrupt be trapped, the program counter would be wound back one and the interrupt replaced by the head of the linked list.

Re:Speculation: Virtualize the x86 by Muttley: · 1999-05-15 09:13 · Score: 1

To get the x86 into single-stepping mode you OR the PSW with a special number (i.e., set a single bit); this means it will generate an interrupt after every single instruction. To turn it off you just AND the PSW with an inverted (NOT-ted) version of this special number.

From the Minix src code (only because I've got it to hand, it'd be the same for Linux)

src/kernel/const.h

(line 9) for INTEL chips
#define TRACEBIT 0x100
/* OR this with psw in proc[] for tracing */

(line 109) for M68000
#define TRACEBIT 0x8000

and from my src code for a minix debugger

/* Set trace bit */
child_proc->p_reg.psw |= TRACEBIT;

... and later ...

/* Clear the tracebit */
child_proc->p_reg.psw &= ~TRACEBIT;
Re:Speculation: Virtualize the x86 by Winged · 1999-05-24 21:14 · Score: 1

Considering that code segments are not by default readable on the x86 (if I remember correctly), this might actually be doable.

-Mat

Re:Virtual Memory by Anonymous Coward · 1999-05-13 12:10 · Score: 1

TLB == Translation Lookaside Buffer

Re:Funky Walking by Anonymous Coward · 1999-05-13 21:38 · Score: 1

Actually, that is exactly how VMware is doing it, I believe. We've discussed VMware at length here in the office (we're all REALLY impressed) and have come to the conclusion (so realise that the following is just speculation) that:

As others have pointed out, the i386 is NOT virtualisable, so you have to play some tricks, unless you want to emulate the processor (hey, it worked for Insignia). But that is too slow to get VMware's level of performance. Even Digital's assembly language translation thingie (they had a vaporware Alpha/i386 hybrid processor a few years ago that did this; look in back issues of Byte for pointers) is too slow.

VMware's trick is to scan the object code at load time and translate unvirtualisable instruction sequences into something else (what, I don't know, but I suspect they jump to an emulator for just that sequence). So it's just like Pure Atria's stuff, and even related to the Melting Ice tech available for rapid development in Eiffel.

hope this helps.

Johan
johan@ccs.neu.edu -- I'll log in when I get the johan uid. gimmie!

Freemware project by Anonymous Coward · 1999-05-13 22:49 · Score: 1

Anyone who has some time to spare and knows a great deal about how to implement this should head over to: http://www.freemware.org/ they could really use your help.

[The Freemware project started directly after VMWare was announced. It's an effort to create an open source (and possibly portable) VMWare-clone.]

How about "virtualizing" on different processors? by Anonymous Coward · 1999-05-14 02:30 · Score: 1

I have a dual CPU system, and that second processor isn't doing all that much. Rather than use a bunch of kluges to virtualize the processor, why not give a whole processor to that OS?

This would require you to "hide" the other processor from the OS as far as SMP support goes (or to use an OS that doesn't do SMP), and to make it use "device drivers" that use inter-processor communication to funnel the actual device I/O through the hosting OS.

There would be other issues, like arranging for the hosted processor to see a BIOS that doesn't try to access devices directly and guaranteeing that the hosted processor doesn't go playing with hardware directly. Unfortunately, the solutions to those problems might leave you back at the original problem.

Re: Good question!; incomplete VM in X86? by Anonymous Coward · 1999-05-13 10:16 · Score: 2

It's been >5 years since I last studied this, but I thought that one could set a flag that allows even debug registers to be "protected", in effect causing a fault upon *any* access attempt by ring 3 ("user") code.

My current favorite example comes from Linux: The kernel allows user processes to read the current value of the CPU clock counter (using the instruction "rdtsc", or "read time stamp counter"). That instruction can be made to cause a fault by an appropriate flag setting.

I would expect Intel to be fairly good at VM technology after hearing some of the complaints about the '386. (The obvious one is the lack of ring 0 write-protect page faults.)

Funky Walking by Anonymous Coward · 1999-05-13 10:59 · Score: 2

It might be useful to do some research on Rational Purify (formerly of Pure-Atria, formerly Pure Software). Purify lets the programmer/debugger trap on reads of un-initialized memory and provides fast watch-points (100s of times faster than gdb's watch-points). So clearly Purify is doing exactly what you want (while it isn't really clear if VMware is).

I'd dig around in Deja News if I were you.

Re:Funky Walking by imp · 1999-05-13 15:13 · Score: 1

Purify isn't doing what VMware is doing. Purify rewrites the object code at compile time (and sometimes at run time if you load a non-Purify'd file). That is why it can do this so fast. It doesn't play any hardware games to get this speed (like some debuggers that unmap pages for watchpoints, or try to use the limited debugging registers on some CPUs).

They have patented the object code rewriting, but as far as I know, no one has challenged the patent. Evidently, there were published papers years before Purify reinvented (and patented) this scheme, so at least some of the Purify patents may be unenforceable because of that.

Anyway, this has very little to do with VMWare, or how VMWare is implemented.

Re:Ass-talking by Anonymous Coward · 1999-05-13 12:31 · Score: 2

No, it's simpler than that. Read the linux-kernel archives and see how the UltraSparc guys discussed working around the bugs in the UltraSparc CPUs:

(1) you mark all the pages that you want to trap instructions in as non-executable

(2) when code attempts to execute in one of those pages, you get a fault

(3) you trap the fault, and then (and only then) scan the page and modify instructions as necessary

(4) you then mark the page executable and not writable, and let it run

(5) if the page is modified, you then clear the executable bit, because you may have to re-scan it.

Here's how it works. by Anonymous Coward · 1999-05-13 09:49 · Score: 4

Okay, imagine that the memory of your computer is like a vast attic, full of flies. Each of the flies is either asleep or awake, and they change state frequently. They live, work, and play in groups of eight, called "bytes". Now, when the computer gets hungry, it opens up its mouth much like a blue whale and sucks in a great big gulp of air from the attic. It filters the flies out of the air with its giant long strandy teeth and gobbles up the flies -- gobble gulp!

So.

The whale has no eyes, and in the whale's tummy there is a man without his greatcoat. That guy is called the "kernel", or "Colonel", and he looks and talks exactly like Colonel Klink on Hogan's Heroes. He has a goofy, bumbling sidekick named Sergeant Schultz, otherwise known as the "Memory Management Unit". What Sgt. Schultz does is, well . . . okay. Let's start over. Colonel Klink is in charge of sorting through these flies and putting them together in the right order before the whale (the computer, remember) digests them. This way, the whale won't get a tummyache and feel funny. Col. Klink has to decide which flies to send when, but he needs to have them organized in the right way so he knows which flies are which. If two batches of flies crash into each other, the computer will get very frowny and sad. Col. Klink doesn't like that, because when that happens the General comes and yells at him in German, and Col. Klink doesn't speak German, he just speaks English with a funny accent. So Sgt. Schultz has the very important job of ensuring that the flies don't get mixed up before Col. Klink gets to look at them.

In the arrangement that you're talking about above, things are more complex, because Col. Klink and Sgt. Shultz have to coexist with Col. Hogan and Richard Dawson, who are doing the same thing at the same time. (A little imagination will suffice to guess which OS is which). Hilarity ensues! But everything runs smoothly again at the end of the episode.

Hope this helps.

Re:Here's how it works. by nrrd · 1999-05-13 11:25 · Score: 4

One of the best descriptions of file paging is here. It gives a fairly solid, humorous description. Long live the Thing King!

--
"Eye halve a spelling chequer, It came with my pea sea, It plainly marques four my revue, Miss steaks eye kin knot sea"
Re:Here's how it works. by tim+the+geek · 1999-05-13 23:03 · Score: 1

you forgot the part about the booze and naked women...

Ass-talking by Anonymous Coward · 1999-05-13 10:09 · Score: 4

Since everyone else is talking out of their ass on this one, I might as well too.

The general consensus in comp.arch is that vmware is doing some dynamic recompilation, but is otherwise allowing the hosted operating system to execute natively, and thus use the hardware mmu for the majority of the work.

As has already been mentioned, the IA32 instruction set architecture (ISA) is not completely self-virtualizable, i.e. you can't trap accesses to all cpu state information. But, you can scan through the text of your process and search for those specific opcodes that are not virtualizable. Substitute a call to your own handler for those opcodes and voila! we are now effectively fully virtualizable and the performance hit is minimal, especially if you can save your changes so that you don't have to scan and recompile each page of text more than once. And once you are fully virtualized, as long as you properly trap the right operations and do the right thing, you can let the hardware do 99% of the work for you.

Clearly vmware does more than this with its various virtualized devices, but fundamentally this is probably what is going on.

Re:Ass-talking by dwmw2 · 1999-05-13 18:24 · Score: 1

The idea is sound, it's only that Intel got stingy with the breakpoint registers.

s/breakpoint//
Re:Ass-talking by synaptik · 1999-05-13 10:34 · Score: 1

I don't think it's quite that simple. You can't scan and replace because

(1) what you replace may be data, not code

(2) even if (1) weren't a problem, how do you know the sequence you found isn't a coincidence? What if the end of "REPNE SCASB" and the beginning of "DIV ECX" just happened to look exactly like "MOV EAX, DR7"?

--synaptik

--
HSJ$$*&#^!#+++ATH0
NO CARRIER
Re:Ass-talking by synaptik · 1999-05-13 10:36 · Score: 1

I was hoping you wouldn't notice that small problem. ;)

The idea is sound, it's only that Intel got stingy with the breakpoint registers.

--synaptik

Re:Ass-talking by synaptik · 1999-05-13 10:58 · Score: 1

1. Not necessarily. What about instructions like
"MOV EAX, 0CCCCCCCCh"?

The last four bytes look like four "INT 3" instructions (INT 3 is opcode CCh).
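A toy illustration of the misalignment problem, assuming a decoder that knows only the handful of opcodes used here (a real x86 length decoder has to handle prefixes, ModRM/SIB bytes and much more). A blind byte search for CCh, the one-byte INT 3 opcode, finds several hits inside the immediate operand of B8 CC CC CC CC (MOV EAX, 0CCCCCCCCh), while decoding instruction by instruction from a known start finds only the genuine INT 3.

```python
def insn_length(code, i):
    """Length of the instruction at offset i (toy decoder, few opcodes only)."""
    op = code[i]
    if 0xB8 <= op <= 0xBF:        # MOV r32, imm32
        return 5
    if op in (0x90, 0xCC):        # NOP, INT 3
        return 1
    if op == 0xCD:                # INT imm8
        return 2
    if op == 0x0F and i + 1 < len(code) and code[i + 1] == 0x21:
        return 3                  # MOV r32, DRn (always register form)
    raise ValueError(f"opcode {op:#04x} not handled by this toy decoder")


def naive_hits(code, byte=0xCC):
    """Every offset holding the byte, whether or not an instruction starts there."""
    return [i for i, b in enumerate(code) if b == byte]


def decoded_hits(code, start=0, byte=0xCC):
    """Offsets where a decoded instruction actually starts with the byte."""
    hits, i = [], start
    while i < len(code):
        if code[i] == byte:
            hits.append(i)
        i += insn_length(code, i)
    return hits


# MOV EAX, 0CCCCCCCCh followed by a genuine INT 3
code = bytes([0xB8, 0xCC, 0xCC, 0xCC, 0xCC, 0xCC])
print(naive_hits(code))    # [1, 2, 3, 4, 5]
print(decoded_hits(code))  # [5]
```

Decoding only works if you know where an instruction really starts, which is exactly what a scanner of raw pages does not know.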

--synaptik

Re:Ass-talking by synaptik · 1999-05-13 10:28 · Score: 2

This is where the processor comes to the rescue. To be sure, they can't expect every occurrence of a certain sequence to be a particular instruction. It's quite possible to have that sequence be a combination of the end of one instruction and the beginning of another.

I'm no OS developer, but if I were trying to do this, I'd try scanning for these strings, and then placing a hardware execution breakpoint at the beginning of them. If it's not actually code, the breakpoint won't get hit. If it is code, then when it does get hit the VMWare software could just look at the instruction pointer register, to ascertain whether they "hit" in the middle of, or the beginning of, an instruction. If the latter, they simulate that "offending" instruction.
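A sketch of that breakpoint idea as a simulation, not real debug-register code: candidate offsets come from a byte scan, only four can be armed at once (DR0-DR3), and `executed_starts` is a hypothetical trace of the offsets where instructions actually began. A candidate whose breakpoint fires really is the start of an instruction and gets emulated; one that never fires was not executed as an instruction start, at least during this trace; anything beyond four needs some other method. The function name and the trace input are assumptions.

```python
MAX_HW_BREAKPOINTS = 4  # DR0-DR3 on the 386 and later


def triage(candidates, executed_starts):
    """Arm breakpoints on suspicious offsets and classify them.

    candidates: offsets where a suspicious byte pattern was found by a scan.
    executed_starts: hypothetical trace of offsets where instructions began.
    """
    executed = set(executed_starts)
    armed = candidates[:MAX_HW_BREAKPOINTS]
    return {
        # Breakpoint fired: an instruction really starts here, so emulate it.
        "emulate": [a for a in armed if a in executed],
        # Never fired during the trace: data, or the middle of an instruction.
        "ignore": [a for a in armed if a not in executed],
        # Out of debug registers: these need another method.
        "unhandled": candidates[MAX_HW_BREAKPOINTS:],
    }
```

For instance, `triage([1, 5], [0, 5])` classifies offset 5 as "emulate" and offset 1 as "ignore", matching the MOV EAX, 0CCCCCCCCh case where only the trailing INT 3 is real code.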

But like he/you said, I'm talking out of my arse.

:)

--synaptik

Re:Ass-talking by zonker · 1999-05-13 20:22 · Score: 0

jeeze, for that matter, how do you know that you aren't going to replace "OI812" with "ICUP"? Wow!

--
Large print giveth, and the small print taketh away

Re: Good question!; incomplete VM in X86? by synaptik · 1999-05-13 12:32 · Score: 1

My experience in MS Windows land is that writes to DR7 are protected, but reads are not.

In fact, the Intel documents I have specifically state that this register is readable at any privilege level, but nowhere have I seen a statement that you can MAKE it a privileged instruction.

--synaptik


Good question! by synaptik · 1999-05-13 09:48 · Score: 2

Wow, I'm glad Rob Clark thought to ask this on Ask Slashdot. I was wondering this myself.

Although, I would like to add a rider to his question:

With Intel processors, some hardware registers can't be trapped. For example, any privilege level can read DR7 to find out if a debugger is resident. Writes to this can obviously be trapped, but AFAIK there is no way to get the processor to trap on reads.

I am sure there are other examples like this, as well. This seems to indicate that it is impossible to virtualize every aspect of the machine.

(Although, I suppose you could put the processor into single-step mode, and look at each instruction before it executes, looking for these types of instructions, but that would slow things WAAAYYYY down.)

--synaptik


walla?! by gavinhall · 1999-05-13 22:11 · Score: 1

Posted by Nick Carraway:

I think you mean "voila." I liked the Colonel Klink explanation better...

Re:Ass-talking: another possibility by downwa · 1999-05-14 01:57 · Score: 1

Rather than rewriting instructions you detect on a scan of a page of program text, you could possibly set a hardware breakpoint on each such instruction. Since there are only 4 hardware breakpoints (at least on the 386), you would only be able to do this on pages containing 4 or fewer instructions that need to be watched.

I wrote a simple program to scan all the Windows DLLs and EXEs for "dangerous" instructions. I found that for most EXEs and DLLs, there were fewer than 4 instructions per page that would be dangerous. For the remaining ones, you could rewrite the instructions. But then, you have to make the page execute-only (not readable or writable; is this possible?) and trap any data access to it, to fool the program into seeing the original instruction instead of the rewritten one.

Or, you could simply do single step on that page (which might be a viable option since there would be so few of those pages in the average OS-- unless someone specifically wanted to make your VM perform badly ;-) ).
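The per-page decision described above, sketched in Python under simple assumptions: 4 KiB pages, four hardware breakpoints available per page (reprogrammed as execution moves between pages), and single-stepping as the fallback for crowded pages (rewriting plus an execute-only page would be the other option). The function `plan_pages` and its list of "dangerous" addresses are hypothetical.

```python
from collections import defaultdict

PAGE_SIZE = 4096
MAX_HW_BREAKPOINTS = 4


def plan_pages(dangerous_addrs):
    """Map each page base to a strategy and the watched addresses on it."""
    by_page = defaultdict(list)
    for addr in dangerous_addrs:
        by_page[addr - addr % PAGE_SIZE].append(addr)
    plan = {}
    for page, addrs in sorted(by_page.items()):
        addrs.sort()
        if len(addrs) <= MAX_HW_BREAKPOINTS:
            plan[page] = ("breakpoints", addrs)
        else:
            # Too many for DR0-DR3: single-step this page (or rewrite it).
            plan[page] = ("single-step", addrs)
    return plan
```

Pages with no dangerous instructions never appear in the plan at all, which is the common case the scan above suggests.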

--
Life's a lot like money-- you spend it, then it's gone. Spend wisely.

Remember by sjames · 1999-05-13 19:32 · Score: 1

A few comments show that a program may determine that it's in ring 3 rather than 0. It's important to remember that an OS has little legitimate reason to check for that. I wouldn't be surprised if M$ added such checks now that vmware is out, but apparently, they haven't done that in the past.

In general, it's not necessary to perfectly virtualize ring 0 instructions, it just has to be 'good enough'. In practice, determining what 'good enough' actually is can be a tough problem (which is why there aren't dozens of vmware like products out there), but perfection is not required. Most OSes are not hostile to being virtualized, they just assume that they're not being virtualized.

Re:Remember by sjames · 1999-05-15 07:02 · Score: 1

EMM386.EXE is a kludge added to a kludge. It was done because DOS programs expected to run in real mode, and the '286 didn't have virtual-8086 mode.

Anything that needs to trigger a processor reset to operate is a problem. EMM386 under 'DOS7' probably doesn't work that way anymore.

Brown Simulator plug by kma · 1999-05-14 01:23 · Score: 1

The short answer is: "mmap + SIGILL + SIGSEGV". If you're curious about the details, you might want to check out the Brown Simulator, which provides a full MMU at user-level on top of Solaris.

No it doesn't by thomasd · 1999-05-13 18:30 · Score: 1

Well, not a complete virtual machine in the sense of the JVM. This approach of completely emulating the target processor has been tried -- the best example is Bochs, which actually works pretty well. But it's seriously slow.

The point of vmware is to provide the fastest possible emulation of an ia32 machine. So it wants to execute all (or nearly all) the instructions directly on the host processor, rather than having to emulate them. The clever bit is to allow it to do this without clobbering the host OS; this is what requires lots of memory management tricks.

Re:I think... by clawson · 1999-05-14 00:04 · Score: 1

that VM ware uses a virtual machine for each operating system. This is similar to JVM. It allows each OS to run natively on the same computer using its resources without conflicting with the others. What *I* want to know is disk partitions/file systems. Do you need a different partition for each OS? Or a different drive entirely? Yes, you need a different partition for each OS. But it's that way anyway (for the multi-boot people) without VMWare. VMWare, however, can be set to use an existing partition with an OS in "RAW" mode, so you don't have to reinstall the OS in a VMWare space on the host OS. Of course, you DO NOT want the host OS to be able to access that partition when VMWare is running...

Re:Here is my attempt at an explanation by dw · 1999-05-13 14:18 · Score: 1

Quite elegantly put :)

While we're at it, here's my guess, which is based on badly blurred memories of 80486 documentation.

Memory reads and writes really aren't the difficult part. In protected mode, every process (or task) executes in its own virtual address space of up to 4GB, whose addresses the processor translates into physical memory addresses. The OS swaps these task spaces out to disk while they're not being used. One process should never be able to write to another process's space, which was the whole point of protected mode with the i386.
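For reference, a sketch of how the 386/486 turns a 32-bit linear address into a physical one with 4 KiB pages and no PAE: the top 10 bits pick a page-directory entry, the next 10 bits a page-table entry, and the low 12 bits are the offset within the page. Physical memory is modelled as a dictionary of 32-bit entries; the accessed, dirty, user and writable bits are ignored, and `PageFault` is just a stand-in for the processor's #PF exception.

```python
class PageFault(Exception):
    """Stand-in for the processor's page-fault exception (#PF)."""


PRESENT = 0x001
FRAME_MASK = 0xFFFFF000  # high 20 bits of a PDE/PTE hold a 4 KiB-aligned address


def translate(linear, cr3, phys_mem):
    """Translate a 32-bit linear address using two-level x86 paging.

    phys_mem maps 4-byte-aligned physical addresses to 32-bit table entries.
    """
    dir_index = (linear >> 22) & 0x3FF    # bits 31-22
    table_index = (linear >> 12) & 0x3FF  # bits 21-12
    offset = linear & 0xFFF               # bits 11-0

    pde = phys_mem.get((cr3 & FRAME_MASK) + dir_index * 4, 0)
    if not pde & PRESENT:
        raise PageFault(f"no page table for {linear:#010x}")
    pte = phys_mem.get((pde & FRAME_MASK) + table_index * 4, 0)
    if not pte & PRESENT:
        raise PageFault(f"page not present for {linear:#010x}")
    return (pte & FRAME_MASK) | offset


# Map linear 0x00403000-0x00403FFF to the physical frame at 0x9000.
mem = {0x1004: 0x2000 | PRESENT,   # directory entry 1 -> page table at 0x2000
       0x200C: 0x9000 | PRESENT}   # table entry 3 -> frame at 0x9000
print(hex(translate(0x00403ABC, 0x1000, mem)))  # 0x9abc
```

Each task gets its own directory (its own CR3 value), which is why one process cannot even name another process's pages.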

The real issues involve handling interrupts, and executing protected instructions. Take for instance writing directly to hardware through IO ports. The host OS absolutely can't let the hosted OS do what it wants in this area. But the interrupt mechanisms of the x86 architecture come to the rescue here.

Run the hosted OS at some unprivileged level (not ring 0) and let the processor trap whenever a privileged instruction is executed. The host then examines the situation and recovers by implementing the privileged instruction in an alternative way.
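A toy model of that trap-and-emulate loop, with made-up instruction tuples instead of real x86 encodings: the guest runs "deprivileged", privileged operations raise a fault, and the monitor applies them to virtual state (a virtual interrupt flag and a log of port writes) rather than to real hardware. It assumes every privileged operation actually traps, which, as other posters point out, is not guaranteed on the real x86. All names here are hypothetical.

```python
class PrivilegedInstruction(Exception):
    """Raised by the pretend CPU when the deprivileged guest hits a privileged op."""

    def __init__(self, insn):
        super().__init__(insn[0])
        self.insn = insn


PRIVILEGED = {"cli", "sti", "out"}


def cpu_step(insn, regs):
    """Execute one guest instruction 'natively'; privileged ones fault."""
    if insn[0] in PRIVILEGED:
        raise PrivilegedInstruction(insn)
    if insn[0] == "mov":              # ("mov", register, immediate)
        _, reg, value = insn
        regs[reg] = value


class Monitor:
    def __init__(self):
        self.virtual_if = True   # the guest's idea of the interrupt flag
        self.port_writes = []    # (port, value) pairs the guest sent to "hardware"

    def emulate(self, insn, regs):
        op = insn[0]
        if op == "cli":
            self.virtual_if = False
        elif op == "sti":
            self.virtual_if = True
        elif op == "out":            # ("out", port, register)
            _, port, reg = insn
            self.port_writes.append((port, regs[reg]))

    def run(self, program):
        regs = {}
        for insn in program:
            try:
                cpu_step(insn, regs)
            except PrivilegedInstruction as fault:
                self.emulate(fault.insn, regs)   # then resume with the next insn
        return regs
```

Running `Monitor().run([("mov", "al", 0x41), ("cli",), ("out", 0x3F8, "al")])` leaves the real machine untouched: the monitor only records that the guest disabled its (virtual) interrupts and wrote 41h to port 3F8h.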

Registers also won't be a problem in most cases since they are saved and restored at a task state switch. Linux shouldn't care what NT does with the registers as long as they get restored when NT gets preempted.

- dw

Re:The dark ages of Software MMUs by Doctor+Memory · 1999-05-13 19:26 · Score: 1

Wow, that reminds me of the old Altos 3068 I used on my first job out of college. It had a discrete logic MMU on a separate (big!) board. I never really thought about it, but I wonder if there was some problem with the 68[48]51 chips (the system was a 16MHz 68020).

--
Just junk food for thought...

Re:Virtual Memory by Rob_D_Clark · 1999-05-13 10:10 · Score: 2

What I am really trying to figure out is how to trap writes/reads to certain addresses without having to interpret the machine instructions... the best hack I have thought of is to trap SIGSEGV, and have your signal handler try to figure out what was going on.

For example, you can mark pages of memory as not readable (the PROT_NONE flag for mmap). This will cause a SIGSEGV if the program tries to read or write that address.
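Python cannot usefully catch SIGSEGV, so here is the same idea as a small simulation instead: a page marked with a local `PROT_NONE` constant (mirroring the mmap flag, not the real one) sends every access to a fault handler, which works out what the access meant, here by pretending the page is a device's register block. The class, the handler signature and the device page at 0x1000 are all hypothetical.

```python
PAGE_SIZE = 4096
PROT_NONE, PROT_READ, PROT_WRITE = 0, 1, 2   # local stand-ins for the mmap flags


class SoftMMU:
    def __init__(self, size, fault_handler):
        self.mem = bytearray(size)
        self.prot = [PROT_READ | PROT_WRITE] * (size // PAGE_SIZE)
        self.fault_handler = fault_handler   # plays the role of a SIGSEGV handler

    def protect(self, addr, prot):
        self.prot[addr // PAGE_SIZE] = prot

    def read(self, addr):
        if not self.prot[addr // PAGE_SIZE] & PROT_READ:
            return self.fault_handler("read", addr, None)
        return self.mem[addr]

    def write(self, addr, value):
        if not self.prot[addr // PAGE_SIZE] & PROT_WRITE:
            self.fault_handler("write", addr, value)
            return
        self.mem[addr] = value


# Hypothetical handler: treat the protected page as a device's register block.
log = []


def device_handler(kind, addr, value):
    log.append((kind, addr, value))
    return 0x7F if kind == "read" else None   # fixed "status" value on reads


mmu = SoftMMU(4 * PAGE_SIZE, device_handler)
mmu.protect(0x1000, PROT_NONE)
mmu.write(0x0010, 5)       # ordinary RAM
mmu.write(0x1004, 9)       # trapped, never reaches mmu.mem
print(mmu.read(0x0010), mmu.read(0x1004), log)
# 5 127 [('write', 4100, 9), ('read', 4100, None)]
```

In the real mmap/SIGSEGV version the handler would get the faulting address from `siginfo` and still have to decode the instruction to learn whether it was a read or a write and what value was involved; the simulation hands it that information for free.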

Another idea I just thought of as I was writing this post... you could use a kernel module to create a /proc file, then have your VM mmap that file to use as its memory. Then the module could deal with simulating memory space. (This is assuming you can prevent the mmap file from being cached.) This would be even better if you could bind a user-space program to a file. (I vaguely remember reading that GNU Hurd has the ability to do this.)

--
--Rob

I think... by Graymalkin · 1999-05-13 11:38 · Score: 0

that VM ware uses a virtual machine for each operating system. This is similar to JVM. It allows each OS to run natively on the same computer using its resources without conflicting with the others. What *I* want to know is disk partitions/file systems. Do you need a different partition for each OS? Or a different drive entirely?

--
I'm a loner Dottie, a Rebel.

The dark ages of Software MMUs by mato · 1999-05-13 11:55 · Score: 1

Well, I remember using an old Siemens box. It ran SINIX 1.0, had a 80186 processor and just under 1MB of ram. It also (wonder of wonders) had a MMU implemented as a piggy-back board on the bus with an 8086 and a software ROM. There's software MMUs for you!

Re:soft MMU? by Haight6716 · 1999-05-13 14:18 · Score: 1

OK, well since it's the day of wacky metaphors, I'll try and tackle this one...

A normal Linux application has to obey POSIX rules to interface with the outside world (memory, disk, printer, etc.). Let's take memory as the normal example. An *application* has to ask nicely for whatever resources it wants. It doesn't know much of anything about the *real* state of the machine. Like a cow in its pen, the process has no idea what the other cows are up to, or even how many other cows there are or how big the ranch is. Operating systems attempt to "dial directly" to the hardware, and this is the part that VMWare must emulate. No easy job either, considering all the wacky things you can do on a PC. So if I'm an OS (or one of those old boot-me game disks like flight sim 2.0), I don't even worry about allocating memory; I just start eating it by the bucketful. The BIOS loads the boot sector into memory at 0x7C00 and jumps to it; from there the kernel gets loaded, executes, checks how much memory it has, and starts parceling it out to other applications. It rules the ranch. It doesn't *ask* for more memory, it just "walks the fence" to figure out how much there is.
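The "walking the fence" part can be sketched as a memory probe: write a test pattern every megabyte and stop at the first address where it does not read back. The `Machine` class is a hypothetical stand-in in which writes past installed RAM are lost and such reads return FFh; real probes also have to step around reserved regions such as the 640K-1MB adapter area. The point relevant to VMware is that under a virtual machine the monitor decides what "installed" means, so the guest simply finds however much memory it was given.

```python
MB = 1 << 20


class Machine:
    """Hypothetical bus: writes beyond installed RAM vanish, reads float to 0xFF."""

    def __init__(self, installed_bytes):
        self.installed = installed_bytes
        self.cells = {}

    def write(self, addr, value):
        if addr < self.installed:
            self.cells[addr] = value

    def read(self, addr):
        return self.cells.get(addr, 0xFF)


def probe_memory(machine, step=MB, limit=64 * MB, pattern=0x5A):
    """Return how much memory appears to exist, probing one address per step."""
    size = 0
    for addr in range(0, limit, step):
        machine.write(addr, pattern)
        if machine.read(addr) != pattern:
            break
        size = addr + step
    return size


print(probe_memory(Machine(16 * MB)) // MB)  # 16
```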

Clearly, making the ranch owner behave as a simple cow, while letting him run his own little rat-ranch and never letting him have a clue that he's just a cow in a pen, is a pretty neat trick.

Most of the "neat trick" is done for you by the CPU; however, as some of the more advanced hackers have pointed out, there are a few weak spots in this virtualized environment. So you still have some crazy stuff to do before you can fully fool the rancher. He's always asking if there's a larger world out there, and you have to keep him in the dark at all costs (or he'll die of surprise and fright). Like Flatland...

Basically, this is done by brainwashing the rancher into never poking his head through that hole in the wall, which we know leads to the *real* outside world. Or what we know as the outside world, but is it really? Maybe we've already been brainwashed ourselves!!

Application = cow
OS = rancher
computer = ranch
vmware = sophisticated brainwashing for ranchers which makes them think rats are cows and keeps them from looking over the wall of the stall. Also makes them live on hay instead of beef.

-=Julian=-

Re:Virtual Memory by RedGuard · 1999-05-29 06:51 · Score: 1

Unfortunately unlike MIPS, the x86 TLB is
implemented entirely in hardware so this won't
work.

Watch MS by DGolden · 1999-05-14 07:28 · Score: 1

Maybe you could get some hints as to
how VMWare works by watching what MS
changes in their next OS release in
order to break it, if they can... :-)

--
Choice of masters is not freedom.

Re:Virtual Memory by MasterD · 1999-05-13 17:06 · Score: 0

Doh. I knew that...mistyped it.

Virtual Memory by MasterD · 1999-05-13 09:29 · Score: 5

It all has to do with virtual memory (not the misnomer used for swap). Basically, there is a mapping between _real_ memory addresses and the addresses programs use to access data.

In a kernel, this is done (usually) using a mix of hardware and software. If a program tries to access a piece of memory, the hardware looks in the Translation Lookaside Buffer (TLB) to translate the address. If the address exists in the buffer, it does the translation and all is good. If it does not exist, a trap is taken into the kernel. It is the kernel's responsibility to look at the virtual memory tables, allocate the memory, copy it if it was copy-on-write, and, most importantly, update the TLB so next time it does not have to set up the translation.
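A sketch of the software-refilled TLB described above, roughly as on MIPS (on the x86 the hardware walks the page table itself on a miss). The four-entry size and least-recently-used replacement are assumptions for the example, and `page_table` is a plain dictionary from virtual page number to physical frame number; a missing entry there would be a genuine page fault rather than a TLB miss.

```python
from collections import OrderedDict

PAGE_SHIFT = 12
OFFSET_MASK = (1 << PAGE_SHIFT) - 1


class TLBMiss(Exception):
    pass


class TLB:
    def __init__(self, entries=4):
        self.entries = entries
        self.map = OrderedDict()   # virtual page number -> physical frame number

    def lookup(self, vaddr):
        vpn = vaddr >> PAGE_SHIFT
        if vpn not in self.map:
            raise TLBMiss(vpn)
        self.map.move_to_end(vpn)  # mark as most recently used
        return (self.map[vpn] << PAGE_SHIFT) | (vaddr & OFFSET_MASK)

    def insert(self, vpn, pfn):
        self.map[vpn] = pfn
        self.map.move_to_end(vpn)
        if len(self.map) > self.entries:
            self.map.popitem(last=False)   # evict the least recently used entry


def access(tlb, page_table, vaddr, stats):
    """Translate vaddr; on a TLB miss, 'trap to the kernel' to refill the TLB."""
    try:
        paddr = tlb.lookup(vaddr)
        stats["hits"] += 1
        return paddr
    except TLBMiss as miss:
        stats["misses"] += 1
        vpn = miss.args[0]
        tlb.insert(vpn, page_table[vpn])   # KeyError here = genuine page fault
        return tlb.lookup(vaddr)
```

Touching two addresses in the same page costs one miss and then one hit, which is why the TLB keeps translation from being insanely slow.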

So in the VM case, this is sorta conjecture. The VM can allocate a slew of memory on the host OS (as far as the client OS is concerned, this is physical RAM). Then it can make a TLB and all memory accesses will go through it first. This way it can stop Windows from pissing all over OS/2 running on the VM. But Linux will stop the VM from pissing on anything else on the host OS.
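A sketch of that conjecture under simplifying assumptions: each virtual machine is one host allocation (a `bytearray`) that its guest treats as physical RAM, and every guest access goes through the guest's own page table and then a bounds check against that allocation. Because two VMs never share an allocation, one guest cannot scribble on the other. The class and exception names are hypothetical, and no TLB is modelled here.

```python
PAGE = 4096


class GuestFault(Exception):
    """The guest touched 'physical' memory its VM was never given."""


class VirtualMachine:
    def __init__(self, guest_ram_bytes):
        # One host allocation that the guest OS believes is its physical RAM.
        self.ram = bytearray(guest_ram_bytes)

    def _guest_physical(self, guest_page_table, gvaddr):
        gvpn, offset = divmod(gvaddr, PAGE)
        gpaddr = guest_page_table[gvpn] * PAGE + offset   # guest's own mapping
        if gpaddr >= len(self.ram):
            raise GuestFault(hex(gpaddr))
        return gpaddr   # also the index into this VM's host allocation

    def write(self, guest_page_table, gvaddr, value):
        self.ram[self._guest_physical(guest_page_table, gvaddr)] = value

    def read(self, guest_page_table, gvaddr):
        return self.ram[self._guest_physical(guest_page_table, gvaddr)]
```

Two guests using identical page tables and addresses still land in different host memory, and a guest that maps a page beyond its RAM gets a fault from the monitor instead of touching the host.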

As for kernel traps, the user-level program's data needs to be copied over to kernel space for the kernel to access it.

I hope this begins to answer your question.

Re:Virtual Memory by noom · 1999-05-13 10:18 · Score: 1

If you want to map a memory region to a file, check out the manpage for mmap(2).
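The same mapping is available from Python through the standard `mmap` module, which wraps mmap(2) on Unix. A minimal example: map a temporary file, store bytes through the mapping, and read them back through the ordinary file interface.

```python
import mmap
import tempfile


def roundtrip(data):
    """Write `data` into a file via a shared memory mapping, then read the file."""
    if not data:
        raise ValueError("cannot map an empty region")
    with tempfile.TemporaryFile() as f:
        f.write(b"\0" * len(data))   # the file must be at least as large as the map
        f.flush()
        with mmap.mmap(f.fileno(), len(data)) as m:
            m[:] = data              # plain memory stores land in the file
            m.flush()
        f.seek(0)
        return f.read()


print(roundtrip(b"hello"))  # b'hello'
```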

Software MMU's by KrAphtd1nN3r · 1999-05-13 21:43 · Score: 0

OK, since everybody seems lost in a tangle of ideas, here's how it really works :

When you start VMWare, it reboots your machine without you even noticing anything and it loads Windows as the OS. It's as simple as that. The only thing I haven't figured out yet is how they keep a shot of X on the screen while the computer is rebooting... I'll figure that out by tomorrow!!!

--
"Code free or die!"

soft MMU? by kheldar · 1999-05-13 12:03 · Score: 1

I never really thought it would be all that complicated.

wouldn't linux be able to give it a fixed amount of RAM as an application and then let VMWare tell linux what to do with it? (er.. i guess i'm asking why is it different from any normal application? don't they all have protected memory?)

arghh.. my head hurts now..

--
--- all posts are not affiliated with my workplace. period. i dont care how good it may make them look, they are all

Re:soft MMU? by kheldar · 1999-05-13 19:25 · Score: 1

that makes sense. perhaps i'm just a naive beginner at programming... (i am)

but don't all programs get told this by the OS? I thought protected memory was supposed to segment stuff from each other, to keep them all from knowing who all is out there. And from what i have read in the documentation of VMware, it doesn't emulate the CPU (which would be slow) but lets it talk to the CPU (er??? doesn't it???)

hehehe... i'm having flashbacks to the matrix... maybe life was designed by VMware. With luck my brain is running under linux, not nt.. =)

on a side note, what happens if you let VMware boot your linux drive under linux? or what happens if you have linux automatically boot NT under VMware and then have NT boot linux, and so on.. recursive OS! =)


technical references by rgrimm · 1999-05-13 12:44 · Score: 5

The basic technology behind VMware (I suppose; all three authors are cofounders of VMware) is described in research papers at http://www-flash.stanford.edu:80/Disco/ , if you want to find out the details.

Robert
