Virtual Machine Design and Implementation in C/C++
Virtual machines are, in effect, a software model of a whole system architecture and processor. They take in bytecode (formed of opcodes, operands, and other data) and execute it, much in the same way a real system executes code. Running these operations in software, however, gives you more security, and total control over how the system works.
Virtual machines are popular for a number of reasons. The first is that they give programmers a third compiler option. You don't have to either go the dynamic interpreted route or the static compiled route, you can compile for a virtual machine instead. Another is that virtual machines aid portability. If you compile your code for a virtual machine, you can run that binary on any system to which the virtual machine has been ported.
Few books have been written on virtual machines, with only a few Java Virtual Machine titles available. Virtual Machine Design and Implementation by Bill Blunden is therefore a landmark book for anyone with an interest in virtual machines, or even system and processor architecture as a whole.
What's to Like?
Blunden makes sure to cover every topic related to virtual machines in extreme depth. The beauty of this is that newcomers are not left in the dark, while experts can simply skip sections. The book is well divided up, and off-topic rants or notes are clearly marked with dividers. This is an easy book to read, even though it runs to some 650 pages.
To lead the reader through the entire production of a virtual machine, Blunden showcases the development of his own 'HEC' virtual machine (HEC being one of the fictional companies in 'CPU Wars'). Initially he starts slowly, and introduces the reader to how CPUs work, how memory works, how paging works, and how almost any other system process you can imagine works. Nothing is missed out. Multitasking, threads, processes, porting.. he covers it all. This is excellent for those new to some of these topics, and makes this an advanced book that's actually quite readable by someone with a modicum of computer science experience.
After laying down the foundations for the design of the virtual machine, the actual development starts in Chapter 3. All of the code in this book is in C or C++, and nearly all of the code it talks about is actually printed on the right pages in the book. No more flipping between code on your computer and the book; it's all just where it should be!
Further on in the book, a number of extremely advanced concepts are introduced, but even these need not be out of the reach of an intermediate programmer. Blunden presents the most vivid insight into how assemblers and debuggers are created, and the book is worth it for this information alone.
Another important thing about this book is that it looks at creating a register based virtual machine. Stack based virtual machines are covered, but the author makes a compelling argument for using registers. This makes a refreshing change from the Java Virtual Machine books that ram stack based theory down your throat. It's also useful if you're interested in the Perl 6 'Parrot' project, which is also an in-development register based virtual machine, and bound to become rather important over the next few years.
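To make the register-based idea concrete, here is a minimal sketch of a fetch-decode-execute loop in C. It is not HEC and not code from the book: the opcodes, the `Instr` layout, and the names `vm_run` and `vm_demo` are all assumptions made up for illustration. Each instruction names its operand registers explicitly, which is the main contrast with a stack machine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opcodes for illustration; this is not HEC's instruction set. */
enum { OP_HALT, OP_LOADI, OP_ADD, OP_MUL };

#define NUM_REGS 8

/* One fixed-width instruction: an opcode plus up to three operands. */
typedef struct {
    uint8_t op;
    uint8_t a;   /* destination register */
    int32_t b;   /* source register, or the immediate value for OP_LOADI */
    int32_t c;   /* second source register */
} Instr;

/* True if v is a valid register index. */
static int is_reg(int32_t v) { return v >= 0 && v < NUM_REGS; }

/* Fetch-decode-execute loop. Returns the value left in register 0. */
int32_t vm_run(const Instr *code, size_t len)
{
    int32_t regs[NUM_REGS] = {0};
    size_t pc = 0;

    while (pc < len) {
        const Instr *in = &code[pc++];
        assert(in->a < NUM_REGS);
        switch (in->op) {
        case OP_LOADI:
            regs[in->a] = in->b;
            break;
        case OP_ADD:
            assert(is_reg(in->b) && is_reg(in->c));
            regs[in->a] = regs[in->b] + regs[in->c];
            break;
        case OP_MUL:
            assert(is_reg(in->b) && is_reg(in->c));
            regs[in->a] = regs[in->b] * regs[in->c];
            break;
        case OP_HALT:
        default:            /* unknown opcode: stop */
            return regs[0];
        }
    }
    return regs[0];
}

/* Computes (2 + 3) * 4 with explicit register operands. */
int32_t vm_demo(void)
{
    static const Instr program[] = {
        { OP_LOADI, 1, 2, 0 },   /* r1 = 2       */
        { OP_LOADI, 2, 3, 0 },   /* r2 = 3       */
        { OP_ADD,   0, 1, 2 },   /* r0 = r1 + r2 */
        { OP_LOADI, 3, 4, 0 },   /* r3 = 4       */
        { OP_MUL,   0, 0, 3 },   /* r0 = r0 * r3 */
        { OP_HALT,  0, 0, 0 },
    };
    return vm_run(program, sizeof program / sizeof program[0]);
}
```

Calling `vm_demo()` runs the six-instruction program and returns the contents of register 0. A real design would also need memory access, branches, and error reporting that does not rely on `assert`.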
What's to Consider?
Virtual machines aren't for everyone. If you're a high level programmer working with database apps, this isn't really for you. This book is primarily for system engineers, low level programmers, and hobbyists with an interest in compilation, assembler, and virtual machine theory.
This is not a book for beginners. You need to have a reasonable knowledge of C to understand the plentiful examples and source code in the book. C++ is also useful, although OOP is clearly explained, so even a standard C programmer could follow it. That said, this is an excellent book for intermediate programmers or computer science students, as a number of advanced topics (garbage collection, memory management, assembler construction, paging, token parsing) are dealt with in a very easy to understand way.
The Summary
Released in March 2002, this book is extremely up to date. This is good news, as virtual machines are clearly going to take up a good part of future compiler and operating system technology, and this makes it important to learn about their construction and operation now. These technologies are already in the marketplace: Microsoft's .NET and the JVM, for example. Perl 6's 'Parrot' is also going to become a big player, with languages like Ruby, Python, and Scheme being able to run on it in the future.
Whether you want to learn about system architecture, assembler construction, or just have a reasonably fun programming-related read, this book is great.
Table of Contents
- History and Goals
- Basic Execution Environment
- Virtual Machine Implementation
- The HEC Debugger
- Assembler Implementation
- Virtual Machine Interrupts
- HEC Assembly Language
- Advanced Topics
Can anyone give me a substantial difference between a virtual machine, and an emulator...
because I can't see what's different between my mame and java virtual machine...
Cruise TT
I must say I'm pleased to hear about this book. I actually would like to do something with VMs in my upcoming academic life (read: grad school), but am having trouble getting started, and am not sure if this is what I want to study. Every search engine out there returns everything Java for the phrase "virtual machine," which is not exactly what I'm looking for.
The One Rule Of Chess You'll Ever Need: Don't play someone who carries a kit in their bookbag.
Inside my virtual machine, where then I can run some sort of virtual reality program where I can interface with Eliza.
Some alternate titles for this tome might be:
1. Reversi: C64 Speed on a Pentium IV
2. Double Your Code, Halve Your Speed
3. Real Men Don't Use Real Computers
4. VM:Very Macho or Verily laMe
5. Atari ST Rebirth: a 20 Year Reversal
etc., etc.
Ack, I'm turning into a crank! Oy.
Everything in the Universe sucks: It's the law!
One of the things that has surprised me about virtual machines ever since Java became a buzzword was that no one had ever thought to eliminate the relative performance penalty by implementing the VM as hardware on a PCI card (or a licensed chipset to put on the mobo). I can understand the portability implications of using VMs, and I'm glad that much work is being done in this area.
My question to anyone qualified to comment: Is there a reason why these virtual machines aren't taken as a blueprint for real hardware and implemented as such? I can imagine real performance benefits happening with such an idea...
Dear sir, That was one of the most informative posts I have seen on Slashdot in a long time, perhaps since the days of TheGloriousMeept! It truly captures the spirit of discussion. Thank you again kind sir, me
Define operating system. Don't mix it up with boot loaders or kernels. Now explain how Microsoft's .NET isn't an operating system.
The Implementation of the Icon Programming Language
[cover]
This book describes the implementation of Icon in detail. Highlights include:
* Icon's virtual machine
* the interpreter for the virtual machine
* generators and goal-directed evaluation
* data representation
* string manipulation
* structures
* memory management
http://www.cs.arizona.edu/icon/ibsale.htm
Information on the Icon programming language itself can be found at
http://www.cs.arizona.edu/icon
All of the code in this book is in C or C++, and nearly all of the code it talks about is actually printed on the right pages in the book. No more flipping between code on your computer and the book, it's all just where it should be!
Practically all coding books do this, and I mostly find it a cheap way to poop out thick books and massive volumes... Not a measure of quality in any way.
When will I end this grieving ? When will my future begin ?
I program in Java mostly right now, and so when people begin the usual 'vm is slow' crank I am curious about what they exactly mean.
Programs written to run on vm's can be significantly slower due to the extra layer. Yet, if the design of the vm is done well enough (by perhaps reading this tome?) then the vm should be comparable. Certainly C is generally faster than an interpreted language. But there are native compilers out there that provide very comparable results, with the added advantage of a language that forces careful programming. Here is the slashdot link
If adding layers to programs automatically makes them slower, and so slow that they are useless, we all would code in assembly.
Good design is important. A badly written C program, of which there are thousands, will be just as slow (read: bad) as a badly written vm program.
"The large print giveth, and the small print taketh away" -Tom Waits
Also, don't forget the UCSD P-System, which used a virtual machine to run code compiled in that environment. I know of at least one commercial product that used the P-System; I believe there were many.
Virtual machines have been around awhile; they're an interesting field, made newly relevant by the ascendancy of environments such as Java and the MS CLR. I just wish I had a good excuse to drop $50 on this book...:-)
Eric
Be who you are...and be it in style!
It would also be nice to have language-level support for parallel processing, like in Occam.
For example, in a Python implementation, the following code would execute the two for-statements in the "par"-block in parallel:
As the two threads would be executed exactly at the same speed, the output would be:
Plus, digital circuits are a little less complicated and better understood than nuclear explosions and particle interactions.
"Emulators use virtual machines, operating systems use virtual machines (Microsoft's .NET), and programming languages use virtual machines (Perl, Java)".
Microsoft's .NET is an example of a virtual machine used by a particular operating system - there are no claims that .NET is an operating system by itself. Similarly, the Perl and Java programming languages have been implemented on virtual machines - the JVM, and the stack-based (soon to be register-based) Perl virtual machine.
Perl is a hybrid. It's interpreted at run-time like a scripting language, but before execution, it's pre-compiled to a VM-like bytecode. The idea is to get the best of all worlds. Things like mod_perl benefit from this a lot by only having to compile once but running many times.
God Fucking Damnit
People had said for a long time that personal computers connected to file servers were a lower-cost, better system. However, now many places are going to web-based or host-based connections because of buggy issues at the desktop and the unmanageability of the personal computer. Couple this with the fact that licensing management is such a bear and you see why us Unix folks are glad to see the turn-around.
Mainframes had been on their way out before the personal computer, in favor of smaller satellite processing via minicomputers. However, now people are realizing that virtual computers in a big iron case give you a better managed array of computing power for multiple users or processes. I for one welcome this back, and hope that we will continue to see virtual computing take over the personal computer business market approach. Bring in the network computers!
Yea, so for a class project we took the kvm (Java VM for embedded devices), and turned it into a pipelined architecture. It was very educational, but the practicality is lacking ... You at least need a 4 proc machine for it to be useful, as it was a simple 4-stage. But the speed was so lacking.
;) .... as it was only a learning experience, no big deal. By the end I could've written my java in assembly ;)
It was a worthwhile experience, though I do wish java was reg based.
/* Lobster Stick To Magnet!*/
For more information on Parrot, which will be the Perl 6 virtual machine, and which is register-based, you may want to check out http://www.parrotcode.org/.
It seems to me that many of you are viewing a VM as some kind of emulation application rather than a virtual machine. What you may not realize is that many (most?) OS kernels, including Linux, virtualize the hardware to make the software more portable and less able to crash your entire system. What you lose in performance you make up for in stability. Operating systems books are a great reference for studying VMs.
Operating System Concepts by Abraham Silberschatz, et al.
Design and Implementation of the 4.4BSD Operating System by McKusick, et al.
Design of the UNIX Operating System by Bach
Modern Operating Systems by Tanenbaum
Operating Systems Design and Implementation by Tanenbaum
but does it cover infocom's famous zmachine VM, which runs on more hardware than any other virtual machine ever... (considering it can run under java as well.. a vm running a vm!)..
or magnetic scrolls' 68k VM, that even ran on the c64, whose mighty 8bit chip was emulating the 16/32bit 68K!
aaah long live interactive fiction and virtual machines.
no sig for you
Even the earliest VMs did garbage collection (take a look at Lisp which for some reason nobody has mentioned here yet).
However it is true that this argument could be made for any feature added to the VM, but it does seem that using the VM design to get away from numerically-addressed memory is a natural division that most designers go for.
Probably the main thing that makes people not consider this idea is that it would be a new OS that does not run any existing programs. Although there are plenty of alternative OS's out there, most people see VM as a way to get their new interface onto an existing system so they never consider this way of writing it.
It doesn't run without Windows?
I guess you could say it blurs the line/is highly integrated with the underlying OS, but wasn't there just a protracted legal battle based on that?
Get thee glass eyes, and, like a scurvy politician, seem to see things thou dost not.--King Lear
Can anyone explain what's similar/different between doing this and doing something like ReBirth?
Libertarianism is rich wolves and poor sheep playing gambler's ruin for dinner.
one thing to remember is that the microsoft .net infrastructure does not run on a virtual machine!
.net code (c#, etc.) compiles down to a standard intermediate language, which gets JITted into machine code, and linked to .net support libraries. there is no interpreted code execution going on, and indeed, the IR is not optimized for interpreted execution. hence, there's no virtual machine running, unlike in the case of Java or other bytecode interpreters.
.net is not a virtual machine any more than gcc is a virtual machine.
My other car is a cons.
Was in Unreal. That was, what, five years ago? It was a revelation to me as a commercial games developer. You could script object behaviour in C-like code, and load it dynamically at run time without having to restart the engine or try and do clever tricks with DLLs. The development time that saved was simply breathtaking, and it pretty much defined the future of games engines and games development, which epitomise the RAD concept. Heck, the first thing that we did was crank out our own C-like VM, and we never looked back.
If you were blocking sigs, you wouldn't have to read this.
There are many virtual machines that aren't bloated. How large is the JVM? Just the JVM, not the library (jars, class files). I imagine it's pretty large. Just because Java is poorly done doesn't mean all languages with VMs are as bloated. Smalltalk, which is by all measures a language with a huge library, can have a VM as small as 100k, but still get to all the standard libraries. Sure, you could target it to hardware, but if the language is well designed (like Smalltalk, not like Java) it's not as much of an issue.
Not even the JVM includes anything near the kitchen sink. The libraries do. They're not terribly hard to port when all they do is interpret bytecodes.
It's sad to see people with this kind of attitude. In their minds, every virtual machine-based language equals Java. Anything that's not compiled directly to native code equals QBasic. That's not the case.
Working toward a usable PDA environment in the spirit of Newton OS: Dynapad
Partially true. For example, Forth's VM avoids all that, and is VERY fast -- as a bonus, its VM is extremely easy to produce native code for (native code compilers are entirely compatible with others).
Others have discussed why GC isn't as bad as you say; I agree with them, although they're a little extreme (it's NOT true that you always need GC).
I'm working on a VM which can handle both GC'ed and non-GC'ed stuff at the same time, for a substantial speed advantage. Unfortunately, my VM has a language tiedown; I'm not sure how to add the type support I need to most languages.
-Billy
But has anybody given any thought to making a VM that runs almost on top of the hardware with almost no system calls?
It's been done many times since the 70s. Not sure of the first time a VM-based language was the OS, but it was the case with Smalltalk, as far back as 1972 or 1976. You can still get a Smalltalk-based OS with SqueakNOS. Squeak traditionally runs on top of a host OS like Linux, Mac OS, Windows and many others, but it has almost all of the features of an OS, including an awesome (but non-traditional) GUI system, complete with remote viewing. The binaries are identical between the OS version of Squeak and the hosted-on-Linux version.
The current state of SqueakNOS is that you still have to write a little C for certain things. Luckily, you can write your low-level code in a subset of Smalltalk and have it translated to C. That's how the Squeak virtual machine is written, no manual C coding required. However, there is active work being done on Squeampiler, which allows Squeak itself to compile and generate native code. Which means the entire system 100% will be in Smalltalk.
As it is now (in SqueakNOS or Squeak on top of a 'normal' OS), fundamental changes to the language can be made within the environment. The only thing compiled to C is the virtual machine and other C plugins, like OS-specific functions. Everything else (the bytecode compiler, the parser, an emulator for itself, all the development tools and libraries) is written in Smalltalk.
I am working on an operating environment for PDAs, Dynapad, along these lines. I'm doing the development on top of Linux/PPC, Solaris/SPARC, and Windows/x86 and run it on my iPAQ under WinCE/ARM. Eventually, I'd like to run it as the OS, if something like OSKit ever makes its way to the iPAQ platform.
Working toward a usable PDA environment in the spirit of Newton OS: Dynapad
Parrot is apparently the result of an April Fools' joke that wouldn't die. As a result Parrot is supposed to run not only Perl 6 but also Python and possibly Ruby.
Whee!
So a C/C++ library that interfaces well to Parrot would be accessible by LOTS of different scripting languages.
(I haven't been following the developer lists, so this is just based on what I overheard as a casual outsider with a bit of interest.)
I think we've pushed this "anyone can grow up to be president" thing too far.
Implementing a stack-based machine in hardware is straightforward, and has been done many times. The first one was the English Electric Leo Marconi KDF9, in 1958. Burroughs followed, and thirty years of Burroughs stack machines were sold. Java has a small implementation of the Java engine in hardware. Forth chips have been manufactured.
But all these machines have used sequential execution - one instruction at a time. Nobody has yet built a stack machine with out-of-order execution. There's been a little research in this area. Sun's picoJava II machine has some parallelism in operand fetches and stores. But nobody has wanted to commit the huge resources needed to design a new type of superscalar processor. The design team for the Pentium Pro, Intel's first superscalar, was over 1000 people. And that architecture (which is in the Pentium II and III) didn't become profitable until the generation after the one in which it was first used.
In the end, a superscalar stack machine probably could be designed and built with performance comparable to high-end register machines. For superscalar machines, the programmer-visible instruction set doesn't matter that much, which is why the RISC vs. CISC performance debate is over. But so far, there's no economic reason to do this. Sun perhaps hoped that Java would take off to the point that such machines would make commercial sense. But it didn't happen.
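For readers who haven't seen one, a stack machine's execution model can be sketched in a few lines of C. This is an illustration only: the opcodes and the names `stack_run` and `stack_demo` are invented here and do not correspond to the KDF9, the Burroughs machines, or the JVM. Note that every arithmetic instruction implicitly reads and writes the top of the stack, so each one depends on the instruction before it; that serial dependency is one way to see why the sequential designs came first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stack-machine opcodes, for illustration only. */
enum { S_HALT, S_PUSH, S_ADD, S_MUL };

#define STACK_MAX 16

/* Runs a flat array of opcodes and inline operands.
 * Returns the value on top of the stack at halt, or 0 if the stack is empty. */
int32_t stack_run(const int32_t *code, size_t len)
{
    int32_t stack[STACK_MAX];
    size_t sp = 0;   /* number of values currently on the stack */
    size_t pc = 0;

    while (pc < len) {
        switch (code[pc++]) {
        case S_PUSH:                 /* operand follows the opcode */
            assert(pc < len && sp < STACK_MAX);
            stack[sp++] = code[pc++];
            break;
        case S_ADD:                  /* pop two values, push their sum */
            assert(sp >= 2);
            stack[sp - 2] = stack[sp - 2] + stack[sp - 1];
            sp--;
            break;
        case S_MUL:                  /* pop two values, push their product */
            assert(sp >= 2);
            stack[sp - 2] = stack[sp - 2] * stack[sp - 1];
            sp--;
            break;
        case S_HALT:
        default:                     /* unknown opcode: stop */
            return sp > 0 ? stack[sp - 1] : 0;
        }
    }
    return sp > 0 ? stack[sp - 1] : 0;
}

/* Computes (2 + 3) * 4; no instruction names a register. */
int32_t stack_demo(void)
{
    static const int32_t program[] = {
        S_PUSH, 2, S_PUSH, 3, S_ADD, S_PUSH, 4, S_MUL, S_HALT
    };
    return stack_run(program, sizeof program / sizeof program[0]);
}
```

The code is compact because operands are implicit, which is the usual argument in favor of stack bytecode; the cost is that the hardware (or a JIT) has to rediscover which operations are independent.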
I never did get the p-Code interpreter card for my TI-99/4A, though; an environment like that might have run p-System programs a little faster. (There's an early example of "VMs implemented in hardware" for you...way ahead of the JavaChip.)
Eric
Be who you are...and be it in style!
- Phil Greenspun
If a VM doesn't support garbage collection, then programs written for it will be buggier and less safe than programs written for a VM with garbage collection.
One of the biggest reasons that existing software is so unreliable and unsafe is because of its dependence on C, and the lack of both type safety and garbage collection in C. This allows buffer overflows and memory access violations. You're correct that adding garbage collection (and true type safety) doesn't buy security in and of itself, but it buys a heck of a lot of safety.
Although it's being done with Perl in mind, it's not just the Perl 6 VM; it's actually aimed at pretty much any dynamic language. Hence we should also see backends for Ruby, Python, Basic, and pretty much any language you care to implement.
There's also talk of Parrot bytecode to Java/CLR bytecode convertors. Interesting stuff, even if we're gonna have to wait ages to actually get something useful.
Can anyone give me a substantial difference between a virtual machine, and an emulator
Others have commented on the theoretical differences, but I feel I should say something as to what distinguishes a VM from an emulator in practice. Virtual machines do not promote piracy because software is designed to run on virtual machines. On the other hand, an emulator is often written with unlawful redistribution of proprietary software in mind, even if it is wink-wink-nudge-nudge.
because I can't see what's different between my mame and java virtual machine
I find the most important difference between MAME and JVM that there is a much larger library of free software designed to run under JVM than under MAME.
Will I retire or break 10K?
/me looks in the mirror
Sorry, don't follow you. I don't have the same attitude. Having practical experience in the "real world" designing and developing, I've found that some VM-based languages can be slow, even if the code is designed well. Usually, performance depends on the code itself.
Sure, having an opinion based on ignorance is valid. However, it's still rooted in ignorance. One of the purposes of discussion is to share information, and that's what I do.
Working toward a usable PDA environment in the spirit of Newton OS: Dynapad
Any protection scheme that relies on a single test point is child's play to circumvent. If you have a code sequence like so:
it can be trivially defeated by simply adding one jump:
Now, insert your Z80 interpreter where the above code reads "<do some magic on the credentials>" and see how hard it is to defeat. Even liberally sprinkling your program with calls to the magic won't help; it just increases the number of times that the cracker has to insert jumps into the machine code.
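The single-test-point weakness can be sketched in C. This is not the code from the post above, which did not survive; `credentials_ok`, `program_unlocked`, and the `patched` flag are hypothetical names, and the flag merely simulates a jump that a cracker would patch into the machine code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for "<do some magic on the credentials>".
 * It could be arbitrarily elaborate, even a whole interpreter. */
static bool credentials_ok(const char *key)
{
    return strcmp(key, "secret-key") == 0;
}

/* The whole protection funnels into one branch. `patched` simulates the
 * cracker's single inserted jump, which skips the check entirely. */
bool program_unlocked(const char *key, bool patched)
{
    assert(key != NULL);

    if (patched)
        goto unlocked;          /* the one jump the cracker adds */

    if (!credentials_ok(key))
        return false;           /* the single test point */

unlocked:
    return true;
}
```

With `patched` set, `program_unlocked("wrong", true)` returns true without `credentials_ok` ever running, so the complexity hidden inside the check buys nothing as long as its result is consumed by one branch.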
I couldn't agree less. I flipped through the book in the bookstore, and I wasn't impressed. Blunden is a C/Assembly language programmer with little understanding of the requirements that a modern programming language places on a virtual machine. So his virtual machine is single threaded and runs in a fixed block of address space, with a fixed size code and data section, a growable stack, and a growable explicitly managed heap. This is fine if the target language is C or assembly language, but not so fine if you want garbage collection, threads, closures, first class continuations, or any of those other language features that were considered cutting edge back in the 1970s. How does his system link to external code, like the system calls in libc? Well, there are 11 "interrupts" called int0 through int10, sort of like the DOS system call interface.
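A numbered-interrupt interface of that general shape can be sketched as a dispatch table in C. This is an assumption-laden illustration, not HEC's implementation: only the count of eleven vectors comes from the description above, while `VmState`, `vm_interrupt`, and both handlers are hypothetical, and nothing here reflects what the book's interrupts actually do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_INTERRUPTS 11        /* int0 .. int10, per the description above */

/* Hypothetical VM state: arguments and results travel in registers. */
typedef struct { int32_t regs[8]; } VmState;

typedef void (*IntHandler)(VmState *vm);

/* Default handler: report failure in register 0. */
static void int_unimplemented(VmState *vm) { vm->regs[0] = -1; }

/* Hypothetical native service: r0 = r1 + r2. */
static void int_native_add(VmState *vm)
{
    vm->regs[0] = vm->regs[1] + vm->regs[2];
}

/* Vector table; entries left NULL fall back to int_unimplemented. */
static const IntHandler handlers[NUM_INTERRUPTS] = {
    [0] = int_native_add,
};

/* Called when the VM executes an interrupt instruction.
 * Returns 0 on dispatch, -1 if the vector number is out of range. */
int vm_interrupt(VmState *vm, unsigned vector)
{
    assert(vm != NULL);
    if (vector >= NUM_INTERRUPTS)
        return -1;
    IntHandler h = handlers[vector] ? handlers[vector] : int_unimplemented;
    h(vm);
    return 0;
}

/* Loads two arguments, raises interrupt 0, and returns the result. */
int32_t interrupt_demo(void)
{
    VmState vm = { {0} };
    vm.regs[1] = 2;
    vm.regs[2] = 3;
    if (vm_interrupt(&vm, 0) != 0)
        return -1;
    return vm.regs[0];
}
```

A fixed table like this is simple and fast, but it also shows the limitation being criticized: every external service must be wired into the VM ahead of time rather than linked in by name.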
His explanation of why he doesn't support garbage collection is pretty muddled: basically, he's not comfortable with the idea, and doesn't think it's practical.
Although I think that a register machine probably is better than a stack machine for this kind of system, he gives none of the arguments I was expecting to see to support this design decision. Instead, we get vague handwaving: apparently, he's more comfortable with register machines, because that's what he's used to.
Doug Moen
I have written a truly remarkable program which this sig is too small to contain.
So, your code is *not* just a jump to a registration routine that happens to run in a Z80 VM. That would indeed be trivial to hack.
Instead, the code runs in the Z80 VM at a high level. Presumably, you run 99% of the cycles in native code, but the high level control is Z80 VM. So, let's say the top-level loop has 1000 or so instructions of Z80 code, some of which are the registration routine.
If the attacker looks for the code that displays a dialog saying "you are not registered" they will simply find the code for branching in the VM, which is executing many times for other purposes.
If I were the attacker, my approach would probably then be to find the address of the code that would be executed by the VM if it had *not* branched, and to replace the "you are not registered" code with a jump back to that location.
Alternatively, I might observe that the same code was being executed again and again, and surmise that it had multiple purposes. I might splice in a test for the Nth execution of the code and have it not branch. That would slow down the VM, but since you aren't running much code in the VM it won't make a big difference.
Bottom line? Yes, it probably would trip up some people. If your program is only $30, they will get discouraged. If it's $300 they will probably figure it out.
Is there anything I'm missing?
For all intensive purposes, "whom" is no longer a word. That begs the question, "who cares"?
They make the Java and php VMs look like the Johnny-come-lately's that they are. (Sorry Squeak! Wa-ay too slow.)
Blazing fast object allocation (both) interning & loading (VA by a nose,) and both have a full IDE that the others have been trying to achieve since Smalltalk'80 came out.
But remember they're IDEs, not production/delivery. For that you want internationalizable, database drivable GUIs, dialog managers, state machine and transition engines.
All in all. Look at VW & VA and weep. (Or better yet learn.) They've been at it since the days of UCSD Pascal. They've forgot more than you'll ever know.
MSBPodcast.com The opinions expressed here are my own. If you don't like 'em... Think up your own stuff.