Low Level Virtual Machine 1.3 Released
RSpencer writes "The Low Level Virtual Machine project has released version 1.3. There are full release notes available. LLVM is a source-language-agnostic toolkit for building compilers, optimizers, and JIT or interpreted virtual machines. LLVM provides extensive optimization support, three mid-level IR formats (bytecode, assembly, and C++), three backend targets (x86, Sparc, PPC), full documentation, and a very simple and unique design. This new toolkit approach to compiler-related tools is quickly attracting new developers who are making significant contributions to the work. Visit the home page where you can learn all the details. LLVM is funded by the National Science Foundation and MARCO/DARPA, and supported by UIUC's Computer Science department and other developers."
In some ways, I think this is neat. Dynamic recompilation is starting to become necessary in a lot of software (SQL for databases, code for emulation; hell, just providing a scripting language for customizing your software would make use of this).
On the other hand, I've been kind of jaded by lots of little projects that didn't end up doing too much. Who knows? This could become another SQLite (an invaluable tool in my toolkit that allows anyone to incorporate a very full-featured database into their code, with all of the code in the public domain. [As soon as I have something to contribute back, I will, but it's hard to do, since they do such great work, and quickly, too.])
I'm gonna give this a try, and keep my fingers crossed...
LLVM is a very young project (only 3 years old) but has already made dramatic progress in its time. Check out the status updates on the left-hand side of the main site to see the rate of progress.
Building a full C/C++ compiler is no small feat!
-Chris
So...now we have various implementations of the Java VM, the .NET VM, Parrot, and LLVM, plus various emulators of real machines, and let's not forget the real machines themselves.
What I would like to know is how they all compare. How fast does a typical program run? How portable is the implementation; how easily can the bytecode be transformed to native code for various architectures? How easy is it to target this machine? How well does the machine cope with various programming languages (esp. Common LISP)? How stable (backward compatible) is the bytecode? What are the licensing terms? Does it communicate with the host system, and how well? Etc...
Please correct me if I got my facts wrong.
If I ever do build a satisfactory parser with Bison, I wonder how it would interface with LLVM. I tried converting a toy Bison parser to C++ and it seemed like there were some rough edges.
The casual Slashdot reader may roll his/her eyes when they see yet another Virtual Machine - but this project is much more than that. It's a complete compiler infrastructure project that will one day surpass GCC. Why? Because it's around ten times easier to understand and written in a modern language (C++) rather than C. An expert C++ programmer could start contributing code to LLVM in under a month, whereas the equivalent learning curve for GCC is at least a year. Writing new compiler passes or complete language front ends for LLVM is very straightforward, for example. The LLVM AST has the advantage of strong typechecking rather than arcane tree macros as in GCC. LLVM is not burdened with the legal or philosophical wranglings of GCC where they do NOT WANT their compiler to be the backend of a commercial compiler and try their damnedest to obscure and change the programming API from release to release. The GCC "Toy" example language has not worked in many releases for this very reason.
GCC recently voted down using C++ in their core code. Perhaps LLVM at the very least will drag GCC into the modern age due to competition.
The VM part of LLVM is just icing on the cake.
(And yes, I am aware that LLVM uses GCC 3.4's C and C++ front-end code. That's a good thing for the short term. Perhaps longer term they will develop their own front-ends.)
Tools that simplify writing new languages excite me. I see many significant programming tasks as language interpretation or translation (compilation). The "language" doesn't have to be in the form of a concrete text file: interpretation could mean interpreting UI events; compilation could mean converting one data structure into another. From this perspective, metaprogramming (language authoring) is very important. Obviously not all parsing tasks are well served by lex and yacc, and similarly I wouldn't expect LLVM (or MLRISC, or whatever...) to be a perfect fit for all compilation tasks. But where they are a good fit I expect to be able to write software that is simpler, more robust, and more fully featured. That makes me happy.
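To make the "compilation could mean converting one data structure into another" point concrete, here is a minimal Python sketch. It is only an illustration and has nothing to do with LLVM's actual API or instruction set: the tuple-based tree format, the instruction names, and the functions `compile_expr` and `run` are all hypothetical. The "compiler" flattens an expression tree into a list of stack-machine instructions, and a tiny interpreter then executes that list.

```python
# Hypothetical toy example: not LLVM, just "compilation" as a
# data-structure-to-data-structure conversion.

# An expression is either an int literal or a tuple (op, left, right).
OPS = {
    "add": lambda a, b: a + b,
    "mul": lambda a, b: a * b,
}

def compile_expr(expr, out=None):
    """Flatten an expression tree into stack instructions (post-order walk)."""
    if out is None:
        out = []
    if isinstance(expr, int):
        out.append(("push", expr))
    else:
        op, left, right = expr
        compile_expr(left, out)    # emit code for the left operand
        compile_expr(right, out)   # then the right operand
        out.append((op,))          # then the operator itself
    return out

def run(code):
    """Interpret the instruction list on a simple operand stack."""
    stack = []
    for instr in code:
        if instr[0] == "push":
            stack.append(instr[1])
        else:
            b = stack.pop()
            a = stack.pop()
            stack.append(OPS[instr[0]](a, b))
    return stack.pop()

tree = ("add", 2, ("mul", 3, 4))   # represents 2 + 3 * 4
code = compile_expr(tree)
print(code)        # the flattened instruction list
print(run(code))   # 14
```

The same split (a front end that produces a simple intermediate form, and a separate piece that consumes it) is what makes it possible to swap either half independently.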
There are 10 kinds of people: Those that understand ternary; those that don't; and those that don't care.
How complete is the API? The power of the Java and .NET VMs (I don't know Parrot well enough to comment) is their standard libraries -- perhaps to a larger extent than the bytecodes themselves.
Array bounds checking is not new. Dynamically loading code isn't new. What was new was the creation of a standardized toolkit and API that handled threading and network I/O and GUI and database access and XML parsing and... You get the picture.
Another portable VM holds little value for me if in the end I just end up back to the "good ol' days" of C where you were given a hammer and told to build a house. POSIX isn't enough.
- I don't need to go outside, my CRT tan'll do me just fine.
The very best trolls always start with a grain of truth. (LLVM is much easier to understand than GCC. The GCC infrastructure is very baroque, dating from a time when assuming the presence of an ANSI C bootstrap compiler was too much. One of the major LLVM guys has presented his toolchain work at the annual GCC Summit, and maintains close communication with the rest of the GCC team -- and we wish him well. All very true; no GCC hacker would say any less.)
The trolls then move on into wild exaggerations and complete lies. Such as:
they do NOT WANT their compiler to be the backend of a commercial compiler and try their damnedest to obscure and change the programming API from release to release.
Pure malicious bullshit. RMS doesn't want proprietary backends to be able to read the GCC IR, and so we don't ever write it out in a machine-readable format. But we've never gone out of our way to obfuscate the internal API.
GCC recently voted down using C++ in their core code.
Again, a complete lie. We asked RMS whether we could make use of C++ in parts of the compiler. While a skilled and brilliant C and LISP hacker, RMS is a reactionary fuckhead when it comes to anything other than C or LISP. In his opinion, all GNU programs should be written in C, and only C, ever.
There was no vote.
You cannot apply a technological solution to a sociological problem. (Edwards' Law)
RMS doesn't want proprietary backends to be able to read the GCC IR, and so we don't ever write it out in a machine-readable format.
Then open source backends won't be able to read it either, but that's apparently okay with RMS, given his priorities. Since only REALLY good commercial software would have a chance against a no-cost incumbent, he is willing to keep great open source alternatives from becoming available in order to keep great commercial alternatives from becoming available.
This is the guy who proclaims that ALL commercial software developers are "unethical".
Open Source to me is about openness: the source is yours to use as you wish, the code can be scripted externally via a command-line interface, it can be incorporated as a library into your own code with minimal effort with a nice API intentionally designed for that purpose, it has a modular architecture that encourages others to compete at the replacement module level without having to rebuild the whole app, etc.--all the ways code can be opened: the least restrictions and the greatest usability.
This is not RMS's agenda. RMS has made his priorities clear. He has never claimed to support the "open source movement", only to be somewhat allied with it to the extent it supports his own anti-commercial software movement.
We asked RMS whether we could make use of C++ in parts of the compiler. While a skilled and brilliant C and LISP hacker, RMS is a reactionary fuckhead when it comes to anything other than C or LISP.
Having heard him speak on many occasions over the years, it's my impression that this characterization is correct and "anything other" applies to more than just programming languages, though not to everything.
I think RMS is right on in many (but not all) of his complaints about IP laws. He's a very bright and skilled guy with a lot of great ideas, but his old-fashioned leftist political philosophies are more "anti" than they are "pro" and handicap the usefulness of his products in some unfortunate ways, and that appears to include GCC.
To the extent that LLVM and other technologies compete with GCC, I'm all for it.
"Those who have never entered upon scientific pursuits know not a tithe of the poetry by which they are surrounded."
There were a few mentions of the "LLVM Verifier" but nothing about whether it's possible to use LLVM to run untrusted code while preventing it from having unlimited access to the underlying platform. Since both JVM and IL can, a new instruction set that can't would be a big step backwards.
We asked RMS whether we could make use of C++ in parts of the compiler. While a skilled and brilliant C and LISP hacker, RMS is a reactionary fuckhead when it comes to anything other than C or LISP.
One of the requirements shaping the implementation of GCC is that it be bootstrappable on many systems, even on some UNIX(tm) systems shipped by their vendors with a half-broken C compiler (enough to recompile the kernel but not much more) and no C++ compiler.
All I'm saying is that if Java didn't have all of the items in java.util or built-in threading or a standard database API, no one would've used it.
This doesn't help Common LISP as you correctly point out, but without a standard library, any language or VM is likely to be ignored.
That said, I'm curious about one of your statements. If standard libraries can be implemented on nearly anything, why haven't they been implemented? Do you think this is the result of a single company controlling the initial development of a language, the struggle between academia and tradesmen, or something else?
- I don't need to go outside, my CRT tan'll do me just fine.