Project Aims For 5x Increase In Python Performance
cocoanaut writes "A new project launched by Google's Python engineers could make the popular programming language five times faster. The project, which is called Unladen Swallow, seeks to replace the Python interpreter's virtual machine with a new just-in-time (JIT) compilation engine that is built on LLVM. The first milestone release, which was announced at PyCon, already offers a 15-25% performance increase over the standard CPython implementation. The source code is available from the Google Code web site."
I hope this translates into further speed ups for EVE online down the road.
What if Oprah's ass got 5x's smaller?
The summary misses one of the best bits -- the project will try to get rid of the Global Interpreter Lock that interferes so much with multithreading.
Also, it's based on v2.6, which they are hoping will make 3.x an easy change.
They say five times faster however it really depends on if they're talking about a European or African Python Interpreter.
I read about what they intend to do, and they seem to have quite a few interesting ideas... But there are also major drawbacks:
- No Windows support (apparently a Linux-only VM in the plans)
- No Python 3.0 support
And thus no guarantees most of the work will merge back into CPython.
But competition is good, I can't really see a problem with having an alternative faster Python runtime, even if it's not as compatible as CPython. :)
.: Max Romantschuk
It would still be huge! :-)
Sleep your way to a whiter smile...date a dentist!
0.5x slower is like 2x faster, right? Reciprocals?
Quite to the contrary, the FreeBSD guys have been building with clang+llvm for a while now, and they seem to like it. The kernel boots, init inits, filesystems mount, the shell runs.
What other platforms, Darwin? Apple employs the largest number of LLVM developers. Windows? Both MinGW and Visual Studio based builds are tested for each release.
It's still not as portable as the python interpreter, but that will come if and when developers who are interested in working on it start to contribute.
Not really. Parrot is a much higher-level VM, providing things like closures, multiple dispatch, garbage collection, infrastructure to support multiple object models, and so forth, whereas LLVM really models a basic RISC instruction set with an infinite number of write-only registers.
In fact, it would make a fair bit of sense to actually use LLVM as the JIT-compiling backend for Parrot...
I get emails claiming to increase my python's performance all of the time, I just delete them.
One of our competitors trademarked the term "hypothesis". From now on, we will call them "boneheaded ideas".
Wouldn't a more direct compile yield a better result?
No, it wouldn't.
The entire point of LLVM is that it provides an easy-to-target machine (it's basically a RISC instruction set) that you can use as your intermediate representation (the p-code you described). You then use the LLVM backends to compile the IR down to machine code. And because of the way the IR is structured (for example, it has write-only registers, which makes certain classes of optimizations much easier), you can do a really good job of optimizing.
Basically, you "direct compile" to the LLVM IR, and then let LLVM take care of the details of generating the machine code. This gives you better abstraction (no more machine-specific code generation in Python itself), portability (to whatever LLVM targets), and you get all the sophisticated optimization that LLVM provides for free. That's a huge potential win.
I find Python is about 20x slower (and about 10x faster to implement) than C, with the number varying quite a bit depending on how CPU-bound the code is. Given the speed of modern processors, this is plenty fast for many tasks.
Beyond that, many Python programmers employ a strategy of writing just the CPU-intensive inner loops in C or C++. This gives you most of the speed of an all-compiled solution but with much of the easier programming (and shorter programs) of the all-Python approach.
My particular scientific application runs on 1500 cores, is about 75% Python/25% C++, is 4-5x smaller than similar all-C/C++ programs, and runs at about 95-99.99% of the speed of an all C++ solution.
(Somewhat ironically, some of the worst performance bottlenecks in this app had to do with the overhead of some of the STL containers, which I ended up having to replace with C-style arrays, etc. to get best performance.)
Not all apps will fall out this way, but you definitely can't assume that just because something's written in Python that it will be slow.
(Going beyond that, we all know that better algorithms usually trump all of this anyway. If writing in Python gives you the time and clarity to be able to use an O(n)-better algorithm, that may pay off in itself.)
"Not an actor, but he plays one on TV."
You're not CPU bound until you: add all the features, handle the special cases, add the error checking, scale up beyond trivial test data, etc.
Then what? Rewrite?
Yes. If you didn't know all of that was going to happen, you're prototyping. If you're prototyping, you should be doing it in a prototyping language.
Rewriting from Python to C++ is not particularly difficult. Completely overhauling the design of a project written entirely in C++ is really unpleasant and takes a long time. So much so that many early design decisions on large C++ projects simply cannot be undone.
Model in clay first, then in stone later if you have to.
"Not an actor, but he plays one on TV."
This is disappointing. Shed Skin has shown speed improvements of 2 to 220x over CPython. Going for 5x over CPython is lame. But Shed Skin is a tiny effort, and needs help.
PyPy got a lot of press, but they tried to do an optimizing compiler with "agile programming" and "sprints", and, at six years on with substantial funding, it's still not done.
The fundamental problem with running Python fast is its gratuitous dynamism. In CPython, almost everything is late-bound, and most of the time goes into name lookups. This makes it easy to treat everything as dynamic. You can store into the local variables of a function from outside the function, for example. In order to make Python go fast, the compiler has to be able to detect the 99.99% of the time when that isn't happening and generate pre-bound code accordingly.
Dynamic typing requires similar handling. Most variables never change type. Recognizing int and float variables that will never contain anything else creates a significant speedup. In CPython, all numbers are "boxed", stored in an object structure. This is general but slow.
CPython is nice and simple, but slow. Serious speedup requires global analysis of the program to detect the hard cases and generate fast code for the easy ones. Shed Skin actually does this, but has to place some limitations on the language to do it. If someone did everything right, Python could probably achieve the speed of C++.
There's also the problem that if you want to be compatible with existing C modules for CPython, you're stuck with CPython's overly general internal representation.
The Python object files are just a more convenient way to store the program compared to text files. No information is lost or glue is added in that first step.
LLVM is, like its name suggests, really low level. You should think of it as a kind of portable assembly. It's much closer to actual hardware architectures than for example Java byte code. I don't expect much overhead from the LLVM to native step. A while ago I ran some tests with C++ compiled by GCC directly to native and compiled by GCC to LLVM byte code and then by LLVM to native; sometimes one approach was faster and sometimes the other, but they were pretty close.
So that leaves the glue added in the Python object to LLVM step. I expect this to have a significant overhead, but I don't see it becoming a smaller overhead by going directly to native. The advantage of using LLVM is that you only have to write this step once, instead of once for each architecture.
With LLVM it is possible to compile parts of the interpreter to LLVM byte code in advance and then inline that into the program being JIT-compiled. That way, you can be sure that the JIT and the interpreter actually do the same thing. Apple did this for their OpenGL driver, there is a nice presentation (PDF) about it.
The thing about Python is you are replacing every lost hour in runtime with a day gained in development time. That is the point of Python. Numpy (formerly scipy I think) is mostly written in C anyway and provides fast n-dimensional array objects for vector and matrix operations, there are really only a few bottlenecks for maths/science purposes. Generally anything that is going to take a seriously long amount of time you would be doing in C over anything else anyway, what Python is is a viable alternative to Matlab etc, and a damn sight less expensive!
Where I study Engineering they teach Python for this very reason. It has a gentle syntax which appeals to engineers and scientists who often aren't bargaining to become coders, and it is so much cheaper than Matlab that any missing features are rendered a moot point.
Seriously, sitting on the sidelines and saying "I'm not gonna use Python because it is slow" is silly, it is so damn easy to code in python that you would learn it in a weekend if you already have coding experience. And as I said before, any lost time running python scripts over other languages is made up ten time over at least in the ridiculously short development times that go with Python scripts. Yes, it really is THAT easy to do anything in Python, there is a reason people bug you to try it out. Just give it a weekend, Python deserves it!