Project Aims For 5x Increase In Python Performance

Posted by ScuttleMonkey on Friday March 27, 2009 @08:29AM from the do-stupid-things-faster-and-with-more-energy dept.

cocoanaut writes "A new project launched by Google's Python engineers could make the popular programming language five times faster. The project, which is called Unladen Swallow, seeks to replace the Python interpreter's virtual machine with a new just-in-time (JIT) compilation engine that is built on LLVM. The first milestone release, which was announced at PyCon, already offers a 15-25% performance increase over the standard CPython implementation. The source code is available from the Google Code web site."

Speed ups for EVE online, perhaps? by KnightElite · 2009-03-27 08:33 · Score: 5, Insightful

I hope this translates into further speed ups for EVE online down the road.
1. Re:Speed ups for EVE online, perhaps? by idlemachine · 2009-03-27 14:28 · Score: 5, Informative
  
  I believe EVE uses Stackless Python. I'm not sure how well these improvements would translate across.
Kill the GIL! by GlobalEcho · 2009-03-27 08:36 · Score: 5, Informative

The summary misses one of the best bits -- the project will try to get rid of the Global Interpreter Lock that interferes so much with multithreading.
Also, it's based on v2.6, which they are hoping will make 3.x an easy change.
1. Re:Kill the GIL! by dgatwood · 2009-03-27 08:55 · Score: 4, Insightful
  
  The key is to find the right balance of granularity in locking. One giant mutex is almost always a bad idea, but tens of thousands of little mutexes can also be bad because of footprint bloat and the extra time needed to take all those locks. The right balance is usually somewhere in the middle. Each lock should see a moderate level of contention: with too little, you waste time locking and unlocking the mutex relative to the time spent doing the task; with too much, you are likely waiting on somebody whose work wouldn't have interfered with yours at all. Reader-writer locks for shared resources can also be a real win in some cases.
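
The middle ground described here is sometimes called lock striping. The following is a minimal sketch of the idea, not anything from CPython or Unladen Swallow; the names `StripedCounter` and `hammer` and the stripe count are assumptions made up for illustration.

```python
import threading


class StripedCounter:
    """Per-key counters guarded by a small, fixed number of locks."""

    def __init__(self, stripes=16):
        # One lock per stripe: more than a single global mutex,
        # far fewer than one lock per key.
        self._locks = [threading.Lock() for _ in range(stripes)]
        self._tables = [{} for _ in range(stripes)]

    def _stripe(self, key):
        # Keys that hash to the same stripe share a lock.
        return hash(key) % len(self._locks)

    def increment(self, key):
        i = self._stripe(key)
        with self._locks[i]:  # only keys in the same stripe contend here
            table = self._tables[i]
            table[key] = table.get(key, 0) + 1

    def get(self, key):
        i = self._stripe(key)
        with self._locks[i]:
            return self._tables[i].get(key, 0)


def hammer(counter, keys, rounds):
    """Increment every key `rounds` times from the calling thread."""
    for _ in range(rounds):
        for k in keys:
            counter.increment(k)


counter = StripedCounter(stripes=4)
keys = ["a", "b", "c", "d", "e"]
threads = [
    threading.Thread(target=hammer, args=(counter, keys, 1000))
    for _ in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
totals = {k: counter.get(k) for k in keys}
```

Raising `stripes` lowers contention at the cost of more lock objects; lowering it toward 1 approaches the single-mutex design.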
  
  --
  Check out my sci-fi/humor trilogy at PatriotsBooks.
2. Re:Kill the GIL! by Just Some Guy · 2009-03-27 09:28 · Score: 4, Insightful
  
  "Good luck with that. The last time someone tried that, they slowed Python down by half."
  Yes, good luck with that! Because the current implementation slows it down by 7/8ths on my 8-core server.
  
  --
  Dewey, what part of this looks like authorities should be involved?
3. Re:Kill the GIL! by Red Alastor · 2009-03-27 09:39 · Score: 4, Informative
  
  "Good luck with that. The last time someone tried that, they slowed Python down by half."
  Only because CPython uses reference counting for garbage collection. With many threads running in parallel, you need to lock every object's reference count, because otherwise a race can free an object while it is still reachable. This project plans to change the garbage collection strategy first. Once that's done, killing the GIL is easy.
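
To see why this is expensive, it helps to watch how often the count moves. The snippet below is a small sketch using `sys.getrefcount`, which is CPython-specific; the variable names are illustrative only.

```python
import sys

payload = []                           # a fresh object with one name bound to it
before = sys.getrefcount(payload)      # includes a temporary reference held by the call

alias = payload                        # binding a second name increments the count
after_alias = sys.getrefcount(payload)

del alias                              # dropping the name decrements it again
after_del = sys.getrefcount(payload)
```

Every assignment, argument pass and container insert performs an update like this. Without a global lock, each of those updates would have to be atomic or individually locked.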
  
  --
  Slashdot anagrams to "Sad Sloth"
4. Re:Kill the GIL! by Nevyn · 2009-03-27 09:47 · Score: 4, Interesting
  
  Then you probably want to read: Patrick Logan on why SMT isn't "awesomez".
  
  --
  ustr: Managed string API with ave. 44% overhead over strdup(), for 0-20B
5. Re:Kill the GIL! by jd · 2009-03-27 09:56 · Score: 4, Insightful
  
  If developers were working from a clean slate and didn't have the problems of excessive legacy code to work with, I suspect Digital Mars' D, Inmos' Occam and Ericsson's Erlang would be the three main languages in use today.
  If hardware developers were working from a clean slate, you'd probably also see a lot more use of Content Addressable Memory, Processor-In-Memory and Transputer/iWarp-style "as easy as LEGO" CPUs.
  Sadly, what isn't patented was invented 30 years too late and 20 years before the technology existed to make these ideas really work, so we're stuck with neolithic monoliths in both the software and hardware departments.
  (Remember, Y2K was worth tens of billions, but wasn't worth enough to get people to stop using COBOL, and that was practically dead. To get people to kick their current habits would need a kick in the mind a thousand times bigger.)
  
  --
  It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
How fast is five times faster really? by LingNoi · 2009-03-27 08:36 · Score: 5, Funny

They say five times faster, but it really depends on whether they're talking about a European or an African Python interpreter.
1. Re:How fast is five times faster really? by ArsonSmith · 2009-03-27 08:38 · Score: 4, Funny
  
  Java spokesperson: "5x faster? We already do that."
  Java spokesperson to other Java people: "(whisper)Hehe, I told them we already do that. Hehe."
  
  --
  Paying taxes to buy civilization is like paying a hooker to buy love.
2. Re:How fast is five times faster really? by meringuoid · 2009-03-27 09:27 · Score: 5, Funny
  
  Joking aside, though, I find this target to be overambitious. Speeding up by a factor of three would be plausible; two would be OK, but I'd hope they'd keep working on it to get it up to three. Four strikes me as unlikely, and five is right out.
  
  --
  Real Daleks don't climb stairs - they level the building.
3. Re:How fast is five times faster really? by AigariusDebian · 2009-03-27 14:29 · Score: 4, Funny
  
  The difference is more like between:
  Prepare the bread. Put the sauce on the bread. Put the cheese on the sauce on the bread. Bake.
  And:
  define PizzaDoughFactory : AbstractDoughFactory{
    sub PizzaDoughFactory( PizzaDoughFactory cls, Integer thickness ){
      cls.AbstractDoughFactory( thickness )
    }
    sub Sauce ( PizzaDoughFactory cls, Topping top){
      cls.toppings = org.coolpace.JavaSmart.List( -1 )
      cls.toppings.appendToTop( top )
    }
  }
  define PizzaCreator : AbstractApplication {
    def main( Integer argc, String *argv ){
      new pizza = PizzaFactory()
      pizza.set_dough = PizzaDoughFactory()
      sauce = SauceFactory()
      cheese = CheeseFactory()
      pizza.dough.Sauce( sauce )
      pizza.dough.Sauce( cheese ) // historically all toppings are called sauces as well
      new ready_pizza = PizzaBakery( pizza )
    }
  }
This is a very interesting project by Max Romantschuk · 2009-03-27 08:36 · Score: 5, Interesting

I read about what they intend to do, and they seem to have quite a few interesting ideas... But there are also major drawbacks:
- No Windows support (apparently a Linux-only VM in the plans)
- No Python 3.0 support
And thus no guarantees most of the work will merge back into CPython.
But competition is good; I can't really see a problem with having an alternative, faster Python runtime, even if it's not as compatible as CPython. :)

--
.: Max Romantschuk :: http://max.romantschuk.fi/
1. Re:This is a very interesting project by MightyYar · 2009-03-27 08:50 · Score: 4, Informative
  
  Psyco is x86-only and uses a lot of memory. It also requires additional coding: you have to enable it explicitly, so you don't automatically get the speedup that a faster interpreter gives you. You also have to pick and choose what gets compiled with Psyco, because the extra overhead isn't always worth it.
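
The opt-in step looks roughly like the sketch below. `psyco.bind` and `psyco.full` are Psyco's entry points as I remember them; the `dot` function is a made-up example. Psyco does not exist for current interpreters, so the import is guarded and the code simply falls back to plain interpretation.

```python
def dot(xs, ys):
    """CPU-bound inner loop, the kind of function worth specializing."""
    total = 0
    for x, y in zip(xs, ys):
        total += x * y
    return total


try:
    import psyco            # only available on old Python 2 / x86 installs
    psyco.bind(dot)         # opt in for a single hot function
    # psyco.full()          # alternative: let Psyco try to compile everything
    accelerated = True
except ImportError:
    accelerated = False     # run on the plain interpreter instead

result = dot([1, 2, 3], [4, 5, 6])
```

A faster interpreter needs none of this: the same `dot` would speed up without any change to the program.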
  To be fair, I don't know what the memory requirements of this new project are.
  
  --
  W..w..W - Willy Waterloo washes Warren Wiggins who is washing Waldo Woo.
2. Re:This is a very interesting project by maxume · 2009-03-27 09:00 · Score: 4, Informative
  
  It might be easy to port over to 3.0, but not because it is using 2.6. Basically, they are planning on ripping out a big chunk of the internals of 2.6 and replacing it with an LLVM-based system. To the extent that those internals changed for 3.0 (there wasn't necessarily effort put into making them compatible across 2.6 and 3.0...), the code would need to be updated for 3.0. The Python-level portability between 2.6 and 3.0 isn't a huge factor for something like this.
  They are targeting 2.6 because that is what made sense for Google (who is paying for the work). Or so they say:
  http://code.google.com/p/unladen-swallow/wiki/FAQ
  
  --
  Nerd rage is the funniest rage.
3. Re:This is a very interesting project by Tumbleweed · 2009-03-27 09:11 · Score: 4, Funny
  
  "I'm not quite sure what benefits this gives that Psyco doesn't already."
  It doesn't get as stabby.
4. Re:This is a very interesting project by MightyYar · 2009-03-27 13:57 · Score: 4, Informative
  
  I think it's Linux-only right now because the developers currently use Linux. But they consider loss of Windows support a "risk", not a design goal:
  
  Windows support: CPython currently has good Windows support, and we'll have to maintain that in order for our patches to be merged into mainline. Since none of the Unladen Swallow engineers have any/much Windows experience or even Windows machines, keeping Windows support at an acceptable level may slow down our forward progress or force us to disable some performance-beneficial code on Windows. Community contributions may be able to help with this.
  
  --
  W..w..W - Willy Waterloo washes Warren Wiggins who is washing Waldo Woo.
Re:Unladen Swallow by davester666 · 2009-03-27 08:38 · Score: 5, Funny

It would still be huge! :-)

--
Sleep your way to a whiter smile...date a dentist!
slowed it down by half? by Anonymous Coward · 2009-03-27 08:50 · Score: 5, Funny

0.5x slower is like 2x faster, right? Reciprocals?
Re:No windows by Anonymous Coward · 2009-03-27 09:03 · Score: 4, Informative

Quite to the contrary, the FreeBSD guys have been building with clang+llvm for a while now, and they seem to like it. The kernel boots, init inits, filesystems mount, the shell runs.
What other platforms, Darwin? Apple employs the largest number of LLVM developers. Windows? Both MinGW and Visual Studio based builds are tested for each release.
It's still not as portable as the Python interpreter, but that will come if and when developers who are interested in working on it start to contribute.
Re:It's probably pining for the fiords. by Abcd1234 · 2009-03-27 09:03 · Score: 4, Informative

Not really. Parrot is a much higher-level VM, providing things like closures, multiple dispatch, garbage collection, infrastructure to support multiple object models, and so forth, whereas LLVM really models a basic RISC instruction set with an unlimited supply of write-once registers (static single assignment form).
In fact, it would make a fair bit of sense to actually use LLVM as the JIT-compiling backend for Parrot...
Binspam by Thelasko · 2009-03-27 09:09 · Score: 5, Funny

I get emails claiming to increase my python's performance all of the time, I just delete them.

--
One of our competitors trademarked the term "hypothesis". From now on, we will call them "boneheaded ideas".
Re:Too many levels of translation? by Abcd1234 · 2009-03-27 09:21 · Score: 4, Informative

"Wouldn't a more direct compile yield a better result?"
No, it wouldn't.
The entire point of LLVM is that it provides an easy-to-target machine (it's basically a RISC instruction set) that you can use as your intermediate representation (the p-code you described). You then use the LLVM backends to compile the IR down to machine code. And because of the way the IR is structured (for example, its registers are write-once, i.e. it is in static single assignment form, which makes certain classes of optimizations much easier), you can do a really good job of optimizing.
Basically, you "direct compile" to the LLVM IR, and then let LLVM take care of the details of generating the machine code. This gives you better abstraction (no more machine-specific code generation in Python itself), portability (to whatever LLVM targets), and you get all the sophisticated optimization that LLVM provides for free. That's a huge potential win.
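
LLVM's IR is in static single assignment (SSA) form: every virtual register is assigned exactly once. As a rough illustration of what that means, here is a toy renamer for straight-line three-address code. It is a sketch only: `to_ssa` and the tuple format are invented for this example, it does not use LLVM's API, and real SSA construction also needs phi nodes where control flow merges.

```python
def to_ssa(instructions):
    """Rename straight-line code so that every name is assigned once.

    instructions: list of (target, op, left, right) tuples of strings.
    Returns a new list that uses versioned names such as x1, x2.
    """
    version = {}  # variable name -> latest version number

    def use(name):
        # Constants and never-assigned inputs pass through unchanged.
        if name in version:
            return f"{name}{version[name]}"
        return name

    out = []
    for target, op, left, right in instructions:
        left, right = use(left), use(right)      # read current versions first
        version[target] = version.get(target, 0) + 1
        out.append((f"{target}{version[target]}", op, left, right))
    return out


program = [
    ("x", "add", "a", "b"),
    ("x", "mul", "x", "2"),   # overwrites x in the original program
    ("y", "add", "x", "x"),
]
ssa = to_ssa(program)
# ssa == [("x1", "add", "a", "b"), ("x2", "mul", "x1", "2"), ("y1", "add", "x2", "x2")]
```

Because `x1` and `x2` are distinct values that never change, an optimizer can reason about each one without tracking where `x` might have been overwritten.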
It all depends by mkcmkc · 2009-03-27 09:38 · Score: 5, Insightful

I find Python is about 20x slower (and about 10x faster to implement) than C, with the number varying quite a bit depending on how CPU-bound the code is. Given the speed of modern processors, this is plenty fast for many tasks.
Beyond that, many Python programmers employ a strategy of writing just the CPU-intensive inner loops in C or C++. This gives you most of the speed of an all-compiled solution but with much of the easier programming (and shorter programs) of the all-Python approach.
My particular scientific application runs on 1500 cores, is about 75% Python/25% C++, is 4-5x smaller than similar all-C/C++ programs, and runs at about 95-99.99% of the speed of an all-C++ solution.
(Somewhat ironically, some of the worst performance bottlenecks in this app had to do with the overhead of some of the STL containers, which I ended up having to replace with C-style arrays, etc. to get best performance.)
Not all apps will fall out this way, but you definitely can't assume that just because something's written in Python that it will be slow.
(Beyond that, we all know that better algorithms usually trump all of this anyway. If writing in Python gives you the time and clarity to use an algorithm that is better by a factor of O(n), that may pay off by itself.)
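
As a small illustration of that last point, here are two ways to check a list for duplicates. This is a generic sketch with made-up function names, not code from the application described above.

```python
def has_duplicate_quadratic(items):
    """Compare every pair: O(n^2) comparisons in the worst case."""
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if items[i] == items[j]:
                return True
    return False


def has_duplicate_linear(items):
    """Remember what has been seen in a set: O(n) expected time."""
    seen = set()
    for item in items:
        if item in seen:
            return True
        seen.add(item)
    return False


data = list(range(2000)) + [1234]   # one repeated value at the very end
found_slow = has_duplicate_quadratic(data)
found_fast = has_duplicate_linear(data)
```

Both return the same answer, but the linear version does on the order of n set operations where the quadratic one does on the order of n squared comparisons, a gap that no constant-factor interpreter speedup closes for large inputs.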

--
"Not an actor, but he plays one on TV."
Effort in wrong place by Animats · 2009-03-27 11:24 · Score: 4, Informative

This is disappointing. Shed Skin has shown speed improvements of 2 to 220x over CPython. Going for 5x over CPython is lame. But Shed Skin is a tiny effort, and needs help.
PyPy got a lot of press, but they tried to do an optimizing compiler with "agile programming" and "sprints", and, at six years on with substantial funding, it's still not done.
The fundamental problem with running Python fast is its gratuitous dynamism. In CPython, almost everything is late-bound, and most of the time goes into name lookups. This makes it easy to treat everything as dynamic. You can store into the local variables of a function from outside the function, for example. In order to make Python go fast, the compiler has to be able to detect the 99.99% of the time when that isn't happening and generate pre-bound code accordingly.
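
A minimal sketch of late binding in CPython follows; the function names are illustrative. The call inside `caller` is resolved by name each time it runs, so rebinding the name elsewhere changes what the call does.

```python
import dis


def helper():
    return "original"


def caller():
    # 'helper' is looked up by name in the module globals on every call.
    return helper()


first = caller()


def replacement():
    return "rebound"


helper = replacement          # rebind the name from outside caller()
second = caller()

# The bytecode for caller() holds a name lookup, not a fixed call target.
opnames = [ins.opname for ins in dis.get_instructions(caller)]
```

A compiler that wants to emit a direct call here has to prove, or guard against, the rebinding shown in the middle of the snippet.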
Dynamic typing requires similar handling. Most variables never change type. Recognizing int and float variables that will never contain anything else creates a significant speedup. In CPython, all numbers are "boxed", stored in an object structure. This is general but slow.
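
The cost of boxing can be glimpsed from Python itself. The sketch below compares a boxed float object with the raw C double held by the standard `array` module; exact object sizes depend on the CPython build, so no specific figure is assumed for the boxed size.

```python
import sys
from array import array

boxed_values = [float(i) for i in range(1000)]   # 1000 separate float objects
unboxed_values = array("d", boxed_values)        # 1000 raw C doubles in one buffer

boxed_float_size = sys.getsizeof(boxed_values[1])  # object header plus the value
raw_double_size = unboxed_values.itemsize          # size of one C double

same_total = sum(boxed_values) == sum(unboxed_values)
```

A compiler that can prove a variable only ever holds a float can keep it in the raw form and skip the per-object header, allocation and reference counting.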
CPython is nice and simple, but slow. Serious speedup requires global analysis of the program to detect the hard cases and generate fast code for the easy ones. Shed Skin actually does this, but has to place some limitations on the language to do it. If someone did everything right, Python could probably achieve the speed of C++.
There's also the problem that if you want to be compatible with existing C modules for CPython, you're stuck with CPython's overly general internal representation.