Project Aims For 5x Increase In Python Performance

Speed ups for EVE online, perhaps? by KnightElite · 2009-03-27 08:33 · Score: 5, Insightful

I hope this translates into further speed ups for EVE online down the road.

Re:Speed ups for EVE online, perhaps? by Anonymous Coward · 2009-03-27 09:47 · Score: 1, Interesting

EVE uses stackless python, which needs a completely different runtime system (libraries, interpreter, etc) than vanilla python.
Re:Speed ups for EVE online, perhaps? by idlemachine · 2009-03-27 14:28 · Score: 5, Informative

I believe EVE uses Stackless Python. I'm not sure how well these improvements would translate across.
Re:Speed ups for EVE online, perhaps? by sg_oneill · 2009-03-27 17:34 · Score: 1

If it can be mushed in with the stackless patches, then it'll make eve run like a fucking gazelle.

--
Excuse the Unicode crap in my posts. That's an apostrophe, and slashdot is busted.
Re:Speed ups for EVE online, perhaps? by Gospodin · 2009-03-30 01:01 · Score: 1

...it'll make eve run like a fucking gazelle.
Could be a problem. Have you ever seen a gazelle run and fuck at the same time?

--
...following the principles of Heisenburger's Uncertain Cat...

Re:Unladen Swallow by Rip+Dick · 2009-03-27 08:33 · Score: 3, Funny

What if Oprah's ass got 5x's smaller?

Kill the GIL! by GlobalEcho · 2009-03-27 08:36 · Score: 5, Informative

The summary misses one of the best bits -- the project will try to get rid of the Global Interpreter Lock that interferes so much with multithreading.

Also, it's based on v2.6, which they are hoping will make 3.x an easy change.

Re:Kill the GIL! by eosp · 2009-03-27 08:42 · Score: 3, Interesting

The summary misses one of the best bits -- the project will try to get rid of the Global Interpreter Lock that interferes so much with multithreading.
Good luck with that. The last time someone tried that, they slowed Python down by half.
Re:Kill the GIL! by dgatwood · 2009-03-27 08:55 · Score: 4, Insightful

The key is to find the right balance of granularity in locking. A big giant mutex is always a bad idea, but having tens of thousands of little mutexes can also be bad due to footprint bloat and the extra time needed to lock all those locks. The right balance is usually somewhere in the middle. Each lock should have a moderate level of contention---not too little contention or else you're wasting too much time in locking and unlocking the mutex relative to the time spent doing the task---not too much contention or else you're likely wasting time waiting for somebody else that is doing something that wouldn't really have interfered with what you're doing at all. Oh, and reader-writer locks for shared resources can be a real win, too, in some cases.
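As a rough illustration of that middle ground (not anything taken from the Unladen Swallow project), here is a minimal lock-striping sketch in Python. `StripedCounter`, `hammer` and the stripe count are hypothetical names and values made up for this example; the idea is simply "more than one giant mutex, far fewer than one mutex per item":

```python
import threading


class StripedCounter:
    """Counters guarded by a small, fixed pool of locks (lock striping)."""

    def __init__(self, num_stripes=16):
        # One lock per stripe: finer than a single global lock,
        # much coarser than one lock per key.
        self._locks = [threading.Lock() for _ in range(num_stripes)]
        self._tables = [{} for _ in range(num_stripes)]

    def _stripe(self, key):
        # Keys that hash to the same stripe share a lock.
        return hash(key) % len(self._locks)

    def increment(self, key, amount=1):
        i = self._stripe(key)
        with self._locks[i]:
            table = self._tables[i]
            table[key] = table.get(key, 0) + amount

    def get(self, key):
        i = self._stripe(key)
        with self._locks[i]:
            return self._tables[i].get(key, 0)


def hammer(counter, keys, rounds):
    """Increment every key `rounds` times from the calling thread."""
    for _ in range(rounds):
        for key in keys:
            counter.increment(key)


counter = StripedCounter(num_stripes=4)
keys = ["a", "b", "c", "d", "e"]
threads = [threading.Thread(target=hammer, args=(counter, keys, 1000))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print({key: counter.get(key) for key in keys})
```

Where the sweet spot for `num_stripes` lies would depend on the workload and the machine, which is the point being made above.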

--
Check out my sci-fi/humor trilogy at PatriotsBooks.
Re:Kill the GIL! by Viking+Coder · 2009-03-27 09:10 · Score: 2, Interesting

Hrmph.
Maybe I'm just drinking the Kool-Aid, but Software Transactional Memory sounds much, much better to me.
The "D" programming language from Digital Mars sounds very interesting, for example.

--
Education is the silver bullet.
Re:Kill the GIL! by Just+Some+Guy · 2009-03-27 09:28 · Score: 4, Insightful

Good luck with that. The last time someone tried that, they slowed Python down by half.
Yes, good luck with that! Because the current implementation slows it down by 7/8ths on my 8-core server.

--
Dewey, what part of this looks like authorities should be involved?
Re:Kill the GIL! by Red+Alastor · 2009-03-27 09:39 · Score: 4, Informative

Good luck with that. The last time someone tried that, they slowed Python down by half.
Only because Python uses a refcounting garbage collector. When you get many threads, you need to lock all your data structures because otherwise you might collect them when they are still reachable. This project plans to change the garbage collection strategy first. Once it's done, killing the GIL is easy.
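To make the refcounting point concrete, here is a small CPython-specific sketch using `sys.getrefcount` (the `payload` and `alias` names are just for illustration). Every new reference is a write to the object's count, and it is that kind of write that would need protecting once threads run truly in parallel:

```python
import sys

# CPython-specific: sys.getrefcount reports an object's reference count
# (the temporary reference held by the call itself is included).
payload = ["some", "shared", "data"]

before = sys.getrefcount(payload)
alias = payload                      # a second name bumps the count by one
after_alias = sys.getrefcount(payload)
del alias                            # dropping the name decrements it again
after_del = sys.getrefcount(payload)

print(before, after_alias, after_del)
```

The absolute numbers vary between interpreter versions; only the plus-one and minus-one steps matter here.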

--
Slashdot anagrams to "Sad Sloth"
Re:Kill the GIL! by Nevyn · 2009-03-27 09:44 · Score: 3, Informative

That's funny, because os.fork() etc. work fine on my version of python.

--
ustr: Managed string API with ave. 44% overhead over strdup(), for 0-20B
Re:Kill the GIL! by Nevyn · 2009-03-27 09:47 · Score: 4, Interesting

Then you probably want to read: Patrick Logan on why STM isn't "awesomez".

--
ustr: Managed string API with ave. 44% overhead over strdup(), for 0-20B
Re:Kill the GIL! by Just+Some+Guy · 2009-03-27 09:55 · Score: 1

They work great here, too, but each process model has its place. There are times when I really, really wish I could use effective threading.

--
Dewey, what part of this looks like authorities should be involved?
Re:Kill the GIL! by jd · 2009-03-27 09:56 · Score: 4, Insightful

If developers were working from a clean slate and didn't have the problems of excessive legacy code to work with, I suspect Digital Mars' D, Inmos' Occam and Ericsson's Erlang would be the three main languages in use today.
If hardware developers were working from a clean slate, you'd probably also see a lot more use of Content Addressable Memory, Processor-In-Memory and Transputer/iWarp-style "as easy as LEGO" CPUs.
Sadly, what isn't patented was invented 30 years too late and 20 years before the technology existed to make these ideas really work, so we're stuck with neolithic monoliths in both the software and hardware departments.
(Remember, Y2K was worth tens of billions, but wasn't worth enough to get people to stop using COBOL, and that was practically dead. To get people to kick their current habits would need a kick in the mind a thousand times bigger.)

--
It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
Re:Kill the GIL! by Viking+Coder · 2009-03-27 10:41 · Score: 1

Nice food for thought.
I think all I really need is multi-thread safe Persistence, in my use case, with as little memory duplication as possible, of course.
Hrm - the hamster is definitely running in the wheel right now...

--
Education is the silver bullet.
Re:Kill the GIL! by Secret+Rabbit · 2009-03-27 10:59 · Score: 2, Interesting

Or one could keep *A* GIL and largely ignore it. Here's the model I would use.
Separate python into per thread instances yet keep a larger overall memory space to be shared between threads. But, one must explicitly state that they want to go to the global space. That way, when one uses a single threaded application, everything is as it should be, nothing in its way to slow it down. So, those locks won't even get invoked. However, when one is programming a multi-threaded application, then one has the *choice* to either keep them separate or to make them aware of each other and start using the GIL.
With that, I believe that one can largely have his cake and eat it too.
Re:Kill the GIL! by CarpetShark · 2009-03-27 11:13 · Score: 1

The summary misses one of the best bits -- the project will try to get rid of the Global Interpreter Lock that interferes so much with multithreading.
Thanks for that. I was about to say, that the main issue for me is the GIL, not interpreter performance. Improvement of both is good, of course, but the GIL can be a show-stopper much more easily.
Re:Kill the GIL! by harry666t · 2009-03-27 11:55 · Score: 1

> This project plans to change the garbage collection strategy first.

Shouldn't be too hard, should it?

Not so long ago I wrote a simple mark&sweep GC over a weekend, with no previous practical experience in this area at all.
Re:Kill the GIL! by parallel_prankster · 2009-03-27 12:38 · Score: 1

Not true. What you are saying is only true from the software side of things. You are assuming that fine-grained locks always cause contention. Again, it depends on the architecture. The same lock scheme can perform poorly on one machine and amazingly well on another. Removing Python's GIL will definitely lead to better performance. This is a great candidate for transactional memory.
Re:Kill the GIL! by dgatwood · 2009-03-27 13:27 · Score: 1

I'm not assuming that fine-grained locks cause contention. I'm asserting that excessively fine-grained locks usually have very low contention and therefore often waste too many cycles handling the lock machinery relative to the amount of work you're actually getting done. Remember that at minimum, taking a mutex is going to require pulling in data from RAM unless it was recently touched. Therefore, there is a sizable hit for touching a lock that has no contention. With enough thrashing of even all non-contended locks, that penalty can be sizable. The sweet spot is in the middle somewhere, though the details do vary from architecture to architecture.
The best solution, of course, is to architect your code in a way that minimizes data dependencies and data sharing in general. This isn't always possible, but when it is, such a re-architected code base should perform much better than one built using mutex locks or even STM.

--
Check out my sci-fi/humor trilogy at PatriotsBooks.
Re:Kill the GIL! by shutdown+-p+now · 2009-03-27 13:31 · Score: 1

That's a rant on STM by an Erlang fan that has little in the way of rational arguments, but a lot of mentions of the word "wrong" (and also "Erlang"). So?
Re:Kill the GIL! by shutdown+-p+now · 2009-03-27 13:33 · Score: 1

It's not hard to write a GC (or take and use an existing one). But it might break quite a lot of programs that rely on deterministic behavior of a refcounting collector as seen today.
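A minimal sketch of the kind of code that leans on that determinism, assuming CPython's current refcounting behaviour. `Resource` and `use_resource` are hypothetical stand-ins for a file handle, socket, or similar:

```python
events = []


class Resource:
    """Hypothetical stand-in for a file handle, socket, or similar."""

    def __init__(self):
        events.append("acquired")

    def __del__(self):
        # Under CPython's refcounting this runs as soon as the last
        # reference goes away, not at some later collection pass.
        events.append("released")


def use_resource():
    r = Resource()
    events.append("working")
    # No explicit cleanup: the only reference dies when the function returns.


use_resource()
events.append("after call")
print(events)
```

On CPython, "released" lands before "after call". A tracing collector would be free to run the finalizer later (or, in some designs, never), so programs written this way could change behaviour.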
Re:Kill the GIL! by Waffle+Iron · 2009-03-27 13:42 · Score: 2, Interesting

Only because Python uses a refcounting garbage collector.
Refcounting itself isn't necessarily the problem; it's using a rudimentary implementation that bogs down. I read a paper a while back where they successfully experimented with a high-tech refcounting gc algorithm specifically because it was amenable to parallel operation on multiple CPUs.
By using a variety of tricks, they were able to avoid actually having to update refcounts for the vast majority of writes (most notably all stack references), and the mutex acquisition was limited to a couple per thread per gc cycle (one dedicated thread did periodic adjustments in a cyclical fashion).
I thought it was pretty interesting because reference counting can have more cache-friendly behavior than copying gc or mark-sweep approaches.
Re:Kill the GIL! by slamb · 2009-03-27 14:31 · Score: 1

Then you probably want to read: Patrick Logan on why STM isn't "awesomez" [blogspot.com].

I think I'm stupider for having read that. If something can be judged by the quality of the arguments against it, then STM is indeed awesome. That post contained no actual data to support its assertion, just cherrypicked quotes of other people making the same assertion without data. It didn't meet even my rather low standards for articles linked from slashdot comments.
Re:Kill the GIL! by jhol13 · 2009-03-27 17:49 · Score: 1

Does os.fork() work in Windows too?
I am no fan of Python, but deliberately making incompatible dialects ...
Besides, processes are not threads (both have pros and cons).
Re:Kill the GIL! by igrg · 2009-03-27 18:41 · Score: 1

I wonder how it relates to the Psyco module? The last time I spent time optimising some Python code, I used Psyco in a lot of places and it was a pleasant experience.

--
~igrg
Re:Kill the GIL! by MilenCent · 2009-03-27 19:22 · Score: 1

Psyco is such a nice tool; add just a couple of lines of code, and suddenly 90% of Python programs become multiple times faster on Intel processors... assuming they're single-threaded, of course.
Psyco's been lagging a bit behind lately though, and little progress has been made on shortening that list of features it doesn't support. The main reason for this has been the developer going on to work on PyPy, a subset of Python in which a Python interpreter itself can be written. The ultimate goal appears to be another massive-speedup JIT compiler, although I'm not sure how it gets there from here.
Re:Kill the GIL! by ultranova · 2009-03-27 19:44 · Score: 2, Insightful

I thought it was pretty interesting because reference counting can have more cache-friendly behavior than copying gc or mark-sweep approaches.

That's Java's biggest problem, IMHO: once the data spills into swap, it'll take forever to run garbage collection.

--
Forget magic. Any technology distinguishable from divine power is insufficiently advanced.
Re:Kill the GIL! by julesh · 2009-03-27 23:14 · Score: 1

The key is to find the right balance of granularity in locking. A big giant mutex is always a bad idea, but having tens of thousands of little mutexes can also be bad due to footprint bloat and the extra time needed to lock all those locks. The right balance is usually somewhere in the middle. Each lock should have a moderate level of contention---not too little contention or else you're wasting too much time in locking and unlocking the mutex relative to the time spent doing the task---not too much contention or else you're likely wasting time waiting for somebody else that is doing something that wouldn't really have interfered with what you're doing at all. Oh, and reader-writer locks for shared resources can be a real win, too, in some cases.
The other thing that the last attempt lacked is escape analysis: any object that is created and used entirely in a local scope that's only accessible to one thread doesn't need locking at all. This is easy to show in many cases via static analysis of the compiled bytecode, but this optimisation was ignored last time around.
Re:Kill the GIL! by julesh · 2009-03-27 23:16 · Score: 1

That's funny, because os.fork() etc. work fine on my version of python.
Yes, but then the overhead of serializing data structures to communicate between your two processes is a killer. Unless you can keep the communication to a minimum (i.e., you're working on an embarrassingly parallel problem), this is a serious problem.
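For a feel of what that serialization step involves, here is a small sketch using the standard `pickle` module, one common way Python objects get shipped between processes. The `data` dictionary is a made-up payload, and the byte count printed is only meant to show that a full copy is produced:

```python
import pickle

# Hypothetical payload standing in for whatever two processes need to exchange.
data = {"weights": [0.1 * i for i in range(1000)], "epoch": 3}

payload = pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL)   # sender side
received = pickle.loads(payload)                                 # receiver side

print(len(payload), "bytes serialized")
print("equal:", received == data, "| same object:", received is data)
```

The receiver ends up with an equal but separate object, so both the encode/decode time and the extra memory grow with the size of the structure being passed around.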
Re:Kill the GIL! by julesh · 2009-03-27 23:19 · Score: 1

Only because Python uses a refcounting garbage collector
Not only that: the standard data structures (i.e., lists and dictionaries) guarantee atomicity of operations. The last attempt locked them at every method call, which is somewhat suboptimal. Lock-free implementations would be a serious win here.
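A quick sketch of what that atomicity looks like from the user's side, assuming CPython. `shared` and `producer` are names invented for this example; note that no explicit lock appears anywhere:

```python
import threading

shared = []          # one list, appended to by every thread, no explicit lock


def producer(tag, count):
    for i in range(count):
        shared.append((tag, i))   # a single append is atomic in CPython


threads = [threading.Thread(target=producer, args=(t, 10000)) for t in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(shared))
```

Only the individual operation is protected this way. A read-modify-write sequence such as `shared[0] = shared[0] + 1` spans several operations and would still need a lock of its own.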
Re:Kill the GIL! by dkf · 2009-03-28 06:33 · Score: 1

That's funny, because os.fork() etc. work fine on my version of python.
Yes, but then the overhead of serializing data structures to communicate between your two processes is a killer. Unless you can keep the communication to a minimum (i.e., you're working on an embarrassingly parallel problem), this is a serious problem.
It's a problem anyway. If you have structures that are being accessed by two threads at once, you need a lock (or your head examining; your choice) to stop demons from flying out of your nose. Your best approach is to keep as much as possible bound to a single thread, and to only share the minimum, preferably keeping chunks of memory assigned to a single thread or, at least, one at a time. That minimizes the number of nasty global locks.
Be aware that if you want to scale up to systems that do not share memory (e.g. a clustered supercomputer) then you have to think in terms of serializing data structures anyway (or using an MPI library that hides the details); you'll spend your time trying to work out how to minimize the amount of communication you do. Ultimately, the piper must be paid.

--
"Little does he know, but there is no 'I' in 'Idiot'!"
Re:Kill the GIL! by Scarblac · 2009-03-28 07:54 · Score: 1

Yes, good luck with that! Because the current implementation slows it down by 7/8ths on my 8-core server.
Well, that's not true. The interpreter has a global lock, but usually most of the time spent will be in things like I/O calls, that are written in C and thus have no problem with the GIL. You're trying to make it seem like there is no advantage to threading in Python, but that's just wrong.

--
I believe posters are recognized by their sig. So I made one.
Re:Kill the GIL! by Just+Some+Guy · 2009-03-28 09:04 · Score: 1

The interpreter has a global lock, but usually most of the time spent will be in things like I/O calls
In your programs, perhaps; mine are more CPU bound.
It's not so much that there's no benefit as that there's (often) not enough benefit to justify going through the hassle. I spent a bit of time making a very parallelizable, single-threaded program multi-threaded, only to find it got about a 10% gain. Then I spent a lot more time reworking it into a multi-processing version that runs about 7x faster on an 8-way system, but getting the IPC down was a royal pain in the butt.
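For anyone wanting to try this on their own machine, here is a minimal sketch of that kind of experiment: the same CPU-bound job run serially and then split across threads. `count_primes`, `run_serial` and `run_threaded` are hypothetical helpers, not the program described above, and the timings will differ by interpreter and hardware:

```python
import threading
import time


def count_primes(limit):
    """Deliberately CPU-bound: naive trial division below `limit`."""
    found = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            found += 1
    return found


def run_serial(limits):
    return [count_primes(limit) for limit in limits]


def run_threaded(limits):
    results = [None] * len(limits)

    def task(i, limit):
        results[i] = count_primes(limit)   # each thread writes its own slot

    threads = [threading.Thread(target=task, args=(i, limit))
               for i, limit in enumerate(limits)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


limits = [20000] * 4

start = time.perf_counter()
serial = run_serial(limits)
serial_time = time.perf_counter() - start

start = time.perf_counter()
threaded = run_threaded(limits)
threaded_time = time.perf_counter() - start

print(f"serial: {serial_time:.3f}s, threaded: {threaded_time:.3f}s")
```

With a GIL in place, the threaded run of pure-Python arithmetic like this would not be expected to approach a 4x speedup, however many cores are available.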
If anything, I'm biased toward favoring multi-processing for reasons too numerous to list. Still multi-threading has its place and I really wish that Python didn't put such a low ceiling on its performance.

--
Dewey, what part of this looks like authorities should be involved?
Re:Kill the GIL! by julesh · 2009-03-28 11:40 · Score: 1

It's a problem anyway. If you have structures that are being accessed by two threads at once, you need a lock (or your head examining; your choice) to stop demons from flying out of your nose.
Not necessarily; there are plenty of application areas where you can easily design your data structures and access rules so that multiple threads accessing them are not a problem. Consider the application I'm currently working on, a parallel artificial neural network trainer. I have one copy of the weights, and 4 threads with a different training set each. Each runs through its training set, totalling changes to make to the weights, then passes [a pointer to] those changes off to a coordinator thread which waits until all 4 have finished before adjusting the weights and then telling them to resume with the next epoch. The weight matrix is in the range of 50-100MB, so we really don't want to have to copy it each time around. This is a much more efficient way of achieving this result than anything I can think of without shared data, and I'd love to know if anyone else can see a better solution.
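A toy sketch of that arrangement, with heavy assumptions: the "network" is just two weights, the update rule is invented, and a `threading.Barrier` action stands in for the coordinator thread described above. The names `pending`, `apply_updates` and `worker` are hypothetical. The point it shows is that the weights are read by everyone but only written while all workers are parked at the barrier:

```python
import threading

NUM_WORKERS = 4
EPOCHS = 3

weights = [0.0, 0.0]              # the single shared copy, read by all workers
pending = [None] * NUM_WORKERS    # each worker owns exactly one slot


def apply_updates():
    """Runs in one thread once all workers have reached the barrier."""
    for delta in pending:
        for i, d in enumerate(delta):
            weights[i] += d


barrier = threading.Barrier(NUM_WORKERS, action=apply_updates)


def worker(slot, training_set):
    for _ in range(EPOCHS):
        delta = [0.0] * len(weights)
        for sample in training_set:
            for i, x in enumerate(sample):
                # Toy rule: nudge each weight 10% of the way toward the sample,
                # averaged over this worker's samples and over all workers.
                delta[i] += 0.1 * (x - weights[i]) / (len(training_set) * NUM_WORKERS)
        pending[slot] = delta     # hand over a reference, not a copy
        barrier.wait()            # weights change only inside the barrier action


training_sets = [[(1.0, 2.0)] * 5 for _ in range(NUM_WORKERS)]
threads = [threading.Thread(target=worker, args=(k, training_sets[k]))
           for k in range(NUM_WORKERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(weights)
```

The barrier is still a synchronization point, but the large weight table itself is never copied or locked per access.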
Re:Kill the GIL! by renoX · 2009-03-28 12:20 · Score: 1

Yes, that's a big issue with most GC: they interact badly with swap, note that it's possible to make the GC and the kernel's Virtual Memory Manager cooperate to avoid this, see:
http://lambda-the-ultimate.org/node/2391
Now, as it only works if the kernel's VMM is patched to support this kind of GC, I wonder if swapping won't be made obsolete by memory's low price before this kind of GC comes into use.
Re:Kill the GIL! by KagatoLNX · 2009-03-28 19:04 · Score: 1

Look at the multiprocessing module.
Not only does it implement processes in a way that is quite similar to the existing threading implementation, but it provides a ton of synchronization / locking primitives that work seamlessly across processes. This includes the ability to utilize shared memory. It's crazy that this module doesn't get more press, because there's nothing quite as easy in most other languages.

--
I think Mauve has the most RAM. --PHB (Dilbert Comic)
Re:Kill the GIL! by KagatoLNX · 2009-03-28 19:05 · Score: 1

As of Python 2.6, look at the multiprocessing module.

--
I think Mauve has the most RAM. --PHB (Dilbert Comic)
Re:Kill the GIL! by KagatoLNX · 2009-03-28 19:06 · Score: 1

As of Python 2.6, look at the multiprocessing module.
The shared memory feature is nice.

--
I think Mauve has the most RAM. --PHB (Dilbert Comic)
Re:Kill the GIL! by dkf · 2009-03-28 19:48 · Score: 2, Insightful

Not necessarily; there are plenty of application areas where you can easily design your data structures and access rules so that multiple threads accessing them are not a problem. Consider the application I'm currently working on, a parallel artificial neural network trainer. I have one copy of the weights, and 4 threads with a different training set each. Each runs through its training set, totalling changes to make to the weights, then passes [a pointer to] those changes off to a coordinator thread which waits until all 4 have finished before adjusting the weights and then telling them to resume with the next epoch. The weight matrix is in the range of 50-100MB, so we really don't want to have to copy it each time around. This is a much more efficient way of achieving this result than anything I can think of without shared data, and I'd love to know if anyone else can see a better solution.
Sounds like a reasonable medium-scale approach to me - I've done similar things. But you are aware that you're, in effect, using a locking solution? And that shared memory scheme won't scale up to a cluster? (To scale it up, consider whether you can only transmit the diffs to the weight table or change the axis on which you're splitting things up so that you get better data locality. Another possibility might be to compute the weights twice or more in different threads, which trades more computation for less lock contention. Don't know which is right for your case though, since scaling up isn't easy; requires real thought sometimes.)
I suppose it might help you to have a bit more background. In many types of traditional supercomputer, a lot of effort was put into supporting a shared memory model over very large numbers of processors (e.g., a thousand or so). That's really what made them so stupendously expensive, especially through the '90s. (The CPUs themselves weren't that much better than normal desktop ones by comparison; better floating point units typically, but not by that much.) Of course, it wasn't sustainable; the memory hardware was just too much of a bottleneck (in effect there was a lock for every memory access!) so that had to go and the cluster is now king. But to take proper advantage of that, you have to start minimizing the amount of locking and communication of big memory structures; get that right (with clever algorithms, etc.) and you can go up to internet-scale apps, some of which are so big that we don't usually think of them that way.

--
"Little does he know, but there is no 'I' in 'Idiot'!"
Re:Kill the GIL! by Just+Some+Guy · 2009-03-29 03:20 · Score: 1

Yes, Jayson.

--
Dewey, what part of this looks like authorities should be involved?

How fast is five times faster really? by LingNoi · 2009-03-27 08:36 · Score: 5, Funny

They say five times faster; however, it really depends on whether they're talking about a European or African Python Interpreter.

Re:How fast is five times faster really? by ArsonSmith · 2009-03-27 08:38 · Score: 4, Funny

Java spokesperson: "5x faster? We already do that."
Java spokesperson to other Java people: "(whisper)Hehe, I told them we already do that. Hehe."

--
Paying taxes to buy civilization is like paying a hooker to buy love.
Re:How fast is five times faster really? by rackserverdeals · 2009-03-27 08:56 · Score: 3, Informative

I know you're trying to be funny but... If you're talking plain Java vs Python, Java looks to be quite a bit faster. You don't have to look hard to find benchmarks that show java is faster.
Jython seems to be about 2-3 times faster than CPython according to those tests.
This could give CPython the performance edge over Jython, but it still has a way to go to catch up to Java.

--
Dual Opteron < $600
Re:How fast is five times faster really? by FishWithAHammer · 2009-03-27 09:00 · Score: 1

IronPython too. Not quite as fast as Jython last I eval'd it, but at the time it had plenty of room to improve.
The only place I currently use Python is embedding the IronPython system in a Mono app, though, so I'll take what I can get.

--
"You can either have software quality or you can have pointer arithmetic, but you cannot have both at the same time."
Re:How fast is five times faster really? by meringuoid · 2009-03-27 09:27 · Score: 5, Funny

Joking aside, though, I find this target to be overambitious. Speeding up by a factor of three would be plausible; two would be OK, but I'd hope they'd keep working on it to get it up to three. Four strikes me as unlikely, and five is right out.

--
Real Daleks don't climb stairs - they level the building.
Re:How fast is five times faster really? by 0xABADC0DA · 2009-03-27 09:38 · Score: 1

This could give CPython the performance edge over Jython, but it still has a way to go to catch up to Java.
Except that jdk 1.7 is getting all sorts of improvements that will help with Jython speed... like a dynamic method call opcode and stack-allocated objects.
So it's doubtful that llvm python will be faster than Jython, or at least not for long.
Re:How fast is five times faster really? by ArsonSmith · 2009-03-27 09:48 · Score: 1

I mean if I went around claiming to be faster just because I was hard coded in C they'd put me away. We have to take it in terms that a JIT engine can optimize code in real time much better than a precompiled binary. Now you see the slowdown inherent in the system.

--
Paying taxes to buy civilization is like paying a hooker to buy love.
Re:How fast is five times faster really? by kpainter · 2009-03-27 09:59 · Score: 2, Interesting

If you're talking plain Java vs Python [debian.org], Java looks to be quite a bit faster
The first link above refers to Java used with "Hotspot" and it is really fast. If you select the Java Xint, they are a lot closer although Java is still faster. But that "Hotspot" option looks to me to provide about a 10x speed improvement over plain interpreted Java. http://shootout.alioth.debian.org/u32q/benchmark.php?test=all&lang=javaxint&lang2=java&box=1 If Python were to do something similar, I would expect a significant improvement in its performance too.
Re:How fast is five times faster really? by fredrik70 · 2009-03-27 10:04 · Score: 1

oh, I thought the jvm already put local new'ed objects on the stack rather than the heap if it thought it was possible (i.e. for local objects in a method, etc)

--
if (!signature) { throw std::runtime_error("No sig!"); }
Re:How fast is five times faster really? by rackserverdeals · 2009-03-27 10:46 · Score: 1

The Xint option is used in very rare cases if you encounter a bug with the compiler. I have never run into one case where I needed it.
I think some people are working on JIT compilers for Python and other interpreted languages but I'm not sure of the status.

--
Dual Opteron < $600
Re:How fast is five times faster really? by wisty · 2009-03-27 11:35 · Score: 1

How is Java faster? If it's a trivial program, then it just doesn't matter. Actually, if it's a trivial program, for your own use, a Pythoneer will write the script and run the interpreter (no compile!) before you can fire up Eclipse and type "private static void".
If we are talking about a non trivial program, then algorithms, data structures, caching, micro-optimization (like re-writing bits in C) and profiling can improve things by many many orders of magnitude. Too bad if the code has so many layers and adapters that any real change will be prohibitively expensive.
Re:How fast is five times faster really? by Abreu · 2009-03-27 11:36 · Score: 1

Three times faster? That's easy: just change the colour of the text to red and add an antenna
An'More Daka!!!

--
No sig for the moment.
Re:How fast is five times faster really? by rackserverdeals · 2009-03-27 11:56 · Score: 3, Informative

How is Java faster? If it's a trivial program, then it just doesn't matter. Actually, if it's a trivial program, for your own use, a Pythoneer will write the script and run the interpreter (no compile!) before you can fire up Eclipse and type "private static void".
You know you can write trivial java programs without using an IDE such as Eclipse. I started out in the late 90's writing Servlets in vi and notepad. The time it takes to compile is meaningless. You only need to do it once. You don't have to recompile every time you run the application.

If we are talking about a non trivial program, then algorithms, data structures, caching, micro-optimization (like re-writing bits in C) and profiling can improve things by many many orders of magnitude. Too bad if the code has so many layers and adapters that any real change will be prohibitively expensive.
Or they could use any of the many java libraries available so they don't have to write those parts of the code. Since they've been around for years, they've already been optimized.
The productivity gains of writing fewer lines of code seem stupid to me. Programmers aren't secretaries. I can type maybe 90 wpm, but a few lines of code might take an hour to get right. It doesn't matter what the language is.

--
Dual Opteron < $600
Re:How fast is five times faster really? by bnenning · 2009-03-27 12:08 · Score: 1

The productivity gains of writing fewer lines of code seems stupid to me.
Correct. The win is *maintaining* fewer lines of code.

--
How to solve most of our problems: 1.Lots of nuclear plants. 2.Cure aging.
Re:How fast is five times faster really? by jonaskoelker · 2009-03-27 12:36 · Score: 1
I think the speed improvements will be reached in stages, roughly equal to:
- x1
- x2
- x5
Re:How fast is five times faster really? by rackserverdeals · 2009-03-27 13:13 · Score: 1

Correct. The win is *maintaining* fewer lines of code.
Still, I consider that a bogus argument. If your organization has 50 Java developers, the effort needed to train them to be Python developers is not trivial. Then you can't just rewrite everything because you still have all that Java code to maintain.
It's not like Python is significantly fewer lines of code than Java or anything. Especially now with annotations. Maybe 2x as many LOC for a significant increase in performance and using your existing developer pool.
The example in the link is simple but there are others you can find online with similar LOC counts.
Plus, I don't think fewer LOC means greater maintainability.
Let me give an example using a pizza recipe instead of a programming language.
Fewer instructions would be something like:

Prepare the bread.
Put the sauce on the bread.
Put the cheese on the sauce on the bread.
Bake.
More instructions would be:

Prepare the dough:
  Mix flour, yeast, water.
  Knead the bread.
  Let it rise.
  Punch it down.
  Roll out a round shape.
Prepare the sauce:
  In a large pot, heat tomato sauce.
  Add oregano, garlic, salt, pepper.
  Stir frequently.
Ladle an even coating of sauce over the dough.
Evenly spread shredded mozzarella over the sauce.
Bake in a 400deg oven for 15 minutes.
Clearly the first example is the easiest to write, but not the easiest to follow.
In the end, you're going to abstract a lot of that so you don't see it all upfront. You have to trace it back, and you might use a library or a code generator to take care of the boilerplate.
For example, one of my projects took maybe 2-3 weeks to write, plus some additions here and there. That includes all the database planning as well. It's maybe 80k lines of code. 60% of that was auto-generated by an open source DAO tool that handles all the persistence. Throw in the import statements, field accessors and other code that the IDE adds in for me and I probably only wrote about 20k lines of code for the project.
If you use JPA or EJB3 for the persistence, you might have even fewer lines of code. Maybe I shouldn't still be using DAO but I like it and didn't need much more for this project.
The main problem I see, though: in 5 years, a lot of those Python developers are probably going to be working in a different language altogether.

--
Dual Opteron < $600
Re:How fast is five times faster really? by bnenning · 2009-03-27 14:18 · Score: 3, Insightful

If your organization has 50 Java developers, the effort needed to train them to be Python developers is not trivial. Then you can't just rewrite everything because you still have all that Java code to maintain.
Yes, you shouldn't rewrite working Java code in Python just for kicks, or vice versa. I'm not sure how that's relevant.
It's not like Python is significantly fewer lines of code than Java or anything. Especially now with annotations. Maybe 2x as many LOC
I'll agree that 2x is in the ballpark, and I find that to be quite significant, considering that studies have found that developers tend to produce lines of (debugged, working) code at the same rate regardless of language. Doubling developer productivity will very often be worth sacrificing performance, especially when the software isn't CPU-bound. Why do you think Java took over from C?
Plus, I don't think fewer LOC means greater maintainability.
All I can say is that I've been developing in Java for 12 years and Python for 2, and that's been my experience.
Let me give an example using a pizza recipe instead of a programming language.
I don't agree with that, because the short version leaves out critical information so of course it's not as useful. What I like about Python is that it largely lets me deal with *only* the stuff that matters to my application. In my questionable metaphor Python would be "Bake at 400 degrees for 15 minutes", and Java would be "Turn the temperature dial to 400, open the oven door, insert the pan in the oven, close the oven door, wait 15 minutes, open the oven door...". Ok not quite that bad, but the essential details are often obscured by unimportant boilerplate. And yes, you can get tools that automatically create and hide some of it, but that should just make you question why the language can't do that itself.
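To make the boilerplate point concrete, here is a minimal sketch in Python. The `Pizza` class and its fields are made up for illustration, and `dataclasses` is a later standard-library addition (Python 3.7+), so treat this as an assumption about a modern interpreter rather than what was available when this was posted:

```python
from dataclasses import dataclass


@dataclass
class Pizza:
    # Hypothetical record type. The decorator generates __init__, __repr__
    # and __eq__, so no hand-written accessors are needed.
    topping: str
    bake_temp_f: int = 400
    bake_minutes: int = 15

    @property
    def bake_seconds(self) -> int:
        # A computed attribute can be added later without changing callers.
        return self.bake_minutes * 60


pizza = Pizza("mozzarella")
print(pizza)
print(pizza.bake_seconds)
```

Whether this counts as "the language doing it itself" or just another code generator is a fair question; the sketch only shows that the essential details (three fields, one derived value) fit on the screen.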
The main problem I see though. In 5 years, a lot of those Python developers are probably going to be working in a different language altogether.
A fine argument for COBOL :)

--
How to solve most of our problems: 1.Lots of nuclear plants. 2.Cure aging.
Re:How fast is five times faster really? by AigariusDebian · 2009-03-27 14:29 · Score: 4, Funny

The difference is more like between:
Prepare the bread. Put the sauce on the bread. Put the cheese on the sauce on the bread. Bake.
And:
define PizzaDoughFactory : AbstractDoughFactory {
    sub PizzaDoughFactory( PizzaDoughFactory cls, Integer thickness ) {
        cls.AbstractDoughFactory( thickness )
    }
    sub Sauce( PizzaDoughFactory cls, Topping top ) {
        cls.toppings = org.coolpace.JavaSmart.List( -1 )
        cls.toppings.appendToTop( top )
    }
}
define PizzaCreator : AbstractApplication {
    def main( Integer argc, String *argv ) {
        new pizza = PizzaFactory()
        pizza.set_dough = PizzaDoughFactory()
        sauce = SauceFactory()
        cheese = CheeseFactory()
        pizza.dough.Sauce( sauce )
        pizza.dough.Sauce( cheese ) // historically all toppings are called sauces as well
        new ready_pizza = PizzaBakery( pizza )
    }
}
Re:How fast is five times faster really? by dirtyhippie · 2009-03-27 15:46 · Score: 1

Please mod racist troll parent down. Thx.
Re:How fast is five times faster really? by chthonicdaemon · 2009-03-27 19:32 · Score: 1

From what I've read the figures seem to suggest two things: 1. Programmers produce similar lines of code per hour in most languages. 2. Language expressiveness differs, so some languages get more things done per line of code. These two things translate to my idea of using the most expressive language I can get my hands on. Existence of libraries and such also factor into the equation, which is probably why APL doesn't see such a lot of use.

Now, it is also true that some compilers/interpreters are more efficient than others (note that languages aren't efficient per se). But rewriting code is a lot faster than writing it for the first time regardless of language. The efficient compilers tend to be paired with the less expressive languages (mostly because it's harder to write compilers for the more expressive languages), but you can make up for it by profiling your short program and rewriting the parts that are slow in a language for which an efficient compiler exists.

As for not seeing productivity gains from writing fewer lines of code, remember that was what the assembly guys said when Fortran came out.

--
Languages aren't inherently fast -- implementations are efficient
Re:How fast is five times faster really? by ultranova · 2009-03-27 19:55 · Score: 1

Please mod racist troll parent down.

Just because trolls are horrible monsters good for nothing but experience points doesn't mean that you should discriminate against them, especially in a post decrying racism. Knock off your humanocentristic elitism, you fascist hippie!

--
Forget magic. Any technology distinguishable from divine power is insufficiently advanced.
Re:How fast is five times faster really? by noppy · 2009-03-27 20:28 · Score: 1

Cool. What used to be sssssssssh is high-pitched screeeeeeeeeech.
Re:How fast is five times faster really? by setagllib · 2009-03-27 20:53 · Score: 1

Just a data point, several releases of the JVM as supplied with JRE6 would break Eclipse on amd64. It's fixed now, but during that time, explicitly blacklisting certain classes/methods from JIT compilation, or using -Xint, were recommended just so you could work at all.

--
Sam ty sig.
Re:How fast is five times faster really? by RAMMS+EIN · 2009-03-28 02:18 · Score: 1

``You know you can write trivial java programs without using an IDE such as Eclipse.''
Of course you can. But Java is one of those languages that really benefit from an IDE. The reason is that Java programs invariably contain a lot of boilerplate code. Since this code is repetitive and doesn't require a lot of thinking to write, writing it can easily be automated, which is what IDEs do. Other languages don't require as much boilerplate to begin with, and thus benefit less from an IDE.
``The time it takes to compile is meaningless. You only need to do it once.''
I disagree. You need to compile only once only if you get the program right the first time. This means it has to be bug free, feature complete, and never receive any updates. This may indeed work for trivial programs, but, in practical software development, it doesn't happen. You write your programs in a number of iterations, each adding some features or fixing some bugs, and then you test to see if it works as you intended. Compile time can be a significant part of this process.
``You don't have to recompile every time you run the application.''
Funny you should say that, because Java runtimes typically do exactly that. That is to say, compiling Java bytecode to native machine code. All in all, Java programs typically take quite some time to start, and take a couple of minutes to really get up to speed. This causes small, short, and linear (not a lot of loops or repetition) programs to underperform (e.g. you wouldn't want this behavior for Unix commands).

--
Please correct me if I got my facts wrong.
Re:How fast is five times faster really? by Dan+Ost · 2009-03-28 02:31 · Score: 1

I wish I had mod points today. Excellent post.

--

*sigh* back to work...
Re:How fast is five times faster really? by rackserverdeals · 2009-03-28 06:07 · Score: 1

Other languages don't require as much boilerplate to begin with, and thus benefit less from an IDE.
Completing boilerplate code for me isn't the reason I value a good IDE. Years ago I had written macros in a simple text editor that would generate accessor methods, and even if I just typed them by hand it doesn't take much. This is probably 80% of the boilerplate code. Some people just make their fields publicly available but I like sticking with the bean design pattern.
What I use most in an IDE is the integrated debugger and version control. I find it much faster to have all that under one roof. If you're not using an interpreted language the build system is very nice. What I like about Netbeans is that it uses ant and if I need to make a small change I can fire up a text file then run ant from the command line.
Netbeans profiler is very good too but I haven't had much need for it yet.
If I'm doing web development, Tomcat or GlassFish runs within and is controlled by the IDE. Saves a step and the hassle of configuring an external server. The server output is right in the IDE so I can trace log messages or errors.

You write your programs in a number of iterations, each adding some features or fixing some bugs, and then you test to see if it works as you intended. Compile time can be a significant part of this process
I can't think of one build system that doesn't do incremental builds. With Netbeans and Eclipse, they compile on save. In Netbeans, you can enable deploy on save. If I make a change to a class file, the changes are deployed to the running app instance before I can even alt-tab to the browser.
Cleaning and building a project is also pretty fast. For me, the slowest part of the whole process is FTPing the new archive.

This causes small, short, and linear (not a lot of loops or repetition) programs to underperform (e.g. you wouldn't want this behavior for Unix commands).
The JIT compiler is pretty efficient and gets up to speed pretty fast.
It does take a couple of seconds for the VM to load but it's gotten much better. It's not as fast for making command line programs where you want to see instant output, but for long running applications such as servers, it's great.
And let's be honest. Few people are making a living these days writing command line programs.

--
Dual Opteron < $600
Re:How fast is five times faster really? by rackserverdeals · 2009-03-28 07:01 · Score: 1

You're obviously not familiar with annotations and dependency injection in Java.
Here's a better side by side comparison of Java and Python.
Here's a Python developers view of some new Java features

--
Dual Opteron < $600
Re:How fast is five times faster really? by rackserverdeals · 2009-03-28 07:06 · Score: 1

That's interesting. I wouldn't know though. Since version 5.0 I switched to Netbeans exclusively.

--
Dual Opteron < $600
Re:How fast is five times faster really? by joss · 2009-03-28 08:33 · Score: 1

> I'll agree that 2x is in the ballpark, and I find that to be quite significant, considering that studies have found that developers tend to produce lines of (debugged, working) code at the same rate regardless of language.
It's not just that, a 2000 line program takes roughly 4 times as long to get right as a 1000 line program etc, so all else being equal, reducing line count by 50% can quadruple productivity.
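A quick sketch of the arithmetic behind that claim, in Python. The quadratic exponent is an assumption read off the "4 times as long for twice the lines" figure, not a measured result, and `relative_effort` is a made-up helper name:

```python
def relative_effort(lines: int, baseline_lines: int, exponent: float = 2.0) -> float:
    """Effort relative to a baseline program, assuming effort ~ lines ** exponent."""
    return (lines / baseline_lines) ** exponent


# Twice the lines costs four times the effort under the quadratic assumption.
print(relative_effort(2000, 1000))

# Halving the line count cuts effort to a quarter, i.e. 4x productivity.
print(relative_effort(1000, 2000))
```

With an exponent of 1.0 (effort strictly proportional to line count) the same halving would only double productivity, so the conclusion depends entirely on how steeply effort grows with size.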

--
http://rareformnewmedia.com/
Re:How fast is five times faster really? by ArsonSmith · 2009-03-28 09:51 · Score: 1

3x sir.

--
Paying taxes to buy civilization is like paying a hooker to buy love.

This is a very interesting project by Max+Romantschuk · 2009-03-27 08:36 · Score: 5, Interesting

I read about what they intend to do, and they seem to have quite a few interesting ideas... But there are also major drawbacks:

- No Windows support (apparently a Linux-only VM in the plans)
- No Python 3.0 support

And thus no guarantees most of the work will merge back into CPython.

But competition is good, I can't really see a problem with having an alternative faster Python runtime, even if it's not as compatible as CPython. :)

--
.: Max Romantschuk :: http://max.romantschuk.fi/

Re:This is a very interesting project by ianare · 2009-03-27 08:43 · Score: 2, Informative

- No Python 3.0 support
They are using v 2.6, which has been designated as the official migration step towards 3.0, so it should be easy to port over to 3.0. Anyway, right now very few projects are using 3.0.
Re:This is a very interesting project by FishWithAHammer · 2009-03-27 08:43 · Score: 2, Interesting

I'm not quite sure what benefits this gives that Psyco doesn't already.

--
"You can either have software quality or you can have pointer arithmetic, but you cannot have both at the same time."
Re:This is a very interesting project by orclevegam · 2009-03-27 08:50 · Score: 2, Informative

- No Windows support (apparently a Linux-only VM in the plans)
The article says it's going to be based on LLVM which most definitely is cross-platform (and being touted as the logical successor to GCC). Unless they go out of their way to use some Linux only calls while implementing their Python VM on top of LLVM it should be trivially easy to get it running in Windows.

--
Curiosity was framed, Ignorance killed the cat.
Re:This is a very interesting project by MightyYar · 2009-03-27 08:50 · Score: 4, Informative

Psyco is x86 only and uses a lot of memory. It also requires additional coding... you have to actively use it, so you don't automatically get the speedup that a faster interpreter gets you. You also have to pick-and-choose what you want to get compiled with Psyco - the extra overhead isn't always worth it.
To be fair, I don't know what the memory requirements of this new project are.

--
W..w..W - Willy Waterloo washes Warren Wiggins who is washing Waldo Woo.
Re:This is a very interesting project by FishWithAHammer · 2009-03-27 08:58 · Score: 1

Psyco may be x86-only, but this is Linux-only. That kills a lot of the appeal this might have in much the same way.

--
"You can either have software quality or you can have pointer arithmetic, but you cannot have both at the same time."
Re:This is a very interesting project by bnenning · 2009-03-27 09:00 · Score: 1

Psyco only works for 32-bit x86, and many Python features are unsupported.

--
How to solve most of our problems: 1.Lots of nuclear plants. 2.Cure aging.
Re:This is a very interesting project by maxume · 2009-03-27 09:00 · Score: 4, Informative

It might be easy to port over to 3.0, but not because it is using 2.6. Basically, they are planning on ripping out a big chunk of the internals of 2.6 and replacing it with an LLVM-based system. To the extent that those internals changed for 3.0 (there wasn't necessarily effort put into making them compatible across 2.6 and 3.0...), the code would need to be updated for 3.0. The Python-level portability between 2.6 and 3.0 isn't a huge factor for something like this.
They are targeting 2.6 because that is what made sense for Google (who is paying for the work). Or so they say:
http://code.google.com/p/unladen-swallow/wiki/FAQ

--
Nerd rage is the funniest rage.
Re:This is a very interesting project by Tumbleweed · 2009-03-27 09:11 · Score: 4, Funny

I'm not quite sure what benefits this gives that Psyco doesn't already.
It doesn't get as stabby.
Re:This is a very interesting project by negative3 · 2009-03-27 09:26 · Score: 1

From what I've seen, Python 3.0 is not supported by a good number of Python packages whereas Python 2.6 is, which would make the "no Python 3.0 support" a minor issue for me. Python 3.0 is also not shipping as the default interpreter for Fedora, Ubuntu, or openSuSE yet so it won't really affect basic users for a while. I have also seen benchmarks (but I don't have references, so I welcome contradictions and corrections) that show that 3.0 is considerably slower than 2.6 so if the speed of Python is an issue to people they shouldn't be using 3.0 (I take issue with people who grumble about Python's execution speed anyway - if speed is that important stick to C/C++). If you have a good amount of existing Python apps that work under 2.5 getting them to work in 2.6 isn't hard. Moving to 3.0 is a much bigger step, especially if you relied on built-in modules that are either different now or removed.
I see the "no windows support" as a much bigger negative - if one of the biggest strengths of Python is cross-platform support and you need your programs to work on both Windows and Linux (as I do) that's going to be a problem and I'm only half interested (because half of my apps never leave Linux).

--
"Physics is to math what sex is to masturbation." - Richard Feynman
Re:This is a very interesting project by Anonymous Coward · 2009-03-27 09:38 · Score: 1, Funny

- No Windows support (apparently a Linux-only VM in the plans)
They're trying to atone for their Chromium sins. You Windows lusers* will get a pre-alpha version ... eventually. The import statement won't work and every function call will print 'Stop! This VM isn't ready yet!' But you'll get something.
* And I say this without any animosity.
Re:This is a very interesting project by schmiddy · 2009-03-27 09:59 · Score: 1

Psyco is x86 only and uses a lot of memory

Even worse, Psyco is 32-bit only: Psyco does not support the 64-bit x86 architecture, unless you have a Python compiled in 32-bit compatibility mode. There are no plans to port Psyco to 64-bit architectures.
However, as far as "requires additional coding" goes, I think you're a little off-base... unless you consider "import psyco" to be a lot of work.

--
http://cltracker.net -- powerful craigslist multi-city search
Re:This is a very interesting project by samkass · 2009-03-27 11:05 · Score: 1

Now that JDK7 is adding invokedynamic, it would be interesting to see this target the JVM instead of LLVM. The JVM is ported everywhere and is extremely fast. I smell some upcoming bake-offs...

--
E pluribus unum
Re:This is a very interesting project by colinrichardday · 2009-03-27 11:56 · Score: 1

You can still have Janet Leigh over for a shower.
Re:This is a very interesting project by ishobo · 2009-03-27 13:06 · Score: 1

Except the JDK uses GPL while LLVM uses a modified BSD license (hence why a few projects are hoping to replace gcc with clang). Lack of reciprocity is the key if this is intended to be imported into CPython.

--
Slashdot - The great and glorious cluster fuck of Internet wisdom.
Re:This is a very interesting project by MightyYar · 2009-03-27 13:52 · Score: 1

However, as far as "requires additional coding" goes, I think you're a little off-base... unless you consider "import psyco" to be a lot of work.

Don't get me wrong, it isn't hard to use - but it is greater effort than the zero it takes to run in a faster "interpreter". Also, if all you do is turn it on, you might hurt yourself in some areas with infrequently run code where the compile time exceeds the interpreted execution time. Psyco also brings up issues with portability.

--
W..w..W - Willy Waterloo washes Warren Wiggins who is washing Waldo Woo.
Re:This is a very interesting project by MightyYar · 2009-03-27 13:57 · Score: 4, Informative

I think it's Linux-only right now because the developers currently use Linux. But they consider loss of Windows support a "risk", not a design goal:

Windows support: CPython currently has good Windows support, and we'll have to maintain that in order for our patches to be merged into mainline. Since none of the Unladen Swallow engineers have any/much Windows experience or even Windows machines, keeping Windows support at an acceptable level may slow down our forward progress or force us to disable some performance-beneficial code on Windows. Community contributions may be able to help with this.

--
W..w..W - Willy Waterloo washes Warren Wiggins who is washing Waldo Woo.
Re:This is a very interesting project by idlemachine · 2009-03-27 14:24 · Score: 1

You also have to pick-and-choose what you want to get compiled with Psyco - the extra overhead isn't always worth it.
This is something that concerns me about the Unladen Swallow approach. The project site talks openly about stealing extensively from Psyco, so wouldn't the same concerns of having to selectively choose when & where to use it also apply to Swallow? Unless they plan on exposing some kind of control over the process...
Re:This is a very interesting project by gzipped_tar · 2009-03-27 16:09 · Score: 1

It's not as easy as just "import psyco". You have to choose which parts of your program run under it. For example, compiling regular expressions under Psyco is actually slower.
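A minimal sketch of what "choosing which part" looks like, assuming Psyco 1.x's `psyco.bind()` and `psyco.full()` entry points. The `hot_loop` function is a made-up stand-in for whatever is CPU-bound in your program, and the import is guarded so the script still runs where Psyco isn't available (anything other than CPython 2 on 32-bit x86):

```python
try:
    import psyco  # Psyco 1.x: CPython 2 on 32-bit x86 only
except ImportError:
    psyco = None  # fall back to plain interpretation


def hot_loop(n):
    # Hypothetical CPU-bound function, the kind of code worth compiling.
    total = 0
    for i in range(n):
        total += i * i
    return total


if psyco is not None:
    # Compile only the chosen function instead of calling psyco.full(),
    # which would try to compile everything.
    psyco.bind(hot_loop)

print(hot_loop(1000))
```

The result is the same either way; only the speed and memory use differ, which is why deciding what to bind takes profiling rather than guesswork.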

--
Colorless green Cthulhu waits dreaming furiously.
Re:This is a very interesting project by FishWithAHammer · 2009-03-27 17:41 · Score: 1

Problem is, then you can't make any real assumptions about specs, etc. for your code. If it's the same gear across the different platforms, you can stipulate the new shiny Python environment as a requirement. As it is, now you're balancing different specs on different platforms, which can really suck.

--
"You can either have software quality or you can have pointer arithmetic, but you cannot have both at the same time."
Re:This is a very interesting project by the_womble · 2009-03-27 18:53 · Score: 1

It is not Linux only. The Getting started page currently has build instructions for MacOS, so I imagine it will, at the very least run on anything Unix like.
Also, their motive is to speed up Google's Python apps, which run on Linux. CPython is fast enough on desktops for most users. I would guess that the majority of Python on servers is on Unix-like OSes (is there any evidence to the contrary?).

Re:Unladen Swallow by davester666 · 2009-03-27 08:38 · Score: 5, Funny

It would still be huge! :-)

--
Sleep your way to a whiter smile...date a dentist!

Is that an African or European swallow? by mamono · 2009-03-27 08:39 · Score: 1

While you're at it, what is the capital of Assyria?

No windows by nurb432 · 2009-03-27 08:49 · Score: 1

Or BSD, or several other important platforms.

--
---- Booth was a patriot ----

Re:No windows by Anonymous Coward · 2009-03-27 09:03 · Score: 4, Informative

Quite to the contrary, the FreeBSD guys have been building with clang+llvm for a while now, and they seem to like it. The kernel boots, init inits, filesystems mount, the shell runs.
What other platforms, Darwin? Apple employs the largest number of LLVM developers. Windows? Both MinGW and Visual Studio based builds are tested for each release.
It's still not as portable as the python interpreter, but that will come if and when developers who are interested in working on it start to contribute.

It's probably pining for the fiords. by smcdow · 2009-03-27 08:49 · Score: 1

FTFA:

Adopting LLVM could also potentially open the door for more seamlessly integrating other languages with Python code, because the underlying LLVM intermediate representation is largely language-neutral.

So much for Parrot.

--
In the course of every project, it will become necessary to shoot the scientists and begin production.

Re:It's probably pining for the fiords. by Abcd1234 · 2009-03-27 09:03 · Score: 4, Informative

Not really. Parrot is a much higher-level VM, providing things like closures, multiple dispatch, garbage collection, infrastructure to support multiple object models, and so forth, whereas LLVM really models a basic RISC instruction set with an infinite number of write-once registers.
In fact, it would make a fair bit of sense to actually use LLVM as the JIT-compiling backend for Parrot...
Re:It's probably pining for the fiords. by koiransuklaa · 2009-03-27 09:09 · Score: 1, Funny

So much for Parrot.
No no, he's not dead, he's... he's resting! Remarkable bird, the Norwegian Blue, ay?
Re:It's probably pining for the fiords. by chromatic · 2009-03-27 09:57 · Score: 1

In fact, it would make a fair bit of sense to actually use LLVM as the JIT-compiling backend for Parrot...

You'd almost wonder if Parrot developers were working on something like that....

--
how to invest, a novice's guide
Re:It's probably pining for the fiords. by jd · 2009-03-27 09:59 · Score: 2, Funny

The Parrot Sketch backfired not that long ago when fossils of a parrot (that probably was blue) were found in Norway. Not too far from the Fjords, as I recall. It is, however, quite dead.

--
It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
Re:It's probably pining for the fiords. by ChrisDolan · 2009-03-27 15:07 · Score: 1

Google points out that several people have explored the opposite idea: LLVM emitting Parrot bytecode. So, you could compile C down to Parrot for the ultimate in interoperability and portability. :-)

slowed it down by half? by Anonymous Coward · 2009-03-27 08:50 · Score: 5, Funny

0.5x slower is like 2x faster, right? Reciprocals?

Re:slowed it down by half? by Your.Master · 2009-03-27 09:15 · Score: 1

I interpreted it as "now the Python interpreter takes 150% as much time as it used to". The half being added, rather than multiplied.
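The two readings give different numbers, which a few lines of Python make explicit. The 10-second baseline is an arbitrary made-up figure:

```python
baseline_seconds = 10.0  # hypothetical runtime before the change

# Reading 1: throughput drops to half, so the same work takes twice as long.
half_speed_seconds = baseline_seconds / 0.5

# Reading 2: runtime grows by half, i.e. 150% of the original time.
plus_half_seconds = baseline_seconds * 1.5

print(half_speed_seconds)
print(plus_half_seconds)

# Relative speed under reading 2: two thirds of the original, not one half.
print(baseline_seconds / plus_half_seconds)
```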
Re:slowed it down by half? by jd · 2009-03-27 09:44 · Score: 1

Or it could mean they used half-and-half in the developer's tea, causing them to slow down.

--
It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)

Re:What about Parrot? by FishWithAHammer · 2009-03-27 09:01 · Score: 1

Parrot's a lot harder to use to interact with other languages. LLVM at least makes it possible for Python code to play nicely with C compiled via LLVM, for example.

--
"You can either have software quality or you can have pointer arithmetic, but you cannot have both at the same time."

IronPython speed by icepick72 · 2009-03-27 09:03 · Score: 2, Informative

Word has it that Microsoft created a speedy IronPython implementation on their Common Language Runtime and JIT technology for .NET. Here are benchmarks for it. I'm failing to find similar benchmarks for comparison; can anybody else contribute to this info?

Re:IronPython speed by kripkenstein · 2009-03-27 18:47 · Score: 1

First, IronPython isn't 100% compatible with CPython (neither is Jython, for that matter).

Second, speed-wise, it depends on the benchmark, but overall performance isn't much better. However, it does have good threading support - no GIL - which is a plus.

Too many levels of translation? by Theovon · 2009-03-27 09:04 · Score: 2, Interesting

It sounds like they're going to take Python, which already gets translated to some kind of p-code (right?), and either translate the original Python or the p-code into LLVM code, which is then JIT-compiled to the native architecture.

The translation from Python to LLVM is going to lose some specificity and require that extra code be added to implement whatever needs to be done in Python that isn't trivially implemented by LLVM. Then the LLVM code needs to be compiled to native, introducing yet more "glue" code in the process.

Wouldn't a more direct compile yield a better result?

And don't give me any junk about compiling dynamic languages. LISP and Self are highly dynamic languages, yet they're compiled. If they can be compiled, then so can Python. I mean, the fact that it can be done through multiple levels of translation proves that it can be done, although possibly inefficiently. I just think that a more direct approach would reduce some of the superfluous glue code and a variety of other inefficiencies in translation that result from a loss of knowledge about what the original program was actually trying to implement.

Re:Too many levels of translation? by Abcd1234 · 2009-03-27 09:21 · Score: 4, Informative

Wouldn't a more direct compile yield a better result?
No, it wouldn't.
The entire point of LLVM is that it provides an easy-to-target machine (it's basically a RISC instruction set) that you can use as your intermediate representation (the p-code you described). You then use the LLVM backends to compile the IR down to machine code. And because of the way the IR is structured (for example, it has write-once registers, which makes certain classes of optimizations much easier), you can do a really good job of optimizing.
Basically, you "direct compile" to the LLVM IR, and then let LLVM take care of the details of generating the machine code. This gives you better abstraction (no more machine-specific code generation in Python itself), portability (to whatever LLVM targets), and you get all the sophisticated optimization that LLVM provides for free. That's a huge potential win.
Re:Too many levels of translation? by Estanislao+Martínez · 2009-03-27 10:46 · Score: 1

The translation from Python to LLVM is going to lose some specificity and require that extra code be added to implement whatever needs to be done in Python that isn't trivially implemented by LLVM. Then the LLVM code needs to be compiled to native, introducing yet more "glue" code in the process.
What do you mean by "lose specificity" here?
It's not clear to me that either the Python bytecode or the LLVM code is "more specific" than the others. Simply, one of them is higher level than the other; there will be many cases where the Python bytecode spells out "what to do," and the corresponding LLVM translation spells out "how to do it." This means that both of them will end up having information that the other one doesn't; the Python source bytecode will imply that some sequences of LLVM instructions "belong together" in ways that the LLVM doesn't represent.
This just means that some optimizations can be performed on one representation, but not the other. The Python bytecode will be susceptible to optimizations that eliminate relatively large chunks of LLVM code; the LLVM code will be susceptible to peephole optimizations that span across Python opcode barriers. So to the extent that that extra "glue" code you mention does work that really needs to be done, inlining it into the translations allows the compiler to use the surrounding context to optimize the glue in ways that the interpreter cannot.

Wouldn't a more direct compile yield a better result?
What's a "direct" compile? Optimizing compilers use multiple levels of representation, because each level is suited to different kinds of optimizations. For example, common subexpression elimination is easier to do in the abstract syntax tree (which represents the structure of the source code at a very high level), while peephole optimization is best done at a lower-level representation (because you're looking to eliminate things like redundant instruction sequences).
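As a rough illustration of the low-level half of that, here is a toy peephole pass in Python. The `PUSH`/`POP`/`ADD` instruction set is a made-up stack machine, not Python bytecode or LLVM IR, and a real pass would know many more patterns:

```python
def peephole(instructions):
    """Remove adjacent instruction pairs that cancel each other out."""
    out = []
    for ins in instructions:
        # A PUSH immediately followed by a POP has no net effect on the stack.
        if out and out[-1][0] == "PUSH" and ins == ("POP",):
            out.pop()
        else:
            out.append(ins)
    return out


program = [("PUSH", 1), ("PUSH", 2), ("POP",), ("PUSH", 3), ("ADD",)]
print(peephole(program))
```

The pass only ever looks at a small window of neighbouring instructions, which is why it works best on a flat, low-level representation and says nothing about the structure of the original source.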

I just think that a more direct approach would reduce some of the superfluous glue code and a variety of other inefficiencies in translation that result from a loss of knowledge about what the original program was actually trying to implement.
If anything, excessively "direct" compilation produces suboptimal code. The optimizations that rely on knowledge of the details of a high-level representation should be done at a high-level representation.

--
Are you adequate?
Re:Too many levels of translation? by MtHuurne · 2009-03-27 13:59 · Score: 3, Informative

The Python object files are just a more convenient way to store the program compared to text files. No information is lost or glue is added in that first step.
LLVM is, like its name suggests, really low level. You should think of it as a kind of portable assembly. It's much closer to actual hardware architectures than for example Java byte code. I don't expect much overhead from the LLVM to native step. A while ago I ran some tests with C++ compiled by GCC directly to native and compiled by GCC to LLVM byte code and then by LLVM to native; sometimes one approach was faster and sometimes the other, but they were pretty close.
So that leaves the glue added in the Python object to LLVM step. I expect this to have a significant overhead, but I don't see it becoming a smaller overhead by going directly to native. The advantage of using LLVM is that you only have to write this step once, instead of once for each architecture.
With LLVM it is possible to compile parts of the interpreter to LLVM byte code in advance and then inline that into the program being JIT-compiled. That way, you can be sure that the JIT and the interpreter actually do the same thing. Apple did this for their OpenGL driver, there is a nice presentation (PDF) about it.
Re:Too many levels of translation? by julesh · 2009-03-27 23:28 · Score: 1

The entire point of LLVM is that it provides an easy-to-target machine (it's basically a RISC instruction set)
No it isn't. It's static-single-assignment, which is an entirely different architecture to RISC. LLVM instructions are basically nodes in a computation data flow graph; it's very similar to the intermediate representation used by gcc.
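For anyone unfamiliar with static single assignment, a small Python sketch of the renaming step follows. It handles straight-line code only (branches would need phi nodes, which are omitted here), and the tuple format and `to_ssa` name are invented for illustration rather than taken from LLVM:

```python
def to_ssa(statements):
    """Rename straight-line three-address code so each name is assigned once.

    `statements` is a list of (target, op, left, right) tuples of strings.
    """
    version = {}  # variable name -> latest version number
    result = []

    def use(name):
        # Operands refer to the latest version; constants and inputs pass through.
        if name in version:
            return f"{name}{version[name]}"
        return name

    for target, op, left, right in statements:
        left, right = use(left), use(right)
        version[target] = version.get(target, 0) + 1
        result.append((f"{target}{version[target]}", op, left, right))
    return result


code = [
    ("x", "+", "a", "b"),
    ("x", "*", "x", "2"),
    ("y", "-", "x", "a"),
]
for line in to_ssa(code):
    print(line)
```

After renaming, every use of `x` points at exactly one definition, which is what makes the data flow graph explicit.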
Re:Too many levels of translation? by mechsoph · 2009-03-28 04:11 · Score: 1

LISP and Self are highly dynamic languages, yet they're compiled. If they can be compiled, then so can Python.
The difference between Lisp and Python (aside from kludgey syntax) is that Lisp records are not hash-tables. Structures in Lisp are actually vectors, so a field lookup is one pointer dereference and an add over directly having the object reference (same as Java). I'm not sure how you'd do similar things in Python. Maybe caching the most recently accessed field at each access point would help.
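The caching idea in that last sentence can be sketched as a one-entry inline cache. This is a toy model in Python with made-up `Record` and `FieldAccessSite` classes; it is not how CPython stores attributes, and nothing here is taken from the Unladen Swallow plans:

```python
class Record:
    """Toy object: fields live in a list, laid out according to a shared shape."""

    def __init__(self, shape, values):
        self.shape = shape    # tuple of field names, shared between records
        self.values = values  # field values, in the same order as shape


class FieldAccessSite:
    """One access point in the program, with a one-entry inline cache."""

    def __init__(self, field):
        self.field = field
        self.cached_shape = None
        self.cached_index = None
        self.misses = 0

    def load(self, record):
        if record.shape is not self.cached_shape:
            # Slow path: search the shape, then remember where the field was.
            self.misses += 1
            self.cached_shape = record.shape
            self.cached_index = record.shape.index(self.field)
        # Fast path: one identity check plus an indexed load.
        return record.values[self.cached_index]


point_shape = ("x", "y")
site = FieldAccessSite("y")
points = [Record(point_shape, [i, i * 10]) for i in range(3)]
print([site.load(p) for p in points])
print(site.misses)
```

As long as the same access point keeps seeing records of the same shape, only the first lookup pays for the search; the rest behave like the vector-style field access described for Lisp structures.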
Re:Too many levels of translation? by Abcd1234 · 2009-03-28 06:28 · Score: 1

It's static-single-assignment, which is an entirely different architecture to RISC.
Odd, then, that LLVM's very own documentation refers to LLVM's instruction as "an abstract RISC-like instruction set". Remember, RISC-like only describes the nature of the actual instructions (basic arithmetic, flow control, etc), not the style of the register set. LLVM's IR just happens to model a RISC-like instruction set with an infinite set of write-only registers. That, combined with the ability to specify "key higher-level information for effective analysis, including type information, explicit control flow graphs, and an explicit dataflow representation" yields an IR which makes it very easy to analyze data flow in the program, facilitating certain classes of optimizations.
Re:Too many levels of translation? by julesh · 2009-03-28 20:51 · Score: 1

Odd, then, that LLVM's very own documentation refers to LLVM's instruction as "an abstract RISC-like instruction set".
I suspect whoever wrote that documentation isn't familiar enough with processor architecture theory to know what RISC actually means.
Remember, RISC-like only describes the nature of the actual instructions (basic arithmetic, flow control, etc), not the style of the register set.
Actually, RISC is very much tied to the register set: it requires that all instructions access data from and store results in registers except for a small minority that are specifically designed for memory access. LLVM's instruction set is agnostic about whether the values it works with are in memory or a register, so is quite clearly not RISC. You can get much closer to reproducing the same sequence of instructions that LLVM uses to represent an operation on a CISC machine than you can on RISC because on a CISC architecture you won't need as many additional memory access instructions.

Binspam by Thelasko · 2009-03-27 09:09 · Score: 5, Funny

I get emails claiming to increase my python's performance all of the time, I just delete them.

--
One of our competitors trademarked the term "hypothesis". From now on, we will call them "boneheaded ideas".

Re:Binspam by oldhack · 2009-03-27 09:26 · Score: 1

I get emails claiming to increase my python's performance all of the time, I just delete them.

Then why are your pants smoking?

--
Fuck systemd. Fuck Redhat. Fuck Soylent, too. Wait, scratch the last one.

Any Hope? by Anonymous Coward · 2009-03-27 09:10 · Score: 2, Funny

Is there any hope that we will move away from these boutique programming languages and back to "real languages" that seriously consider size and performance?

I for one am completely sick and tired of 3GHz multicore processor machines with gigabytes of RAM running like a 486. Languages like Python don't help in the bloat arena, and the scripting languages made out of frameworks on top of other scripting languages are just ludicrous!

Re:Any Hope? by /dev/trash · 2009-03-27 12:34 · Score: 2, Insightful

You are free to use C and Assembler where required.
Re:Any Hope? by julesh · 2009-03-27 23:35 · Score: 1

I'd like to see virtual machines like Java and CLR shifting more work to compile-time and link-time. I'm pretty sure they could get decent performance with static compilation + link-time escape analysis for eliminating heap allocations.
Both Java and CLR do run-time linking. This is essential for many of their features, and supporting it makes ahead-of-time escape analysis impossible; as soon as a class is loaded that wasn't considered at compile time the results become invalidated.
The overengineering of Java frameworks is equally a contributing factor.
Agreed with this one, and .NET isn't much better. Try looking at a dependency graph of basic Java classes some time, it's horrible. You can't write a small-footprint Java application because as soon as you refer to certain classes (System, for example), you're pulling in half the framework. AWT is horrible and a total resource pig. SWING is worse. And that's not even touching EJB.

Did know it was that bad by kramulous · 2009-03-27 09:12 · Score: 1

I do my best here not to offend, but I can see clearly now why I don't use Python.

I keep getting pressured by others to adopt it rather than my C or C++ but if they are touting a possible 5x increase, that means it was really, really slow to begin with. And how much further is there to go? I suspect it is not even worth benchmarking it yet.

Since all I mostly do is big matrix and vector work why would I use python? And no, scipy doesn't count as I can get MPI going pretty quickly.

Yes, I realise the right tool for the job argument.

--
.

Re:Did know it was that bad by zindorsky · 2009-03-27 09:19 · Score: 2, Insightful

Yes, I realise the right tool for the job argument.
Exactly. Most applications are not CPU bound. If yours is, then I don't know why others are trying to get you to use Python.

--
If the geiger counter does not click, the coffee, she is not thick.
Re:Did know it was that bad by maxume · 2009-03-27 09:28 · Score: 1

You should use it if you think it would make your life easier. Numpy/Scipy are both supposed to make doing matrix stuff faster (they are written in C) while providing a Python like syntax and the ability to use python for the more mundane parts of the program. If that doesn't look better to you than C and C++, you shouldn't change.
I guess the tool that you are used to can be better than the tool you don't know (this doesn't quite work for a hammer and a screwdriver, but dammit, programming languages and libraries aren't anywhere near that simple).

--
Nerd rage is the funniest rage.
Re:Did know it was that bad by portscan · 2009-03-27 11:25 · Score: 1

If all you are doing is linear algebra, why would you use C++ instead of Fortran, which is still top dog in that area?
Re:Did know it was that bad by kramulous · 2009-03-27 11:51 · Score: 1

It's been on the agenda for a while. I just haven't learnt all the little tweaks yet. But you are right. It amazes me how often another white paper comes out with further compiler optimisations for fortran. 40 years of optimisations (at least!) have to make it the superior language.

--
.
Re:Did know it was that bad by daver00 · 2009-03-27 20:38 · Score: 3, Insightful

The thing about Python is you are replacing every lost hour in runtime with a day gained in development time. That is the point of Python. Numpy (formerly scipy I think) is mostly written in C anyway and provides fast n-dimensional array objects for vector and matrix operations, so there are really only a few bottlenecks for maths/science purposes. Generally, anything that is going to take a seriously long time you would be doing in C anyway. What Python offers is a viable alternative to Matlab etc., and a damn sight less expensive!
Where I study Engineering they teach Python for this very reason. It has a gentle syntax which appeals to engineers and scientists who often aren't bargaining to become coders, and it is so much cheaper than Matlab that any missing features are rendered a moot point.
Seriously, sitting on the sidelines and saying "I'm not gonna use Python because it is slow" is silly; it is so damn easy to code in Python that you would learn it in a weekend if you already have coding experience. And as I said before, any time lost running Python scripts compared with other languages is made up ten times over at least by the ridiculously short development times that go with Python scripts. Yes, it really is THAT easy to do anything in Python; there is a reason people bug you to try it out. Just give it a weekend, Python deserves it!
Re:Did know it was that bad by kramulous · 2009-03-27 23:47 · Score: 1

I dunno, I can write C and C++ pretty quick.
That's pretty much all I got to say about that :)

--
.
Re:Did know it was that bad by maxume · 2009-03-28 01:06 · Score: 1

Numpy adds (fast) support for multidimensional arrays to Python; Scipy provides a bunch of tools for working with Numpy arrays.

--
Nerd rage is the funniest rage.
Re:Did know it was that bad by daver00 · 2009-03-28 17:23 · Score: 1

I'm sure you can, but I can guarantee you that you will need to type dramatically less code in Python, and thus output is greatly increased no matter how fast you are. This is like saying you can pump out a 5000 word essay in the time it takes someone to do up a 100 word paragraph.
Re:Did know it was that bad by daver00 · 2009-03-28 17:29 · Score: 1

The thing about Python is that you are replacing every lost hour in coding time with an hour of debugging time.
...If you suck at Python.
It's not the same as other languages, so you keep things like automatic type handling in mind, and force types to be certain things where necessary. It's pretty fucking simple if you're not stupid. The fact is that Python is being widely adopted, rapidly, and for this exact reason: speed is not the issue anymore, volume of output is far more important. Coders cost more than faster computers.

stackless by Tumbleweed · 2009-03-27 09:14 · Score: 1

So whatever happened to 'Stackless' Python? Is that ever going to be merged into CPython? And would it work with this?

what about pypy? by gilleain · 2009-03-27 09:24 · Score: 1

They (http://morepypy.blogspot.com/) have noticed the project, it seems.

We were a bit confused about usage of the term JIT, because as far as we understood, it's going to be upfront compilation into LLVM. In the past we have looked into LLVM - at one point PyPy extensively use it but it wasn't clear how we could make good use to it.

They seem a bit sceptical.

Re:Don't you mean by scorp1us · 2009-03-27 09:31 · Score: 1

Mod up

--
Slashdot's rate-of-post filter: Preventing you from posting too many great ideas at once.

Re:What about Parrot? by Abcd1234 · 2009-03-27 09:32 · Score: 1

Parrot's a lot harder to use to interact with other languages.

Uhh... wha? That's one of the entire reasons Parrot exists. Any language that's compiled to Parrot can interact with any other language compiled to Parrot.

LLVM at least makes it possible for Python code to play nicely with C compiled via LLVM, for example.

Huh? I *really* don't see how LLVM provides a mechanism for languages to interact with one another. Its IR is really just machine code; it's just that the machine doesn't actually exist. In that sense, compiling to LLVM IR is absolutely no different than compiling directly to, say, x86, and it's pretty clear that Perl, compiled to x86, can't interact with Python, compiled to x86, so why would that be any different for Perl compiled to LLVM IR and Python compiled to LLVM IR?

Remember, language interaction requires a whole host of things, including a common underlying framework for how objects are represented, how methods are called, etc. As far as I know, LLVM provides none of that (unlike the JVM, CLR and Parrot). Heck, it only offers a few types of primitives, including basic numbers, pointers, and lists. It has no concept of objects at all... so how is a Python object supposed to interact with a Perl object, for example?

That said, you could certainly build something like that *on top* of LLVM (eg, a CLR, JVM, or Parrot backend that compiled down to LLVM IR, which then provides the necessary infrastructure for languages to interact), but LLVM itself does not, as far as I can tell, directly facilitate such a thing.

It all depends by mkcmkc · 2009-03-27 09:38 · Score: 5, Insightful

I find Python is about 20x slower (and about 10x faster to implement) than C, with the number varying quite a bit depending on how CPU-bound the code is. Given the speed of modern processors, this is plenty fast for many tasks.

Beyond that, many Python programmers employ a strategy of writing just the CPU-intensive inner loops in C or C++. This gives you most of the speed of an all-compiled solution but with much of the easier programming (and shorter programs) of the all-Python approach.

My particular scientific application runs on 1500 cores, is about 75% Python/25% C++, is 4-5x smaller than similar all-C/C++ programs, and runs at about 95-99.99% of the speed of an all C++ solution.

(Somewhat ironically, some of the worst performance bottlenecks in this app had to do with the overhead of some of the STL containers, which I ended up having to replace with C-style arrays, etc. to get best performance.)

Not all apps will fall out this way, but you definitely can't assume that just because something's written in Python that it will be slow.

(Going beyond that, we all know that better algorithms usually trump all of this anyway. If writing in Python gives you the time and clarity to be able to use an O(n)-better algorithm, that may pay off in itself.)

--
"Not an actor, but he plays one on TV."

Re:It all depends by master_p · 2009-03-27 10:39 · Score: 1, Flamebait

(Somewhat ironically, some of the worst performance bottlenecks in this app had to do with the overhead of some of the STL containers, which I ended up having to replace with C-style arrays, etc. to get best performance.)

I smell bullshit. There is no overhead from using STL containers.
If you used an std::list or an std::map for random access, then you certainly had a bottleneck, because those containers are not for random access.
If you used an std::vector, you couldn't have a bottleneck, for the simple reason that the std::vector is an array.
Re:It all depends by tkinnun0 · 2009-03-27 11:25 · Score: 1

If you used an std::vector, you couldn't have a bottleneck, for the simple reason that the std::vector is an array.
If you say so, but try to tell that to the compiler. Or rather, try to let the compiler figure that out.
Re:It all depends by mkcmkc · 2009-03-27 11:55 · Score: 3, Informative

I smell bullshit. There is no overhead from using STL containers.
If you used an std::vector, you couldn't have a bottleneck, for the simple reason that the std::vector is an array.
That was my impression, too, but careful timing and profiling suggested otherwise.
In addition, we can by simple reasoning determine that there's gotta be some overhead involved with vector implementations. First, vectors know their size; in particular, they know it in constant time. This means that they essentially must include a size field and update it whenever size changes. Also, I can have a pointer to a vector, and that vector can grow arbitrarily without invalidating the pointer. That means that there pretty much has to be an indirect pointer to the vector's storage. It also means that the vector's storage must more or less be coming from a heap, which definitely slows things down. ("more or less" because one can imagine certain optimizations that might be possible if you somehow knew an upper bound on the vector's lifetime size)
All of this stuff costs you in time and space.
Suppose I have a function I'm going to call a million times and it needs a temporary array of ints, of a size I can bound (maybe even small enough to be cache-beneficial). I can allocate that array in the parent function and pass in a pointer each time. Overhead to create and destroy the array in the inner function each time: zero. If you do this with a vector, the implementation has to zero the length, which costs time. Or you can delete and recreate it by letting it go out of scope, but that also costs time.
Most of the time these minor effects don't matter, but if it's in the innermost loop and is going to run billions of times, it can be quite noticeable.
It could conceivably be that gcc's implementation of STL is a little slow. Doesn't matter why, though, because that's my target, and that's where my program has to run.
It's been a while since I went through this exercise, so I don't have the exact scenario. But the code is GPL'ed and available here. If you can replace any of the arrays with an as-simple, as-fast use of vectors, I'd be happy to have it.

--
"Not an actor, but he plays one on TV."
Re:It all depends by kramulous · 2009-03-27 11:58 · Score: 1

No bullshit ... at least in my case. std::vector does a lot of array bounds checking and various other things that involve 'if' statements. You don't want them inside large loops. So I write my own vector classes - I make assumptions.
Now, I used to have some kickarse vector (and matrix) templates but have had to ditch them with the release of the Harpertown processor because templates don't vectorise. This was ok for the Clovertown but the 256 bit wide register in the Harpertown (as opposed to the 128bit for Clovertown) made my template tricks redundant.
It's getting hard to program for a single core - to get maximum work done.

--
.
Re:It all depends by Garse+Janacek · 2009-03-27 12:04 · Score: 1

I smell bullshit. There is no overhead from using STL containers.
Ehm... that's a nice theory, but I second GP's experience in finding otherwise in practice. I don't know the specific reasons -- maybe there were memory fragmentation issues and it wasn't really STL's "fault" -- but I was doing some large-for-my-laptop (with 3GB ram) data processing, and initially used vectors for everything. I eventually had to give up and rewrite it all with arrays just like GP, because I kept having difficult-to-debug and impossible-to-fix memory issues as a result.
Really, why would someone bother "bullshitting" about something like this? He was pointing out a peculiarity of the overheads of one particular program he worked with; I don't think he was doing it with some sort of anti-STL agenda or anything...

If you used an std::vector, you couldn't have a bottleneck, for the simple reason that the std::vector is an array.
There might be systems on which this is true, but not on the c++ libraries on my mac. When I was trying to figure out what was going wrong with my vector-based program, I got to look at a lot of vectors from within gdb, and they have a neat bucket system going on that I'm sure is very fancy and clever, but let me tell you, it is not just an array, and good luck figuring out from the data alone what is stored in it unless you already know an awful lot about the underlying implementation...

--
I am the man with no sig!
Re:It all depends by Azverkan · 2009-03-27 19:33 · Score: 1

I've run into the problem of terrible STL performance on more than one occasion. The problem isn't so much that it's impossible for it to be as efficient as C; in theory it could be. But to date compilers are more likely to turn a simple pointer assignment like tail->next=new into a single store instruction than pages upon pages of accessors calling accessors to call into templated functions in order to do the same.
If your idea of high performance is that both operations need to take less than a second, then you won't notice STL vs C performance differences. But if you need to guarantee that all of your array / vector / list operations complete within a few hundred processor cycles, optimizing STL is typically more work than replacing the entire thing with C code (including the extra debugging that doing the code in C will likely entail).
One of the things related to GCC performance is that you can use the LLVM C backend to generate C code from the C++ STL template source code. If you feed the C code back into GCC you can see huge speed gains with the compiler optimizing loops significantly better than if you feed it the C++ code directly. The downside is that debugging the LLVM generated obfuscated C code is a royal pain.
Re:It all depends by windwalkr · 2009-03-27 19:56 · Score: 1

While I won't comment on the performance of any particular implementation of the STL, or any particular C++ optimizing compiler, many of your points don't ring true:

That was my impression, too, but careful timing and profiling suggested otherwise.
In addition, we can by simple reasoning determine that there's gotta be some overhead involved with vector implementations. First, vectors know their size; in particular, they know it in constant time. This means that they essentially must include a size field and update it whenever size changes.
This is just as true of a C array; if you're resizing it, then you'll need to track its size manually. If you're never resizing it, then it's unfair to compare against a vector that you're resizing - just resize the vector (once) to the size you want, and you'll never again pay a penalty for updating the size member.

Also, I can have a pointer to a vector, and that vector can grow arbitrarily without invalidating the pointer. That means that there pretty much has to be an indirect pointer to the vector's storage.
Correct, it's more similar to a pointer (which in C is syntactically similar to an array) rather than an actual array.

It also means that the vector's storage must more or less be coming from a heap, which definitely slows things down.
I'm uncertain how you think that this might be the case. Being on the heap is not a performance penalty in any real sense that I can think of.

Suppose I have a function I'm going to call a million times and it needs a temporary array of ints, of a size I can bound (maybe even small enough to be cache-beneficial). I can allocate that array in the parent function and pass in a pointer each time. Overhead to create and destroy the array in the inner function each time: zero. If you do this with a vector, the implementation has to zero the length, which costs time. Or you can delete and recreate it by letting it go out of scope, but that also costs time.
I think you're confusing the data type for the algorithm. If you're not going to zero your C-style array, then one would have to assume that you don't need to zero the vector either. In which case, you're not going to see a real performance difference between the two approaches.
I can certainly agree that it's possible to write inefficient code with the STL, however that comes down to what you're asking the STL to do. Certainly if you ask it to do more work than you were asking of your C-style array solution, the STL will come out slower - if you construct more often, clear more often, copy more often, introduce more counters, etc. - yes, it has the potential to be slower. However, on the exact same workload, you're going to find that the overheads in STL are exceedingly small.
Oh, and one other thing - don't profile in debug mode. Many STL implementations include extensive debugging support which bumps up the cost of even simple operations in a debug build. Some even do this in release mode unless you explicitly disable it.
Re:It all depends by kushal_kumaran · 2009-03-27 22:46 · Score: 1

If you used an std::vector, you couldn't have a bottleneck, for the simple reason that the std::vector is an array.
There might be systems on which this is true, but not on the c++ libraries on my mac. When I was trying to figure out what was going wrong with my vector-based program, I got to look at a lot of vectors from within gdb, and they have a neat bucket system going on that I'm sure is very fancy and clever, but let me tell you, it is not just an array, and good luck figuring out from the data alone what is stored in it unless you already know an awful lot about the underlying implementation...
The storage for a std::vector is required by the C++ standard to be an array. See http://www.parashift.com/c++-faq-lite/containers.html#faq-34.3
Re:It all depends by master_p · 2009-03-28 00:29 · Score: 1

More bullshit.
"First, vectors know their size; in particular, they know it in constant time. This means that they essentially must include a size field and update it whenever size changes."
If you use plain C arrays and you grow them manually, you will have to keep the size around yourself. So, you are only duplicating std::vector's work.
"Also, I can have a pointer to a vector, and that vector can grow arbitrarily without invalidating the pointer. That means that there pretty much has to be an indirect pointer to the vector's storage."
You would do this even if you did use a plain C array which you would reallocate in order to enlarge it.
"It also means that the vector's storage must more or less be coming from a heap, which definitely slows things down. ("more or less" because one can imagine certain optimizations that might be possible if you somehow knew an upper bound on the vector's lifetime size)"
If you use expandable arrays, then the storage is coming from the heap anyway. If you don't use expandable arrays, and you only need a fixed size array, then you wouldn't use STL in the first place.
"All of this stuff costs you in time and space"
But you cannot avoid them if you want your arrays to grow over time. In any case, you simply dupe std::vector's work.
"Suppose I have a function I'm going to call a million times and it needs a temporary array of ints, of a size I can bound (maybe even small enough to be cache-beneficial). I can allocate that array in the parent function and pass in a pointer each time. Overhead to create and destroy the array in the inner function each time: zero. If you do this with a vector, the implementation has to zero the length, which costs time. Or you can delete and recreate it by letting it go out of scope, but that also costs time"
Obviously, you don't know the STL well enough. With an std::vector, you set its size to the upper bound, and then use it as a standard C array, passing it to the child function. You don't need to zero its length each time you pass it to the child function.
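The pattern being described might look roughly like the sketch below. The names sum_of_squares and run_many are made up for illustration, and data() assumes a C++11 library (on older compilers &scratch[0] does the same job for a non-empty vector).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Inner function: takes a plain pointer to scratch space, exactly as it
// would for a C array. It neither allocates nor clears anything.
long sum_of_squares(const int* input, std::size_t n, int* scratch) {
    assert(n == 0 || (input != nullptr && scratch != nullptr));
    for (std::size_t i = 0; i < n; ++i)
        scratch[i] = input[i] * input[i];
    long total = 0;
    for (std::size_t i = 0; i < n; ++i)
        total += scratch[i];
    return total;
}

// Parent function: one allocation up front, reused for every call.
long run_many(const std::vector<int>& input, int calls) {
    std::vector<int> scratch(input.size());  // sized once to the upper bound
    long total = 0;
    for (int c = 0; c < calls; ++c)
        total += sum_of_squares(input.data(), input.size(), scratch.data());
    return total;
}
```

The vector is constructed (and its size field written) once in run_many; inside the loop the child only ever sees raw pointers, so the per-call overhead should match the C-array version.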
"It could conceivably be that gcc's implementation of STL is a little slow. Doesn't matter why, though, because that's my target, and that's where my program has to run."
Well, if it is, don't blame the STL generally, blame GCC or the specific environment you work with.
"It's been a while since I went through this exercise, so I don't have the exact scenario. But the code is GPL'ed and available here. If you can replace any of the arrays with an as-simple, as-fast use of vectors, I'd be happy to have it."
I am not going to do your work. You just simply have to replace your arrays with vectors. It's very simple.
Re:It all depends by master_p · 2009-03-28 00:32 · Score: 1

Even more bullshit.
"No bullshit ... at least in my case. std::vector does a lot of array bounds checking"
Std::vector::operator [] does not do bounds checking.
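A small sketch of the distinction, assuming a standard-conforming library: at() is the checked accessor and throws std::out_of_range, while operator[] is not required to check anything (some implementations do add checks in debug configurations). The helper names at_rejects and sum_unchecked are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Returns true if v.at(index) throws std::out_of_range.
bool at_rejects(const std::vector<int>& v, std::size_t index) {
    try {
        (void)v.at(index);
    } catch (const std::out_of_range&) {
        return true;
    }
    return false;
}

// Sum using operator[]; the loop bound is the only range check performed
// by this code.
long sum_unchecked(const std::vector<int>& v) {
    long total = 0;
    for (std::size_t i = 0; i < v.size(); ++i)
        total += v[i];
    return total;
}
```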
"and various other things that involve 'if' statements. You don't want them inside large loops. So I write my own vector classes - I make assumptions."
What other ifs? Be more specific.
"Now, I used to have some kickarse vector (and matrix) templates but have had to ditch them with the release of the Harpertown processor because templates don't vectorise. This was ok for the Clovertown but the 256 bit wide register in the Harpertown (as opposed to the 128bit for Clovertown) made my template tricks redundant."
Why? If you can use C arrays with the Harpertown, you can use std::vectors. They are interchangeable: having access to the first element of the vector, you can take a pointer to it and use it as a C array.
Re:It all depends by master_p · 2009-03-28 00:35 · Score: 1

The stream of bullshit never ends.
"Ehm... that's a nice theory, but I second GP's experience in finding otherwise in practice. I don't know the specific reasons -- maybe there were memory fragmentation issues and it wasn't really STL's "fault" -- but I was doing some large-for-my-laptop (with 3GB ram) data processing, and initially used vectors for everything. I eventually had to give up and rewrite it all with arrays just like GP, because I kept having difficult-to-debug and impossible-to-fix memory issues as a result.
Really, why would someone bother "bullshitting" about something like this? He was pointing out a peculiarity of the overheads one particular program he worked with, I don't think he was doing it with some sort of anti-STL agenda or anything..."
No one in the above posts said anything remotely realistic about vectors. They even said that "std::vector does bounds checking". For God's sake, this is so untrue! The STL documentation says in big bold letters that operator [] does not do any bounds checking!!!!
"There might be systems on which this is true, but not on the c++ libraries on my mac. When I was trying to figure out what was going wrong with my vector-based program, I got to look at a lot of vectors from within gdb, and they have a neat bucket system going on that I'm sure is very fancy and clever, but let me tell you, it is not just an array, and good luck figuring out from the data alone what is stored in it unless you already know an awful lot about the underlying implementation..."
Bucket system? Are you sure you used an std::vector? Because buckets may be used in std::deque or std::tr1::unordered_set/map.
If your C++ environment has an std::vector implemented with buckets, then blame your specific C++ environment that violates the standard.
Re:It all depends by kramulous · 2009-03-28 01:31 · Score: 1

I just had to test. Tell me why:
int n = 100000000;
std::vector<float> inta;  // element type was lost in posting; float is assumed, to match the array version
std::vector<float> intb;
for (int i = 0; i < n; i++)
{
    inta.push_back(i);
    intb.push_back(i);
}
for (int i = 0; i < n; i++)
    inta[i] = 6.4*intb[i] + 9.0;
Takes 8 times longer to execute than
int n = 100000000;
float *test = new float[n];
float *tests = new float[n];
for (int i = 0; i ...
cat /proc/cpuinfo
model name : Genuine Intel(R) CPU T2400 @ 1.83GHz
Intel compiler (11.0.081)
Because I was curious, I did the y = ax +b loop 100 times on
model name : Intel(R) Xeon(R) CPU E5462 @ 2.80GHz
And the std::vector:
taskset -c 0 time ./Go
26.33user 1.13system 0:27.46elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (1major+457736minor)pagefaults 0swaps
Simple Array:
taskset -c 0 time ./Go
0.10user 0.23system 0:00.33elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (1major+97929minor)pagefaults 0swaps
(I'm the only person on the node in question)
I think the winner is pretty clear. Maybe not for the reasons I specified, but something is happening with the std::vector that is unacceptable for me.

--
.
Re:It all depends by kramulous · 2009-03-28 01:34 · Score: 1

That second code fragment:
float *test = new float[n];
float *tests = new float[n];
for (int i = 0; i < n; i++)
{
    test[i] = i;
    tests[i] = i;
}
for (int i = 0; i < n; i++)
    test[i] = 6.4*tests[i] + 9.0;

--
.
Re:It all depends by joib · 2009-03-28 02:36 · Score: 1

Your benchmark numbers are obviously incorrect. This is because your compiler can see through the simple array test and sees that it does nothing, hence optimizes the entire thing away. A common problem in benchmarking.
As for the code itself, you're comparing apples to oranges. You start with an std::vector of size 0, and grow it over time as you push_back() stuff into it. The equivalent simple array implementation would be to start with a 1 element array, and realloc() to twice the size when needed. So obviously the vector implementation has to do a lot of realloc() + copy operations. You can get around this with the vector.reserve() function, which allows you to allocate enough memory up front. Or alternatively, there is the resize() function allowing you to, well, resize the vector to the right size and write to the elements directly with inta[i] etc.
In this case, neither of these will get quite the same performance as the C array; the reserve() method suffers from having to update the size as you push_back(), and resize() suffers due to STL specifying that container should start zero initialized. But the difference is pretty small; on my laptop the vector test takes a factor of 1.4 longer than the simple array. In a real application, it's likely the difference will be negligible.
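For anyone who wants to see the reallocation effect directly, a rough sketch follows. The function names are invented for this example, and the exact number of reallocations without reserve() depends on the library's growth policy, so treat any count you observe as implementation-specific.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Appends n floats and counts how many times the capacity changed,
// i.e. how many reallocations (with element copies) took place.
std::size_t count_reallocations(std::size_t n, bool reserve_first) {
    std::vector<float> v;
    if (reserve_first)
        v.reserve(n);  // one allocation up front
    std::size_t reallocations = 0;
    std::size_t last_capacity = v.capacity();
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back(static_cast<float>(i));
        if (v.capacity() != last_capacity) {
            ++reallocations;
            last_capacity = v.capacity();
        }
    }
    return reallocations;
}

// The resize() alternative: the size is set once and the elements are
// then written in place through operator[].
std::vector<float> fill_resized(std::size_t n) {
    std::vector<float> v;
    v.resize(n);  // value-initializes n floats to 0
    for (std::size_t i = 0; i < n; ++i)
        v[i] = static_cast<float>(i);
    return v;
}
```

With reserve_first set, the count stays at zero because the capacity never has to grow; without it, the vector reallocates repeatedly as it grows, which is the extra work the push_back() benchmark was measuring.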
Re:It all depends by dkf · 2009-03-28 02:44 · Score: 1

The stream of bullshit never ends.
From the way you're going on about it, I presume you must be a member of the committee that designed the STL?

--
"Little does he know, but there is no 'I' in 'Idiot'!"
Re:It all depends by mkcmkc · 2009-03-28 05:20 · Score: 1

First, vectors know their size; in particular, they know it in constant time. This means that they essentially must include a size field and update it whenever size changes.
This is just as true of a C array; if you're resizing it, then you'll need to track its size manually. If you're never resizing it, then it's unfair to compare against a vector that you're resizing - just resize the vector (once) to the size you want, and you'll never again pay a penalty for updating the size member.
Hmm--yes, that sounds right. Still, though, if I have O(n) arrays all the same size, C arrays can remember that in O(1) space, whereas STL would apparently need O(n) space (storing the size within each vector).

It also means that the vector's storage must more or less be coming from a heap, which definitely slows things down.
Being on the heap is not a performance penalty in any real sense that I can think of.
I was thinking of the time it takes to malloc and free the storage, which would (I would think) be much larger than the time involved in allocating on the stack, or perhaps out of static storage, in cases where this is possible.

Suppose I have a function I'm going to call a million times and it needs a temporary array of ints, of a size I can bound...
I think you're confusing the data type for the algorithm. If you're not going to zero your C-style array, then one would have to assume that you don't need to zero the vector either. In which case, you're not going to see a real performance difference between the two approaches.
Yes, okay. You're saying that if you forgo all of the vector-ish features of a vector, just allocate it initially and then index into it, there should be no additional relative cost. That does sound true. Although maybe there's still a bit for translating between "pointer to vector" and "pointer to vector's actual array". Plus it seems like I can't really run memset on a vector's storage, which may cost me.

Certainly if you ask it to do more work than you were asking of your C-style array solution, the STL will come out slower - if you construct more often, clear more often, copy more often, introduce more counters, etc. - yes, it has the potential to be slower. However, on the exact same workload, you're going to find that the overheads in STL are exceedingly small.
One of the problems with C++/STL is that it's much more difficult to see when you are implicitly asking more work to be done. It's certainly possible that I was shooting myself in the foot. On the other hand, if you're involving vectors in an operation that must be run billions of times, any overhead at all can be a showstopper for STL use.
If you'd asked me before I got into this project, I would have guessed that STL data structures would usually be more efficient than hand-rolled C. Now I'm much less sanguine about the whole thing.

Oh, and one other thing - don't profile in debug mode. Many STL implementations include extensive debugging support which bumps up the cost of even simple operations in a debug build. Some even do this in release mode unless you explicitly disable it.
Hmm. This would have been g++'s implementation of STL in the default mode (no flags or environment variables). I didn't see any "make it faster" options mentioned in the docs.

--
"Not an actor, but he plays one on TV."
Re:It all depends by mkcmkc · 2009-03-28 05:26 · Score: 1

I am not going to do your work. You just simply have to replace your arrays with vectors. It's very simple.
You'll have to forgive me, but for the time being I'm afraid I'll have to continue to believe my own lying eyes.
(I'm sure a lot of us would appreciate a link to your resume, however.)

--
"Not an actor, but he plays one on TV."
Re:It all depends by Garse+Janacek · 2009-03-28 09:06 · Score: 1

If your C++ environment has an std::vector implemented with buckets, then blame your specific C++ environment that violates the standard.
I wasn't blaming anyone or anything in particular. Just relating my experience, which agreed with the original poster. You're saying something is impossible, but it happened to both of us. If you think I should blame my "specific C++ environment", well that's fine with me, but I'm not sure what you're trying to defend right now, I'm not trying to attack anyone or anything, I just don't think that the original poster was necessarily bullshitting about their experience.

The stream of bullshit never ends.
Just because you are confident of your own correctness doesn't mean everyone else is lying about their experience. You can suggest alternate explanations for those experiences without insulting everybody and saying everyone who has seen this issue is bullshitting (and again... why would we be bullshitting about this? What could there possibly be to gain by tricking people about something like that?)
Or, more succinctly: Just because you think you're right, doesn't mean you have to be a jerk.

--
I am the man with no sig!
Re:It all depends by kramulous · 2009-03-28 09:30 · Score: 1

Hey, thanks for the reply.
I suspected the realloc problem hence running 100 times
j = 0; j < 100; j++
i = 0; i < n; i++
But I made the malloc instead of looping a realloc and I still get
icpc test.cpp -o Go
test.cpp(14): (col. 10) remark: LOOP WAS VECTORIZED.
/usr/include/c++/4.1.2/bits/vector.tcc(330): (col. 5) remark: LOOP WAS VECTORIZED.
/usr/include/c++/4.1.2/bits/vector.tcc(334): (col. 5) remark: LOOP WAS VECTORIZED.
/usr/include/c++/4.1.2/bits/vector.tcc(343): (col. 5) remark: LOOP WAS VECTORIZED.
/usr/include/c++/4.1.2/bits/vector.tcc(365): (col. 5) remark: LOOP WAS VECTORIZED.
me@cl1n024:~/bin/pbstest/Interactive> taskset -c 0 time ./Go
25.95user 0.51system 0:26.46elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (1major+195592minor)pagefaults 0swaps
"FLAGS = -O2" by default

--
.
Re:It all depends by mkcmkc · 2009-03-28 15:58 · Score: 1

Why? if you use C arrays with the Harpertown, you can use std::vectors. They are interchangeable. Having access to the first element of the array, you can take a pointer to it and use it as a C array.
You're saying that I can do this (at no cost)?
std::vector<int> a = ...;
int *p = &a[0];
I doubt this is legal, and even if it is I'd say the cure is worse than the disease...

--
"Not an actor, but he plays one on TV."
Re:It all depends by windwalkr · 2009-03-28 20:09 · Score: 1

Hmm--yes, that sounds right. Still, though, if I have O(n) arrays all the same size, C arrays can remember that in O(1) space, whereas STL would apparently need O(n) space (storing the size within each vector).
Quite so. I would contend that this is not a problem for a normal workload (you're probably looking at 8 bytes overhead per vector as compared to a C array, plus maybe a small amount of overhead from the malloc library) but a pathological case could hit this (ie. a scenario where you have truly massive numbers of near-empty vectors.) My point here is not that a vector has identical performance characteristics to a C array, but that in any reasonable usage the real-world performance will be equivalent.

I was thinking of the time it takes to malloc and free the storage, which would (I would think) be much larger than the time involved in allocating on the stack, or perhaps out of static storage, in cases where this is possible.
Again, heavily dependent on usage rather than choice of array types. If allocation and deallocation incurs a large cost in your code, then obviously you're going to want to optimize for this specific concern. The first step would be to preallocate wherever possible, regardless of whether you are using a vector or a C array. Also of note - if you're really able to keep your arrays on the stack, then you don't have enough of them around that the 8-bytes overhead above will be noticeable, unless you're talking about a heavily multi-dimensional structure. If you're working with something multi-dimensional, then fair enough - vector may not be the best choice for you.

Yes, okay. You're saying that if you forgo all of the vector-ish features of a vector, just allocate it initially and then index into it, there should be no additional relative cost. That does sound true.
I'm saying that you have options, and that some options lean toward high performance, and some options lean towards flexibility or other benefits at the detriment of performance. The choice of using a vector does not inherently force you to sacrifice performance- you can choose what techniques are useful to you and which would be too expensive for your particular case.

Although maybe there's still a bit for translating between "pointer to vector" and "pointer to vector's actual array".
This is heavily dependent on your code, your compiler, and your data structures. It's feasible that a vector could be marginally slower, but I would not make that assumption until I'd looked at the assembly for your specific case since the optimizer can probably get rid of it in most cases. "Marginally" here probably means a single machine instruction, and if you're really concerned with that level of performance then you should probably be working in assembly anyway.

Plus it seems like I can't really run memset on a vector's storage, which may cost me.
Of course you can.

One of the problems with C++/STL is that it's much more difficult to see when you are implicitly asking more work to be done.
I've heard that remark for the last 10 years and it only rings true if you don't program in C++. I think it's fair to say that if you don't understand the finer details of a particular language, you won't be able to write optimal code in that language. I'm not denigrating yourself or anyone else for not being able to write optimal code in a language which they don't use heavily, but to say that it's hard for an experienced user of the language is misleading at best.

On the other hand, if you're involving vectors in an operation that must be run billions of times, any overhead at all can be a showstopper for STL use.
Any per-operation overhead which is comparable to the cost of the operation could certainly be considered a show-stopper. If your operations are large, or if the overhead is not per-operation, then small overheads elsewhere may be completely harmless.
Re:It all depends by mkcmkc · 2009-03-29 03:09 · Score: 1

Plus it seems like I can't really run memset on a vector's storage, which may cost me.
Of course you can.
I accept most of what you're saying, but this has got to be wrong. The only way this works is if the standard specifies that vector elements must be stored in a single, contiguous block of memory. Is this really the case?
Over and above that, what if the elements are objects? Sounds evil to me.

One of the problems with C++/STL is that it's much more difficult to see when you are implicitly asking more work to be done.
I've heard that remark for the last 10 years and it only rings true if you don't program in C++.
Well, judge for yourself if I'm any good, but I do program in C++, even though these days I try to avoid it.
I did at one time have a line-by-line familiarity with the ANSI C standard. My sense is that C++ is an order of magnitude more complex, and I've never personally met anyone that I felt had mastery of the language. I'm sure that such people exist, but the fact that I don't encounter them suggests to me that they are relatively rare--and that using a lot of C++ on a project isn't a good idea.

If your operations are large, or if the overhead is not per-operation, then small overheads elsewhere may be completely harmless.
For these cases, though, I'm probably just using Python instead. Where I'm using C++, it pretty much has to be as fast as C. When I started my project, I thought that there wouldn't be any difficulty getting the C++ to be as fast as C, but after living with it for a while, I'm seriously considering going back to straight C (albeit as much for its relative simplicity as for the speed).

--
"Not an actor, but he plays one on TV."
Re:It all depends by windwalkr · 2009-03-29 15:12 · Score: 1

I accept most of what you're saying, but this has got to be wrong. The only way this works is if the standard specifies that vector elements must be stored in a single, contiguous block of memory. Is this really the case?
http://groups.google.com/group/comp.lang.c++/msg/408d058c256d7699

Over and above that, what if the elements are objects? Sounds evil to me.
Again, the choice of std::vector or C array has no bearing on whether memset() is going to work. If the elements are not POD, then you're going to have repercussions in either case. If they are POD, it's going to work fine in either case. If you're looking at non-POD objects, or just feel like doing things "the C++ way", then STL offers reserve(), clear(), fill() and so on which can be used to do the above cleanly in all cases.
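A minimal sketch of the point, assuming int elements (a POD type) and a standard library that stores vector elements contiguously, which the C++03 standard requires. The helper names are invented for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Zero a vector of ints with memset. This leans on contiguous storage and on
// int being POD; it is not safe for element types with constructors.
void zero_with_memset(std::vector<int> &v) {
    if (!v.empty())
        std::memset(&v[0], 0, v.size() * sizeof(int));
}

// The type-safe equivalent, which works for any assignable element type.
void zero_with_fill(std::vector<int> &v) {
    std::fill(v.begin(), v.end(), 0);
}

// A C-style function that only knows about a pointer and a length.
long sum_c_style(const int *p, std::size_t n) {
    assert(p != 0 || n == 0);
    long total = 0;
    for (std::size_t i = 0; i < n; ++i)
        total += p[i];
    return total;
}

// Hand the vector's storage to the C-style function via &v[0].
long sum_vector(const std::vector<int> &v) {
    return v.empty() ? 0 : sum_c_style(&v[0], v.size());
}
```

The empty() checks are there because taking &v[0] on an empty vector is not valid; apart from that, the pointer behaves like any other pointer into an int array until the vector reallocates.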

Well, judge for yourself if I'm any good, but I do program in C++, even though these days I try to avoid it.
I'm not judging your capability. Rather, I'm noting that familiarity with a particular language or usage pattern has a direct bearing on how well you're going to be able to write optimal code in that language.

My sense is that C++ is an order of magnitude more complex, and I've never personally met anyone that I felt had mastery of the language.
I know one or two who have mastered the whole language, and honestly, these guys aren't the best programmers I know. I know many who have mastered the areas that I personally feel are relevant in day-to-day programming including lower-level optimization, and these guys are good. I know many who have a light working knowledge of the language, can write and debug most application code, and who have probably never considered the type of stuff we're discussing here. All of these guys are good in their way, but if you don't do something regularly, you don't become proficient at it. Certainly, mastering the entire language is not necessary for the kind of optimization we're talking about here.

I'm sure that such people exist, but the fact that I don't encounter them suggests to me that they are relatively rare--and that using a lot of C++ on a project isn't a good idea.
I'd say this is industry specific. Certainly in my industry, anyone who can't do this kind of thing would be considered a junior. Again, this may be completely different in other industries, based on what is important in that specific industry.

For these cases, though, I'm probably just using Python instead. Where I'm using C++, it pretty much has to be as fast as C.
You may have missed my point. I'm not suggesting that C++ can't be as fast as C, but rather am referring to the usual 90%/10% rule when optimizing. There's little point in worrying about the 90% of the code/structure which takes 10% of the time, at least until you've dealt with the 10% of the code/structure which takes 90% of the time. If you're finding that STL is introducing overhead into your code's critical path then yes, you'll want to look at alternatives - either avoiding STL for this part of the code, or consider restructuring your structures so that this doesn't occur.

When I started my project, I thought that there wouldn't be any difficulty getting the C++ to be as fast as C, but after living with it for a while, I'm seriously considering going back to straight C (albeit as much for its relative simplicity as for the speed).
And that's fair enough. I'm not telling you how to program, and only you can determine what's the best option for you. My point is that just because the STL didn't work out for you personally, doesn't reflect on STL's capabilities in a general sense.
Re:It all depends by joib · 2009-03-29 18:07 · Score: 1

You didn't provide results from a fixed benchmark to compare with, so what conclusion am I supposed to draw from your results?
Anyway, in my case, if I run the second loop 100 times the difference between the simple array and vector tests vanishes, as expected.
Re:It all depends by L33tGreg · 2009-03-30 05:57 · Score: 1

GCC's libstdc++6-4.3 std::vector size() is computed by subtracting the end-begin ptrs. There is no size field! Vector should be no slower than array, if it is, you are probably using it wrong. The only exception is if you constantly create arrays on the stack, those would be faster on most platforms than vector because constantly allocating vectors cause heap allocations. However, a pre-allocated vector and pre-allocated array are equal.
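To make the layout concrete, here is a toy fixed-capacity container built on three pointers. It is not libstdc++'s actual code, and the type and member names are invented; it only shows how size() can fall out of a pointer subtraction, and that appending still has to bump the end pointer.

```cpp
#include <cassert>
#include <cstddef>

// Toy illustration only; a real std::vector also grows, copies, and so on.
struct ToyIntVector {
    int *begin_;  // first element
    int *end_;    // one past the last element in use
    int *cap_;    // one past the end of the allocated storage

    explicit ToyIntVector(std::size_t capacity)
        : begin_(new int[capacity]), end_(begin_), cap_(begin_ + capacity) {}
    ~ToyIntVector() { delete[] begin_; }

    // Copying is disabled to keep the sketch short and safe.
    ToyIntVector(const ToyIntVector &) = delete;
    ToyIntVector &operator=(const ToyIntVector &) = delete;

    // No stored size field: the size is derived from two pointers.
    std::size_t size() const { return static_cast<std::size_t>(end_ - begin_); }
    std::size_t capacity() const { return static_cast<std::size_t>(cap_ - begin_); }

    // Appending writes the element and advances end_, which is the
    // bookkeeping a raw array with a manual length would also need.
    void push_back(int x) {
        assert(end_ != cap_);  // the toy version never reallocates
        *end_++ = x;
    }

    int &operator[](std::size_t i) { return begin_[i]; }
};
```

Indexing through operator[] is a load of begin_ plus an offset, which an optimizer can usually hoist out of a loop.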
Re:It all depends by kramulous · 2009-03-30 11:31 · Score: 1

Hey, I've been having a little play and it (as per usual) is the compiler. Results are comparable for GNU compiler, Intel compiler is an entirely different matter (Intel compiler, -O2 by default).
me@cl1n024:~/bin/pbstest/Interactive> g++ --version
g++ (GCC) 4.1.2 20070115 (SUSE Linux)
me@cl1n024:~/bin/pbstest/Interactive> icpc --version
icpc (ICC) 11.0 20081105
RESULTS:
std::vector
g++ -O2 test.cpp -o Go2
me@cl1n024:~/bin/pbstest/Interactive> taskset -c 3 time ./Go2
48.19user 0.48system 0:48.67elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+195578minor)pagefaults 0swaps
icpc test.cpp -o Go2
test.cpp(14): (col. 10) remark: LOOP WAS VECTORIZED.
/usr/include/c++/4.1.2/bits/vector.tcc(330): (col. 5) remark: LOOP WAS VECTORIZED.
/usr/include/c++/4.1.2/bits/vector.tcc(334): (col. 5) remark: LOOP WAS VECTORIZED.
/usr/include/c++/4.1.2/bits/vector.tcc(343): (col. 5) remark: LOOP WAS VECTORIZED.
/usr/include/c++/4.1.2/bits/vector.tcc(365): (col. 5) remark: LOOP WAS VECTORIZED.
me@cl1n024:~/bin/pbstest/Interactive> taskset -c 3 time ./Go2
26.05user 0.44system 0:26.50elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (1major+195591minor)pagefaults 0swaps
Normal Array
g++ -O2 test.cpp -o Go2
me@cl1n024:~/bin/pbstest/Interactive> taskset -c 3 time ./Go2
49.61user 1.02system 0:50.63elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+390890minor)pagefaults 0swaps
icpc test.cpp -o Go2
test.cpp(26): (col. 5) remark: LOOP WAS VECTORIZED.
me@cl1n024:~/bin/pbstest/Interactive> taskset -c 3 time ./Go2
0.25user 0.40system 0:00.65elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (1major+195584minor)pagefaults 0swaps
Sorry for any spamming.

--
.
Re:It all depends by mkcmkc · 2009-03-30 12:35 · Score: 1

GCC's libstdc++6-4.3 std::vector size() is computed by subtracting the end-begin ptrs. There is no size field!
Okay, but this amounts to the same thing, right? (That is, you still need an extra pointer that's effectively encoding the size of the vector, and that field must be updated on every size change.)

Vector should be no slower than array, if it is, you are probably using it wrong. The only exception is if you constantly create arrays on the stack, those would be faster on most platforms than vector because constantly allocating vectors cause heap allocations.
I think this was at least part of what I was doing. If you allocate a vector of small, dynamically determined size in a function, and run that function millions of times, the difference in relative cost between doing it with an alloca'ed C array and a malloc'ed vector could be quite noticeable. The former is essentially free, and I was thinking/hoping that the compiler-plus-STL would optimize this away for the vector case, too, but apparently it's not that smart.
I probably had other things going on as well.
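One common way around the per-call allocation, sketched here under assumptions: the workload (a sum of squares through a temporary buffer) and the function names are made up, and the only point is where the allocation happens.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Version A: constructs and destroys a vector on every call, which means a
// heap allocation and a free each time.
long work_fresh(std::size_t n) {
    std::vector<int> tmp(n);
    for (std::size_t i = 0; i < n; ++i)
        tmp[i] = static_cast<int>(i * i);
    long total = 0;
    for (std::size_t i = 0; i < n; ++i)
        total += tmp[i];
    return total;
}

// Version B: the caller owns the scratch vector. resize() only allocates when
// n exceeds the current capacity, so repeated calls reuse the same storage.
long work_reused(std::size_t n, std::vector<int> &scratch) {
    scratch.resize(n);
    assert(scratch.size() == n);
    for (std::size_t i = 0; i < n; ++i)
        scratch[i] = static_cast<int>(i * i);
    long total = 0;
    for (std::size_t i = 0; i < n; ++i)
        total += scratch[i];
    return total;
}
```

Calling work_reused() in a loop with one vector declared outside the loop pays for the heap only when the required size grows, which gets most of the way to the cost of a stack buffer.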

--
"Not an actor, but he plays one on TV."

Re:every project starts that way by mkcmkc · 2009-03-27 09:43 · Score: 3, Insightful

You're not CPU bound until you: add all the features, handle the special cases, add the error checking, scale up beyond trivial test data, etc.

Then what? Rewrite?

Yes. If you didn't know all of that was going to happen, you're prototyping. If you're prototyping, you should be doing it in a prototyping language.

Rewriting from Python to C++ is not particularly difficult. Completely overhauling the design of a project written entirely in C++ is really unpleasant and takes a long time. So much so that many early design decisions on large C++ projects simply cannot be undone.

Model in clay first, then in stone later if you have to.

--
"Not an actor, but he plays one on TV."

It's part of Iraq now by tepples · 2009-03-27 09:44 · Score: 1

Nineveh then, Baghdad now.

JITting python code can be a memory hog as well by boorack · 2009-03-27 10:10 · Score: 1

Python code has no explicit type declarations. That means that generated (byte)code has to be type agnostic or the VM has to be able to generate concrete specializations on the fly. Type agnostic code kills performance (be it an interpreter or JIT generated code). Generating specializations consumes memory. This is why the cpython interpreter is slow and psyco is a memory hog. Jython/IronPython variations probably have both drawbacks to some extent - faster than cpython but nowhere near fully optimized native code and quite memory hungry (as Java and .NET apps tend to be).

Porting python to LLVM will be a quite ambitious step with lots of work. I suppose they'll end up with a virtual machine having similar performance characteristics to Jython/IronPython without the overhead of Java/.NET/Java_programming_style. It will be suitable for server environments and this is what Google is paying for ;)

language independence by Gary+W.+Longsine · 2009-03-27 10:33 · Score: 1

The fascinating thing about the LLVM architecture is that you can bolt any language on the front end, and still benefit from a mountain of hardware-specific optimizations on the back end, without the need to figure them out and implement them yourself. Erlang, D, and Occam front ends for LLVM are just some code away... just a shout away, just a kiss away... kiss away... kiss away, hey, hey-ya...

--
If you mod me down, I shall become more powerful than you could possibly imagine.

Re:language independence by chthonicdaemon · 2009-03-27 19:23 · Score: 1

How many people on Slashdot listen to old Stones? Nice one.

--
Languages aren't inherently fast -- implementations are efficient

Re:Incorrect this is not by Martin+Soto · 2009-03-27 10:33 · Score: 1

It is not a project by Google's engineers, it's an independent project hosted by Google.

The project is indeed sponsored by Google. See the last question in their FAQ.

Also, 5x speedup is insignificant. Psyco already provides speedups much larger than that, depending on the type of code (algorithmic code could be improved 60x or more).

You're saying it yourself: depending on the type of code. Psyco may achieve impressive speedups for certain algorithms, but the gains are not as high in general. These guys are aiming at speeding all Python code up by a factor of about five, which would be far from insignificant if they succeeded.

By the way, Pypy is much more ambitious than this one.

Pypy is an interesting project. Unfortunately, though, they are progressing very slowly.

And finally, their goals and timeframe seem a little bit unrealistic. I'd love to be proved wrong though...

You may be right here. Only time will tell.

Re:every project starts that way by bnenning · 2009-03-27 10:41 · Score: 1

Since it is not going to be rewritten because of time, budget, it's good enough, [insert-your-own-excuse-here], let's opt to write it correctly and in an appropriate language from the onset.

If they're going to do a half-baked job in Python, then it would be tenth-baked in C. And if Python's performance is universally unacceptable today, I'm curious as to how you think we accomplished anything at all 10 years ago.

--
How to solve most of our problems: 1.Lots of nuclear plants. 2.Cure aging.

Re:My Crow is just fine thank you by Rakshasa+Taisab · 2009-03-27 10:54 · Score: 1

Not even the crow beats station-spinning the Orca.

--
- These characters were randomly selected.

Re:i submitted this story to slashdot before you by maxume · 2009-03-27 10:56 · Score: 2, Funny

Here's your cookie:

/^\
\_/

--
Nerd rage is the funniest rage.

Re:i submitted this story to slashdot before you by karlzt · 2009-03-27 11:11 · Score: 1

:confused:

Re:Incorrect this is not by maxume · 2009-03-27 11:21 · Score: 1

Psyco only runs on x86, this project will (ostensibly) run anywhere LLVM runs.

--
Nerd rage is the funniest rage.

Effort in wrong place by Animats · 2009-03-27 11:24 · Score: 4, Informative

This is disappointing. Shed Skin has shown speed improvements of 2 to 220x over CPython. Going for 5x over CPython is lame. But Shed Skin is a tiny effort, and needs help.

PyPy got a lot of press, but they tried to do an optimizing compiler with "agile programming" and "sprints", and, at six years on with substantial funding, it's still not done.

The fundamental problem with running Python fast is its gratuitous dynamism. In CPython, almost everything is late-bound, and most of the time goes into name lookups. This makes it easy to treat everything as dynamic. You can store into the local variables of a function from outside the function, for example. In order to make Python go fast, the compiler has to be able to detect the 99.99% of the time when that isn't happening and generate pre-bound code accordingly.

Dynamic typing requires similar handling. Most variables never change type. Recognizing int and float variables that will never contain anything else creates a significant speedup. In CPython, all numbers are "boxed", stored in an object structure. This is general but slow.

CPython is nice and simple, but slow. Serious speedup requires global analysis of the program to detect the hard cases and generate fast code for the easy ones. Shed Skin actually does this, but has to place some limitations on the language to do it. If someone did everything right, Python could probably achieve the speed of C++.

There's also the problem that if you want to be compatible with existing C modules for CPython, you're stuck with CPython's overly general internal representation.

Re:Effort in wrong place by maxume · 2009-03-27 11:49 · Score: 1

How much of what you want do you get with Cython/Pyrex?

--
Nerd rage is the funniest rage.
Re:Effort in wrong place by Pinky's+Brain · 2009-03-27 11:59 · Score: 2, Informative

# Maintain source-level compatibility with CPython applications.
# Maintain source-level compatibility with CPython extension modules.
vs.
Shed Skin will only ever support a subset of all Python features.
Re:Effort in wrong place by k8to · 2009-03-28 06:46 · Score: 1

Shedskin is a degenerate specializing compiler.
It assumes that all variables never change type and thus only one specialization need be created.
This is useless for real world python, where it's quite common to acquire references to things and handle them regardless of their type.
Making a real specializing compiler, like psyco, go as fast as a traditional compiler is really hard, so this is faster than psyco, but psyco works.

--
-josh

Re:What about Parrot? by larry+bagina · 2009-03-27 11:47 · Score: 2, Informative

LLVM is stable and in use. The iPhone SDK arm compilers use gcc with an LLVM backend. OS X uses LLVM in the OpenGL stack to support features that the GPU doesn't. They're also using LLVM for OpenCL/Grand Central.

LLVM isn't just another virtual machine, it also optimizes that code (at compile time, link time, and/or runtime) and converts it to native (alpha, arm, cell, ia64, mips, CIL, pic16, ppc, sparc, x86) binaries (or C source code).

--
Do you even lift?

These aren't the 'roids you're looking for.

Hell, what about HLVM by Pinky's+Brain · 2009-03-27 11:47 · Score: 1

LLVM has its own active sub project for higher level languages ... they just released an OCaml compiler in fact (easier than Python of course, since it is statically typed).

pointer to vectorization please? by mkcmkc · 2009-03-27 12:14 · Score: 1

Kramulous: Where's a good place to learn about this stuff?

--
"Not an actor, but he plays one on TV."

Re:pointer to vectorization please? by kramulous · 2009-03-27 13:31 · Score: 1

That's what sucks .. I haven't seen a definitive place yet and so far I have been stumbling around the IEEE website, the Intel site (whitepapers) and just generally playing with the Intel compiler.
Give me a little more time (a day or two) while I chase up the urls of some of the pdfs and codes I have :)
I've been meaning to put this stuff together for a while now with code fragments to back it up.

--
.

Re:They should make it 24 times faster by poopdeville · 2009-03-27 12:22 · Score: 1

Goddammit! The next one of you fullatos that adds a number to another number is gonna hear it from my .45.

--
After all, I am strangely colored.

O's Billion Dollar Backside by Latent+Heat · 2009-03-27 12:57 · Score: 1

Oh boy, this has all kinds of possibilities. Consider the Plastic Surgeon of Venice.

Shylock (not a lender of money, but a plastic surgeon): I will have my 5 pounds of butt flesh!

Stedmann: (As lawyer representing Oprah). Be my guest. It won't even be missed!

unreasonable by r00t · 2009-03-27 16:11 · Score: 1

Now your developers must be good at both Python and C++. Note that I don't mean merely "able", because any decent hacker can pick up a new language in two weeks or less. Shallow ability won't really do the job. Getting good at a language takes years of experience.

At every point in time, the rewrite will seem like a much bigger task than fixing up the Python. You think things like: "just a little bit of optimization and this is going to be acceptable". It's not easy to commit to the rewrite, even if you know you need it.

BTW, completely overhauling the design of a project written entirely in C isn't so bad. You don't get the ravioli classes problem that most C++ code suffers from.

Re:unreasonable by mkcmkc · 2009-03-27 17:22 · Score: 1

Now your developers must be good at both Python and C++...
I don't think I've ever met anyone who was good at C++. Maybe having as little of it as possible to worry about is the best that can be hoped for.

At every point in time, the rewrite will seem like a much bigger task than fixing up the Python. You think things like: "just a little bit of optimization and this is going to be acceptable". It's not easy to commit to the rewrite, even if you know you need it.
Yes. This is a good thing, per Knuth.

--
"Not an actor, but he plays one on TV."

Everything I learned, I learned from... by WWWWolf · 2009-03-27 20:21 · Score: 1

Project Aims For 5x Increase In Python Performance

Everyone should learn the Truths of Software Engineering from some source. For some people, it's The Daily WTF. For me, it was the Ultima series developer quotes. This time, the appropriate quote comes from Ultima VIII:

"It's hypothetical, or I'm going to poke you in the eye." - Rob to Tony when Tony describes possible 20 percent increase in game speed.

Indeed Apple Basic running on 1Mhz machine is fast by spaceturtle · 2009-03-28 03:34 · Score: 1

Indeed. Most of my Apple Basic software running on a 1 MHz 6502 was damn fast. In some important ways scripted code is often much faster. If you want to change a line of C++ code: "I'll have to change a line of code. Oh no, I'll have to wait several minutes to relink correctly, then I'll have to restart the App and get it back to this point". In scripted code: "Changes line of code. Hits retry."

PyV8 offers a 10x performance increase - right now by lkcl · 2009-03-28 04:06 · Score: 2, Interesting

The experimental combination of the Python-to-Javascript compiler, http://pyjs.org/ and the Python Bindings to Google's V8 Engine, http://code.google.com/p/pyv8 brings a ten times performance increase over standard python, already.

not - "10% now and 5x in the future" - that's a 1000% increase NOW.

When V8 supports the ECMAScript "Harmony" standard, which will include support for basic integer types, then there will be "correct" support in the PyJS + PyV8 combination for numerical types, and the word "experimental" can be dropped.

http://pyjs.org/ also includes an experiment showing the bindings of the PyJS compiler with the Python-Spidermonkey project. The spidermonkey JS engine has the advantage of running on generic platforms instead of just ARM and 32-bit x86 platforms, but has the disadvantage of being slightly slower.

Javascript is a _really_ interesting language, and in many ways highly suitable as an intermediate language for compiling dynamic languages such as Ruby and Python.

Re:every project starts that way by Dan+Ost · 2009-03-28 04:14 · Score: 1

Then what? Rewrite?

Then you profile your code to see where the bottlenecks are and implement the bottlenecks in C, leaving 99% of your code unchanged, but getting 99% of the speed you'd get if you rewrote the whole thing in C.

Sounds like win-win to me.

If you really need that extra 1%, use some of the money you saved on development time and purchase a faster machine.
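
The "profile first" step described above can be sketched with the standard library's cProfile and pstats modules. This is only an illustration: `slow_sum_of_squares`, `cheap_setup`, `main` and `profile_report` are made-up names standing in for a real program, and the hot loop is deliberately naive.

```python
import cProfile
import io
import pstats


def slow_sum_of_squares(n):
    """Deliberately naive hot loop; stands in for a real bottleneck."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def cheap_setup():
    """Work that is too small to be worth rewriting in C."""
    return [x for x in range(100)]


def main():
    cheap_setup()
    return slow_sum_of_squares(200_000)


def profile_report(func):
    """Run func under cProfile; return (result, report sorted by cumulative time)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Show only the top entries; the bottleneck should be near the top.
    stats.sort_stats("cumulative").print_stats(5)
    return result, buffer.getvalue()


if __name__ == "__main__":
    _, report = profile_report(main)
    print(report)
```

Whatever function dominates the cumulative-time column is the candidate for a C extension; everything else stays in Python.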

--

*sigh* back to work...

LLVM for JIT by chris-chittleborough · 2009-03-28 05:49 · Score: 1

Great minds think alike! The Parrot team is already working on using LLVM for JIT code generation.

LLVM vs Parrot by chris-chittleborough · 2009-03-28 06:10 · Score: 1

For once, I disagree with Ars Technica. In Python, integers automagically overflow into "long integers" (i.e., BigNums). Therefore you can only compile integer operations into low-level opcodes (x86, LLVM, etc) if you somehow know beforehand that no BigNums are involved and overflow is impossible. In general, you have to compile Python into calls on a python-specific run-time library instead of opcodes. (You can still produce code that runs much faster than CPython's stack-based bytecodes by using a register-based VM and by pushing type-based dispatch as early as possible.)
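
A rough sketch of why the overflow check matters, written in Python itself. The helper names (`fits_in_int64`, `add_with_overflow_check`) and the 64-bit word size are assumptions for illustration only; this is not how CPython or any particular JIT does it. It just models "take the machine-word fast path when the result provably fits, otherwise fall back to the BigNum runtime".

```python
INT64_MIN = -(2 ** 63)
INT64_MAX = 2 ** 63 - 1


def fits_in_int64(value):
    """True if value could live in a signed 64-bit machine register."""
    return INT64_MIN <= value <= INT64_MAX


def add_with_overflow_check(a, b):
    """Add two integers and report which path a compiler would have to take.

    Returns (result, path), where path is "fast" if both operands and the
    result fit in a machine word, and "bignum" if the runtime library
    would have to take over.
    """
    if fits_in_int64(a) and fits_in_int64(b):
        result = a + b
        if fits_in_int64(result):
            return result, "fast"
    # Python's own arbitrary-precision ints play the role of the runtime library.
    return a + b, "bignum"


if __name__ == "__main__":
    print(add_with_overflow_check(1, 2))
    print(add_with_overflow_check(INT64_MAX, 1))
```

The point is that the check has to happen somewhere: either the compiler proves it away beforehand, or the generated code carries the test and the fallback call.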

IMO, trying to generate language-neutral machine/LLVM code is a bad idea. The Parrot team seem to agree: Parrot byte-code will strongly reflect the source language; their aim is not language-neutrality but inter-language operability.

Re:"write once" is misnomer by Abcd1234 · 2009-03-30 03:26 · Score: 1

I haven't looked at LLVM, but I think you need to learn about static single assignment (SSA).

I'm fully aware what SSA is. They model SSA using a set of write-only registers, but that doesn't change the fact that the dataflow is modeled using a machine language that is RISC-like in architecture.

There doesn't have to be any conflict between the two: lots of SSA intermediate representations look like RISC.

I completely agree. But I never claimed there was such a conflict.

then you are also confused (and egregiously confusing your readers) when you describe its registers as being "write-only"

No, I'm specifically correct about them being write-only, and they're write-only specifically because they're using SSA to facilitate certain optimizations, as it makes dataflow analysis a *lot* easier.
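
For anyone following along, here is a toy sketch of SSA renaming for straight-line code. It is not LLVM's IR: the tuple format, the `to_ssa` name and the `name.N` numbering are made up for illustration, and there are no branches, so no phi nodes are needed. Each target name is assigned exactly once and can be read any number of times afterwards.

```python
def to_ssa(statements):
    """Rename straight-line three-address code into SSA form.

    Each statement is (target, op, left, right). Operands are either
    variable names (str) or integer constants.
    """
    version = {}  # latest version number for each source variable
    out = []

    def use(operand):
        if isinstance(operand, str):
            # A variable read before any assignment is an input: version 0.
            version.setdefault(operand, 0)
            return f"{operand}.{version[operand]}"
        return operand  # constants pass through unchanged

    for target, op, left, right in statements:
        # Rename the operands first, so "x = x * 2" reads the old x.
        new_left, new_right = use(left), use(right)
        version[target] = version.get(target, 0) + 1
        out.append((f"{target}.{version[target]}", op, new_left, new_right))
    return out


if __name__ == "__main__":
    program = [
        ("x", "+", "a", "b"),  # x = a + b
        ("x", "*", "x", 2),    # x = x * 2
        ("y", "+", "x", "a"),  # y = x + a
    ]
    for stmt in to_ssa(program):
        print(stmt)
```

Because no name is ever reassigned, every use points at exactly one definition, which is what makes the dataflow analysis mentioned above so much easier.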
