Van Rossum: Python Not Too Slow
snydeq writes "Python creator Guido van Rossum discusses the prospects and criticisms of Python, noting that critics of Python performance should supplement with C/C++ rather than re-engineering Python apps into a faster language. 'At some point, you end up with one little piece of your system, as a whole, where you end up spending all your time. If you write that just as a sort of simple-minded Python loop, at some point you will see that that is the bottleneck in your system. It is usually much more effective to take that one piece and replace that one function or module with a little bit of code you wrote in C or C++ rather than rewriting your entire system in a faster language, because for most of what you're doing, the speed of the language is irrelevant.'"
Title is kinda silly.. as the basic referenced statement is that in some cases python _is_ too slow but that one can work around that using hacks (or a language agnostic component oriented architecture).
As for:
You said that if you trust your compiler to find all the bugs in your program, you've not been doing software development for very long.
It’s not about finding all the bugs, or even many of them. It’s about another layer where a potential bug can be caught. Runtime bugs are the worst kind as they can sit dormant for a while if in a rarely traveled branch. The more checking that can be done at the compile level, the better (imo).
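To make the "rarely traveled branch" point concrete, here is a minimal sketch. The function and argument names are made up for illustration. The type error only fires when the verbose path actually runs, so the common path can pass testing for a long time.

```python
def describe(count, verbose=False):
    """Return a short description; the verbose branch hides a type error."""
    if verbose:
        # Bug: str + int raises TypeError, but only when this branch executes.
        return "count is " + count
    return "count ok"


def dormant_bug_surfaces():
    """Exercise both paths and report whether the rare one blows up."""
    describe(3)  # the common path works fine
    try:
        describe(3, verbose=True)  # the rare path fails at run time
    except TypeError:
        return True
    return False
```

A compiler for a statically typed language would typically reject the string-plus-integer expression before the program ever ran.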
Personally my biggest complaint about python wasn’t on the list: A lot of the (common) libraries out there are poorly documented, inconsistent, buggy, or incomplete.
As a Gentoo user, the python 2/3 thing is also especially annoying. Obviously this isn’t really python’s fault.. but it still gives me a bad taste about python.
That said, this was a great article.. short, to the point, and the answers were pretty good!
Now that that is settled we can get back to the real problem with python: Type errors.
"It is usually much more effective to take that one piece and replace that one function or module with a little bit of code you wrote in C or C++ rather than rewriting your entire system in a faster language"
Ahh -- yes, I see, so I should write my Apps in Python, except where they need to be rewritten in C/C++ because that will run faster than when written in Python, but Python is not slow when you rewrite portions -- so don't rewrite in a faster language because Python is fast enough.
Alrighty then.
I'm waiting for the article:
Van Rossum: Python Not Much Worse Than Ruby
"Python creator Guido van Rossum discusses the prospects and criticisms of Python, noting that critics of Python should supplement with Ruby rather than re-engineering Python apps into a better language."
I'm signed up for the CS101 course @ Udacity, and I was surprised they were using Python for the course. It does seem a bit weird using whitespace for blocks, especially when you're used to writing stuff like
if (a > 0) { return a + 1; } else { return a - 1; }
for the simple stuff. I do really like things like being able to return multiple values from a procedure, etc., but Python seems more useful for rapid prototyping rather than anything else.
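For comparison, a rough sketch of the same branch in Python, plus the multiple-return-value feature mentioned above. The names bump and min_max are just illustrative.

```python
def bump(a):
    # Indentation, not braces, delimits the two branches.
    if a > 0:
        return a + 1
    else:
        return a - 1


def min_max(values):
    # "Multiple return values" are really a single tuple that the caller unpacks.
    return min(values), max(values)


lo, hi = min_max([3, 1, 4])
```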
I have a lot of experience in code optimization, and I would dispute this generalization. "often" is a lot more realistic than "usually". The most common thing I see is where one particular segment of an operation is coded by someone that doesn't understand their O's and is doing something like multilevel lookup loops instead of a hash table. Fundamental mistakes in algorithm choice are the biggest "HERE is the biggest problem" issues I find.
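A small sketch of the lookup-loop mistake versus a hash table, in Python since that is the language under discussion. The data shapes (lists of id pairs) are assumptions for illustration, and customer ids are assumed unique.

```python
def match_slow(orders, customers):
    """Nested lookup loops: O(len(orders) * len(customers))."""
    result = []
    for order_id, customer_id in orders:
        for cid, name in customers:
            if cid == customer_id:
                result.append((order_id, name))
                break
    return result


def match_fast(orders, customers):
    """Build a hash table once, then do O(1) average-case lookups."""
    by_id = dict(customers)  # assumes customer ids are unique
    return [(oid, by_id[cid]) for oid, cid in orders if cid in by_id]
```

Both return the same pairs. Only the second keeps scaling when the inputs get large, and that holds regardless of which language it is written in.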
Once you're past the stupid implementation mistakes, it goes just slightly in favor of "it's a little bit of everything" land. Something running significantly slower in one language than another often boils down to the coder not understanding how to make things scale in the chosen language. I can make C move slower than BASIC if I want to. Sometimes it's just knowing how the compiler is going to react to your structures. Little things like "roll up the loops when coding in VB" can produce an order or two of magnitude in speed improvement, and if you don't realize this you may think you're comparing identical implementations when you're not. "this language sucks!" often translates into "I don't know how to do it so it runs fast!"
My last project was reduced from 23 hrs per run to 21 minutes by a small but complex change in implementation. From there, getting it down to 4 minutes required a LOT of little changes all over the place, to nickel-and-dime it down. I'll trade you my "guy that knows how to recode it in C" for your "guy that knows how to code, and REALLY knows his compiler" any day.
As someone simulating fluid-structure interaction with a number of constituent models and a lot of finite element (i.e. big matrix problems; using FEniCS - fenicsproject.org), using Python makes my overall quite-long algorithm much easier to flick through. Invaluable for debugging the theory as well as the implementation. FEniCS' Python interface ties into the standard C/C++ libraries using SWIG and, in simple cases, saves me working in C++. Very clear, well-written C++ is great for this application but I find it takes considerably longer to write than clear Python.
When I hit a more intricate problem, I realized I was going to have to solve a series of FE matrices by hand (with PETSc, written in C). It turned out to be pretty straightforward to pick up SWIG, write a short module in C and a Python interface. Done! Particularly useful as I believe getting FEniCS and petsc4py to play well is tricky.
So, I'd agree - having written a C++ version of my (simpler) problem and a Python/C version of the complicated one, the latter was definitely easier, and all the rate-limiting stuff is in C anyhow.
Doubt it would be true for every situation but +1 from an FE perspective.
To bend a cliche: Purists! Can't live with 'em, can't live without 'em!
Seriously, we live in an era when programmers are no longer bound to the use of a single language for an entire project, as was the pragmatic case once upon a distant time. Why not just use the language for each module which best suits the need? If performance outweighs simplicity of code management, then use the better performing language for that module. No language is perfectly suited for all goals, so own your chosen criteria and don't 'blame' a language creator for having different criteria.
Arguing about whose interpreted scripting language is slower is like arguing about whose rich delicious cheesecake is less fattening. When you eat the cheesecake, you accept the tradeoff of tastiness for five minutes off your total lifespan.
The problem with Python isn't the speed -- he's right about optimizing with bits of C. The problem is the GIL. Without good multithreading support, I have to give up on Python for a large number of application domains.
Don't most users of these scripting languages (the good ones anyway) profile and write the speed-critical sections in C or C++ anyway? That's not Python specific. It's not even specific to scripting languages. It's the same thing that C programmers do when they use inline assembly. It's like this all the way down the line. You start with rapid development at a higher level, then profile and optimize what you need.
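The "profile first" step might look something like this with the standard library's cProfile and pstats modules. The workload functions here are invented stand-ins for a real program.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    # Deliberately naive hot loop: the kind of code a profile singles out.
    total = 0
    for i in range(n):
        total += i * i
    return total


def cheap_setup():
    return list(range(10))


def run():
    cheap_setup()
    return slow_sum(200_000)


def profile_report(func):
    """Run func under cProfile and return the top entries as text."""
    profiler = cProfile.Profile()
    profiler.runcall(func)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(5)
    return buffer.getvalue()
```

Printing profile_report(run) should show slow_sum near the top by cumulative time, which marks it as the candidate for a rewrite in C while the rest stays in Python.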
Says the guy whose whole life is tied up in the language, and whose project, at Google, to speed it up, crashed and burned.
Python is slow because van Rossum refuses to cut loose the boat-anchor of "anything can change anything at any time". The straightforward implementation of Python, CPython, boxes all numbers (everything is a PyObject, including an int or a float) and looks up functions, attributes, and such in a dictionary for every reference. And only one thread is allowed to run at a time. This allows one thread to dynamically patch the objects and code of another thread. Which is cool, but useless. 99.99+% of the time, there's no need for a dynamic lookup. Most program dynamism is shortly after program startup - once things are running, they don't change much. If, sometime shortly after startup, the program said "OK, done with self-modification", at which point a JIT compiler did its thing, the language would be much faster. But no. That's "un-Pythonic".
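A tiny illustration of "anything can change anything at any time". The Greeter class is made up for the example. Because a method can be rebound after instances already exist, the interpreter cannot assume a call site always reaches the same function.

```python
class Greeter:
    def greet(self):
        return "hello"


g = Greeter()
before = g.greet()

# Any code holding a reference to the class can rebind its methods at run time,
# and existing instances pick up the change on their next call.
Greeter.greet = lambda self: "patched"
after = g.greet()
```

Here before is "hello" and after is "patched", even though g itself was never touched.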
PyPy, the newer Python implementation, uses two interpreters and a JIT compiler to try to handle the dynamism with less overhead. They're making progress, but they need a very complex implementation to do it, and they're many years behind schedule.
Python, as a language, is very usable. But it's too slow for volume production. That's not inherent in the basic language. Python could remain declaration-free if there were just a few more restrictions on unexpected dynamism. By this is meant ways the program can change itself that aren't obvious from looking at the source code. For example, if a module or class could only be modified from outside itself if it contained explicit self-modification code (like a relevant "setattr" call) most modules and classes could be nailed down as fixed, "slotted" objects at compile time. The other big win is using enough type inference to decide if a variable can always be represented as a machine type (int, float, char, bool, etc.). That's a huge performance win.
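For reference, Python already has an opt-in form of "slotted" objects through __slots__. This sketch only shows that existing feature, not the compile-time inference proposed above. The class names are hypothetical.

```python
class Point:
    # Declaring __slots__ fixes the attribute set; instances get no __dict__.
    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y


class Loose:
    """An ordinary class: attributes live in a per-instance dictionary."""


def can_add_attribute(obj):
    """Return True if an arbitrary new attribute can be attached to obj."""
    try:
        obj.z = 1
    except AttributeError:
        return False
    return True
```

A Point refuses the new attribute while a Loose instance accepts it. The difference from the proposal is that the programmer has to ask for this behavior class by class.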
Claiming that the "slow parts" should be rewritten in C is a cop-out. It makes the program more fragile, since C code can break Python's memory safety. Except for number-crunching, or glue code for existing libraries, it's seldom done.
(I have a Python program running right now which will run for over a week, parsing the street address of every business in the US into a standard format. The parser is complex enough that rewriting it in C would be a big job. There's no "inner loop".)
I was a little bit disappointed by Guido's response regarding static vs. dynamic typing:
InfoWorld: You talked about the arguments for and against dynamic typing. You said that if you trust your compiler to find all the bugs in your program, you've not been doing software development for very long. So you're satisfied with Python being dynamic?
Van Rossum: Absolutely. The basic philosophy of the language is not going to change. I don't see Python suddenly growing a static subdivision or small features in that direction.
Proponents of static typing do not claim that compilers, combined with languages that use static typing, will find all the bugs in your program. This is nothing more than InfoWorld erecting a straw man and Guido knocking it down.
However, static typing does make a huge number of potential errors stick out like a sore thumb (the compiler will refuse to compile the code, and will emit appropriate error messages).
Some people (rightfully) argue that dynamic typing makes for shorter, prettier, easier code.
Some of us believe the primary concern should be correctness, and that shorter, prettier, easier code are secondary concerns -- almost always. People should think about this every time their computer crashes, or an application crashes, or something is acting up and needs to be rebooted, or they get a virus through no fault of their own, or their data gets corrupted.
Will users be thinking, "Gosh, this sucks, but I'm sure glad the programmer used a dynamic language, because it made it easier on him (the programmer)."? No, they'll be thinking, "Damn buggy programs! I just lost X (hours,minutes,seconds) of work, and now I'm frustrated!" Programming languages are a means to an end, not an end in itself. Don't be a self centered developer: the fruits of your labor are for users, not so you can write the code equivalent of poetry.
Not to mention, statically typed languages allow for easy refactoring possibilities that make it possible to fix all sorts of serious issues, including architectural ones, with reasonable effort expended. Dynamic languages, while they have made some progress in the area of refactoring, are really in the dark ages here.
I know dynamically typed programming languages are the hotness right now, and I'm sure my opinion will be hammered relentlessly, but I do ask that if you disagree, don't mod me down, but instead, bring forth a reasonable argument for a different position. This should not be a popularity contest, where the loser is not heard, no matter what side the loser is on.
Donald Knuth made this point in 1971, in "An Empirical Study of FORTRAN Programs": most of the code in a typical program has virtually no significant effect on performance.
Strictly speaking, the language itself shouldn't have any effect on how fast it executes, it's the implementation that really matters.
This is nonsense.
Language syntax has a huge impact on how hard or easy it is for a compiler (ahead-of-time, just-in-time, or hybrid) to produce fast native code.
If that effort is too large, in terms of development effort and/or compiler analysis effort, you will simply never see a compiler written for those kinds of languages that produces fast executables. This is the reality.
So, in pragmatic terms...yes, some languages are slow.
As far as high performance languages go, we have:
FORTRAN: King of the performance hill, but so annoying to use that nobody really does outside some scientific circles.
C++: This is what everybody uses to write high performance applications, but it's a mess of special cases and annoying syntax and megabyte-long error messages from deeply nested templates.
We need a modern language, with things like functions as first class objects and introspection, but with the performance and "to-the-metal" nature of C++ when you care about designing for optimal cache efficiency and so on.
This is entirely true. What C++ does is excellent. The standard libraries are great - self-resizing arrays and sane strings at bare-metal speeds (if used with just a bit of skill). All the common algorithms. But the C baggage is really a problem.
There's a lot of syntax and just "tricks" that are needlessly complex because of the history. The learning curve is not just steep, but pointlessly steep. The level of control you get does not require the level of complexity thrown at you.
And the worst is - people still write C-style code in C++! Because C-style coding is obvious in the language, and RAII is not, you still see people thinking exceptions are bad, and programming like it's 1989. Because template syntax is the worst macro language ever, you don't dare use templates outside of seldom-changing library code.
All of the downsides of C++ are fixable with a from-scratch language with the exact same feature set, but no legacy syntax.
Integrated multi-language solutions are teh suck.
I know that Python is much better than a lot of other languages for integrating C/C++ code. But in the end, if you're doing production systems, you'll end up getting bitten by some unforeseen incompatibility caused by some upgrade somewhere.
It will happen.
In the course of every project, it will become necessary to shoot the scientists and begin production.
Civilization 4. Firaxis wanted to make it nice n' moddable. So they scripted a bunch of it out in Python. Makes it real easy for users to edit in the field. However the AI they couldn't do that with. They wanted the users to be able to edit that too, but it was too slow in Python. So they did that in C++ and made it a DLL, and then distributed the sources. Harder to work on than a Python script, but fast enough that the game didn't bog down. The core of the game engine was C++ too and the Python was integrated with Boost.Python.
Works really, really well. Again, the right tool for the job. If they'd tried to do the whole game in Python, it would have been a disaster performance wise (and I'm not even sure if it would be possible, if Python can call on DirectX in the way C++ can). However a mixture worked brilliantly.
It's nice when the proponents of a piece of technology can be honest about its shortcomings and acknowledge them, rather than trying to sell their technology. It's the honest response vs. the fanboy response.
Problem: "It's too slow."
Honest response: "Yeah, sometimes it is slow. Here are the design compromises that make this slowness necessary. Furthermore nobody has volunteered the manpower to make it faster. If you need something faster, try x."
Fanboy response: "Yeah well, anytime I try to do something, it's fast enough."
Problem: "There's practically no static typechecking."
Honest response: "You're right, there isn't. There are ways you can deal with that, such as writing tests and thinking carefully about your code. The lack of static typechecking allows us to build a language that is better in this way and in that way. If you want static typechecking, try language x. Certainly there are advantages and disadvantages to the way we did things, but it best suited our objectives and if your objectives are aligned with ours, consider using our system."
Fanboy response: "Yeah well, if you're relying on a compiler to catch all your bugs, you're in trouble."
I know there are a lot of C++ haters here but having read some of what Bjarne Stroustrup has written, he typically offers honest responses, not fanboy responses. I was impressed with an article on the Arch Linux wiki once for the same reasons, as it gave an honest response to "it takes too long to install" (which was something like "yeah, it takes awhile, which has advantages and disadvantages. Try Ubuntu for quick installs.") rather than a fanboy response ("quick installs, who wants that crap anyway, it will just give you a junk system.")
To be fair to GVR this was a short informal article; it may not be fair to expect him to expound in this kind of a format.
you still see people thinking exceptions are bad
Well, there are two things at work here:
Now, in practice exceptions are better than the known alternatives, for the simple reason that large amounts of code are simply hacked together without formal development methods. Still, I have seen projects that simply do not use exceptions, because they had too many problems with the weirdness that exceptions cause (although I suspect that if C++ and Java had Lisp-like exceptions, where the stack is left intact for the exception handler, things would be different).
I use C++ for the STL (despite the awful syntax) and for better encapsulation. Boost scoped_ptr is nice too. Virtual functions are a nice alternative to switch() based dispatchers, but they don't need to be (and shouldn't be) used all the time. C was and is a great language. C++ is a horrible mess with a lot of nice features buried within.
Exceptions are just plain awful. Either you have to handle them one level up from the throwing function (in which case you could have just returned an error code), or you have to pass them to higher and higher levels, each of which is less likely than the last to be able to handle it sensibly. Or you have to crash the process, which you might as well have done in the first place.
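The trade-off being argued over can be sketched side by side. This is in Python rather than C++ to stay with the thread's main language. All function names and the 8080 default are hypothetical.

```python
def parse_port_code(text):
    """Error-code style: return (ok, value); every caller must check ok."""
    if not text.isdigit():
        return False, 0
    return True, int(text)


def load_config_code(text):
    ok, port = parse_port_code(text)
    if not ok:  # the intermediate level has to forward the failure by hand
        return False, None
    return True, {"port": port}


def parse_port_exc(text):
    """Exception style: failure propagates until some level catches it."""
    if not text.isdigit():
        raise ValueError(f"not a port number: {text!r}")
    return int(text)


def load_config_exc(text):
    # No error plumbing here; a ValueError passes straight through.
    return {"port": parse_port_exc(text)}


def main(text):
    try:
        return load_config_exc(text)
    except ValueError:
        return {"port": 8080}  # hypothetical fallback chosen at the top level
```

Whether the top level is really in a position to pick a sensible fallback is exactly the objection raised above. The sketch only shows where the checking code ends up in each style.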
Sure but if you save yourself 6 months of coding over the course of a two year project, you've got lots of spare time to figure out those internals. And by the time your next huge project rolls around, you'll just be that much further ahead of the game.
Monoculture is rarely the best option in any large system, including computer systems. It might make things "better" in the short term (by some measure) but if you've used the wrong tool for the job, the cracks will show eventually no matter how theoretically "pure" you can claim your work to be.
It takes more skill and system-programming knowledge to deal with the tricky interfaces between the internals of a Python interpreter and an external C++ program.
Is this experience talking, or guesswork?
Admittedly, I haven't had a need for it myself, but it looks easy enough. And you have plenty of options, too!
1. Extending Python with C or C++
2. ctypes
3. Cython
Examples for 1 and 2
Example for 3
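As a rough idea of how little ceremony option 2 needs, here is a ctypes sketch that calls strlen from the system C library. It assumes a POSIX-like platform where the C library can be located, and the wrapper name c_strlen is made up.

```python
import ctypes
import ctypes.util

# Locate the C standard library. find_library may return None on some systems,
# in which case CDLL(None) exposes the symbols already loaded into the process
# (POSIX only; this fallback is an assumption about the platform).
_libc_path = ctypes.util.find_library("c")
libc = ctypes.CDLL(_libc_path) if _libc_path else ctypes.CDLL(None)

# Declare the C signature: size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def c_strlen(data: bytes) -> int:
    """Return the length of a NUL-terminated byte string using C's strlen."""
    return libc.strlen(data)
```

The same pattern of loading a shared library and declaring argtypes and restype should carry over to a hot function you compiled yourself.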