Van Rossum: Python Not Too Slow
snydeq writes "Python creator Guido van Rossum discusses the prospects and criticisms of Python, noting that critics of Python performance should supplement with C/C++ rather than re-engineering Python apps into a faster language. 'At some point, you end up with one little piece of your system, as a whole, where you end up spending all your time. If you write that just as a sort of simple-minded Python loop, at some point you will see that that is the bottleneck in your system. It is usually much more effective to take that one piece and replace that one function or module with a little bit of code you wrote in C or C++ rather than rewriting your entire system in a faster language, because for most of what you're doing, the speed of the language is irrelevant.'"
Title is kinda silly.. as the basic referenced statement is that in some cases python _is_ too slow but that one can work around that using hacks (or a language agnostic component oriented architecture).
As for:
You said that if you trust your compiler to find all the bugs in your program, you've not been doing software development for very long.
It’s not about finding all the bugs, or even many of them. It’s about another layer where a potential bug can be caught. Runtime bugs are the worst kind as they can sit dormant for a while if in a rarely traveled branch. The more checking that can be done at the compile level, the better (imo).
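To make the "dormant runtime bug" point concrete, here is a minimal sketch. The function name `describe` and its `verbose` flag are made up for illustration. The type error sits in a rarely taken branch, so the program runs fine until that branch is finally exercised.

```python
def describe(count, verbose=False):
    if verbose:
        # Bug: str + int raises TypeError, but only when this branch runs.
        return "count is " + count
    return "count is %d" % count


# The common path works, so the bug can go unnoticed for a long time.
common_path = describe(3)

# The rare path fails at run time, not at "compile" time.
try:
    describe(3, verbose=True)
    rare_path_failed = False
except TypeError:
    rare_path_failed = True

print(common_path, rare_path_failed)
```

If the parameter were annotated as `count: int`, a static checker such as mypy would be expected to flag the concatenation before the code ever runs, which is the extra layer being argued for here.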
Personally my biggest complaint about python wasn’t on the list: A lot of the (common) libraries out there are poorly documented, inconsistent, buggy, or incomplete.
As a Gentoo user, the python 2/3 thing is also especially annoying. Obviously this isn’t really python’s fault.. but it still gives me a bad taste about python.
That said, this was a great article.. short, to the point, and the answers were pretty good!
Like what, calculator? Or Spreadsheet? Or gmail.com? Now I see why Google does not have any C/C++ developers
How much of Python is just straight-up "slow", and how much comes from the fact that Python devs prioritize readable code over efficient code? In C, you usually try to squeeze as much memory out of each line as possible, but Python is designed to be human-readable first and machine-executable second.
Now that that is settled we can get back to the real problem with python: Type errors.
Two languages are better than one!
...Right?
"It is usually much more effective to take that one piece and replace that one function or module with a little bit of code you wrote in C or C++ rather than rewriting your entire system in a faster language"
Ahh -- yes, I see, so I should write my Apps in Python, except where they need to be rewritten in C/C++ because that will run faster than when written in Python, but Python is not slow when you rewrite portions -- so don't rewrite in a faster language because Python is fast enough.
Alrighty then.
I'm waiting for the article:
Van Rossum: Python Not Much Worse Than Ruby
"Python creator Guido van Rossum discusses the prospects and criticisms of Python, noting that critics of Python should supplement with Ruby rather than re-engineering Python apps into a better language."
I'm signed up for the CS101 course @ Udacity, and I was surprised they were using Python for the course. It does seem a bit weird using whitespace for blocks, especially when you're used to writing stuff like
if(a > 0) { return a + 1; } else { return a -1; }
for the simple stuff. I do really like things like being able to return multiple values from a procedure, etc., but Python seems more useful for rapid prototyping rather than anything else.
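For comparison, a rough Python equivalent of the one-liner above, plus the multiple-return feature mentioned. The function names `bump` and `min_max` are invented for this sketch.

```python
def bump(a):
    # Indentation, not braces, delimits the two branches.
    if a > 0:
        return a + 1
    else:
        return a - 1


def min_max(values):
    # "Multiple return values" are really one tuple that the caller unpacks.
    return min(values), max(values)


lowest, highest = min_max([3, 1, 4, 1, 5])
print(bump(2), bump(-2), lowest, highest)
```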
"oh, our language engine sucks? uhh, you fix it!" yeah right.
insensitive clod overlords obligatory xkcd car analogy russian reversals whoosh pedant fanbois ftfy in 3...2...1..PROFIT
Python is slow, but that's not its biggest problem. Its biggest problem is stability. My biggest complaint with Python apps is that they tend to crash a lot. More so than any other apps on my systems. In fact, most of the apps I've seen crash in the past couple of years have been Python utilities. Every once in a while I'll consider learning Python, but then I ask myself if I really want to develop slow and crashtastic programs and the answer is no.
I have a lot of experience in code optimization, and I would dispute this generalization. "often" is a lot more realistic than "usually". The most common thing I see is where one particular segment of an operation is coded by someone that doesn't understand their O's and is doing something like multilevel lookup loops instead of a hash table. Fundamental mistakes in algorithm choice are the biggest "HERE is the biggest problem" issues I find.
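A small sketch of the "multilevel lookup loops instead of a hash table" mistake, using made-up order and customer data. Both functions return the same result, assuming customer ids are unique, but the first rescans the customer list for every order while the second builds a dictionary once.

```python
def join_nested(orders, customers):
    # O(len(orders) * len(customers)): rescans the customer list per order.
    result = []
    for order_id, customer_id in orders:
        for cid, name in customers:
            if cid == customer_id:
                result.append((order_id, name))
                break
    return result


def join_hashed(orders, customers):
    # O(len(orders) + len(customers)): build the index once, then look up.
    names = dict(customers)
    return [(oid, names[cid]) for oid, cid in orders if cid in names]


# Hypothetical sample data: (order_id, customer_id) and (customer_id, name).
orders = [(100, 1), (101, 3), (102, 2), (103, 9)]
customers = [(1, "ann"), (2, "bob"), (3, "cy")]

print(join_nested(orders, customers))
print(join_hashed(orders, customers))
```

The language is the same in both versions; only the algorithm changes, which is the kind of fix that tends to dwarf any gain from switching languages.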
Once you're past the stupid implementation mistakes, it goes just slightly in favor of "it's a little bit of everything" land. Something running significantly slower in one language than another often boils down to the coder not understanding how to make things scale in the chosen language. I can make C move slower than BASIC if I want to. Sometimes it's just knowing how the compiler is going to react to your structures. Little things like "roll up the loops when coding in VB" can produce an order or two of magnitude in speed improvement, and if you don't realize this you may think you're comparing identical implementations when you're not. "this language sucks!" often translates into "I don't know how to do it so it runs fast!"
My last project was reduced from 23 hrs per run to 21 minutes by a small but complex change in implementation. From there, getting it down to 4 minutes required a LOT of little changes all over the place, to nickel-and-dime it down. I'll trade you my "guy that knows how to recode it in C" for your "guy that knows how to code, and REALLY knows his compiler" any day.
I work for the Department of Redundancy Department.
Actually, the only problem with Python is the one raised by the last question and Guido's honest answer: Python doesn't scale. That's where it is slow: you have these multicore machines and you can't fully use them. Having to go to C/C++ is a shame. Even Cython, which can speed up Python code a lot, is limited by the global lock of death. So yeah, people are migrating to Python-like languages without this drawback.
As someone simulating fluid-structure interaction with a number of constituent models and a lot of finite element (i.e. big matrix problems; using FEniCS - fenicsproject.org), using Python makes my overall quite-long algorithm much easier to flick through. Invaluable for debugging the theory as well as the implementation. FEniCS' Python interface ties into the standard C/C++ libraries using SWIG and, in simple cases, saves me working in C++. Very clear, well-written C++ is great for this application but I find it takes considerably longer to write than clear Python.
When I hit a more intricate problem, I realized I was going to have to solve a series of FE matrices by hand (with PETSc, written in C). It turned out to be pretty straightforward to pick up SWIG, write a short module in C and a Python interface. Done! Particularly useful as I believe getting FEniCS and petsc4py to play well is tricky.
So, I'd agree - having written a C++ version of my (simpler) problem and a Python/C version of the complicated one, the latter was definitely easier, and all the rate-limiting stuff is in C anyhow.
Doubt it would be true for every situation but +1 from an FE perspective.
To bend a cliche: Purists! Can't live with 'em, can't live without 'em!
Seriously, we live in an era when programmers are no longer bound to the use of a single language for an entire project, as was the pragmatic case once upon a distant time. Why not just use the language for each module which best suits the need? If performance outweighs simplicity of code management, then use the better performing language for that module. No language is perfectly suited for all goals, so own your chosen criteria and don't 'blame' a language creator for having different criteria.
This is the case with all modern languages (Java, C#, Ruby): you find the "simple" algorithm X isn't performing, then find out that the language you are using doesn't optimize for that structured implementation. You find out there is some memory-optimized algorithm in C (or C++) and decide to use it, and then it is a measure of how well your crap integrates with the real machine-code crap.
Arguing about whose interpreted scripting language is slower is like arguing about whose rich delicious cheesecake is less fattening. When you eat the cheesecake, you accept the tradeoff of tastiness for five minutes off your total lifespan.
As far as high performance languages go, we have:
FORTRAN: King of the performance hill, but so annoying to use that nobody really does outside some scientific circles.
C++: This is what everybody uses to write high performance applications, but it's a mess of special cases and annoying syntax and megabyte-long error messages from deeply nested templates.
We need a modern language, with things like functions as first class objects and introspection, but with the performance and "to-the-metal" nature of C++ when you care about designing for optimal cache efficiency and so on.
The problem with Python isn't the speed -- he's right about optimizing with bits of C. The problem is the GIL. Without good multithreading support, I have to give up on Python for a large number of application domains.
Don't most users of these scripting languages (the good ones anyway) profile and write the speed-critical sections in C or C++ anyway? That's not Python specific. It's not even specific to scripting languages. It's the same thing that C programmers do when they use inline assembly. It's like this all the way down the line. You start with rapid development at a higher level, then profile and optimize what you need.
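The "profile first, then optimize what you need" workflow can be sketched with the standard library's `cProfile` and `pstats` modules. The two functions below are hypothetical stand-ins for a real program, one cheap and one deliberately slow.

```python
import cProfile
import io
import pstats


def slow_sum_of_squares(n):
    # Deliberately naive pure-Python loop: the candidate hot spot.
    total = 0
    for i in range(n):
        total += i * i
    return total


def build_report(n):
    # Cheap bookkeeping plus one call into the expensive function.
    labels = [str(i) for i in range(100)]
    return len(labels), slow_sum_of_squares(n)


profiler = cProfile.Profile()
profiler.enable()
report = build_report(200_000)
profiler.disable()

# Collect the ten most expensive entries by cumulative time into a string.
buffer = io.StringIO()
pstats.Stats(profiler, stream=buffer).sort_stats("cumulative").print_stats(10)
profile_text = buffer.getvalue()
print(profile_text)
```

Only the function that dominates the report would be a candidate for a C or C++ rewrite; the rest stays in Python.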
For all intensive purposes, "whom" is no longer a word. That begs the question, "who cares"?
Says the guy whose whole life is tied up in the language, and whose project, at Google, to speed it up, crashed and burned.
Python is slow because van Rossum refuses to cut loose the boat-anchor of "anything can change anything at any time". The straightforward implementation of Python, CPython, boxes all numbers (everything is a PyObject, including an int or a float) and looks up functions, attributes, and such in a dictionary for every reference. And only one thread is allowed to run at a time. This allows one thread to dynamically patch the objects and code of another thread. Which is cool, but useless. 99.99+% of the time, there's no need for a dynamic lookup. Most program dynamism is shortly after program startup - once things are running, they don't change much. If, sometime shortly after startup, the program said "OK, done with self-modification", at which point a JIT compiler did its thing, the language would be much faster. But no. That's "un-Pythonic".
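A minimal illustration of the dynamism being described, with invented class names. Rebinding a method on a class changes the behavior of instances that already exist, so an implementation cannot simply resolve the call once and assume it stays valid. The second class uses `__slots__`, the standard way to opt out of the per-instance attribute dictionary.

```python
class Greeter:
    def greet(self):
        return "hello"


g = Greeter()
before = g.greet()           # resolved through the class at call time


def shout(self):
    return "HELLO"


# Patching the class affects the instance created earlier.
Greeter.greet = shout
after = g.greet()

# Ordinary instances store attributes in a per-object dictionary.
g.name = "world"
print(before, after, g.__dict__)


class SlottedPoint:
    __slots__ = ("x", "y")   # fixed attribute layout, no per-instance __dict__

    def __init__(self, x, y):
        self.x = x
        self.y = y


p = SlottedPoint(1, 2)
try:
    p.z = 3                  # not declared in __slots__
    slots_rejects_new_attr = False
except AttributeError:
    slots_rejects_new_attr = True
print(slots_rejects_new_attr)
```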
PyPy, the newer Python implementation, uses two interpreters and a JIT compiler to try to handle the dynamism with less overhead. They're making progress, but they need a very complex implementation to do it, and they're many years behind schedule.
Python, as a language, is very usable. But it's too slow for volume production. That's not inherent in the basic language. Python could remain declaration-free if there were just a few more restrictions on unexpected dynamism. By this is meant ways the program can change itself that aren't obvious from looking at the source code. For example, if a module or class could only be modified from outside itself if it contained explicit self-modification code (like a relevant "setattr" call) most modules and classes could be nailed down as fixed, "slotted" objects at compile time. The other big win is using enough type inference to decide if a variable can always be represented as a machine type (int, float, char, bool, etc.). That's a huge performance win.
Claiming that the "slow parts" should be rewritten in C is a cop-out. It makes the program more fragile, since C code can break Python's memory safety. Except for number-crunching, or glue code for existing libraries, it's seldom done.
(I have a Python program running right now which will run for over a week, parsing the street address of every business in the US into a standard format. The parser is complex enough that rewriting it in C would be a big job. There's no "inner loop".)
I was a little bit disappointed by Guido's response regarding static vs. dynamic typing:
InfoWorld: You talked about the arguments for and against dynamic typing. You said that if you trust your compiler to find all the bugs in your program, you've not been doing software development for very long. So you're satisfied with Python being dynamic?
Van Rossum: Absolutely. The basic philosophy of the language is not going to change. I don't see Python suddenly growing a static subdivision or small features in that direction.
Proponents of static typing do not claim that compilers, combined with languages that use static typing, will find all the bugs in your program. This is nothing more than Infoworld erecting a straw man and Guido knocking it down.
However, static typing does make a huge number of potential errors stick out like a sore thumb (the compiler will refuse to compile the code, and will emit appropriate error messages).
Some people (rightfully) argue that dynamic typing makes for shorter, prettier, easier code.
Some of us believe the primary concern should be correctness, and that shorter, prettier, easier code are secondary concerns -- almost always. People should think about this every time their computer crashes, or an application crashes, or something is acting up and needs to be rebooted, or they get a virus through no fault of their own, or their data gets corrupted.
Will users be thinking, "Gosh, this sucks, but I'm sure glad the programmer used a dynamic language, because it made it easier on him (the programmer)."? No, they'll be thinking, "Damn buggy programs! I just lost X (hours,minutes,seconds) of work, and now I'm frustrated!" Programming languages are a means to an end, not an end in itself. Don't be a self centered developer: the fruits of your labor are for users, not so you can write the code equivalent of poetry.
Not to mention, statically typed languages allow for easy refactoring possibilities that make it possible to fix all sorts of serious issues, including architectural ones, with reasonable effort expended. Dynamic languages, while they have made some progress in the area of refactoring, are really in the dark ages here.
I know dynamically typed programming languages are the hotness right now, and I'm sure my opinion will be hammered relentlessly, but I do ask that if you disagree, don't mod me down, but instead, bring forth a reasonable argument for a different position. This should not be a popularity contest, where the loser is not heard, no matter what side the loser is on.
I have similar problems with Perl -- it is too slow -- and it is impossible to fix it by re-writing parts of the system in C.
Here's the catch: you manipulate large data structures (e.g. large data sets read from a database). There is a bunch of functions for processing the data. Even if you re-write some of them in C they will still have to manipulate the very same stuff -- for example: a number in Perl is a scalar; a scalar has a string representation, a numeric representation, a reference count, etc. If you manipulate it in C you still have to take care of all this crap. So what's the point in writing that in C? Perl interpreter is also written in C. (Perl C API is way harder to use than Python's but that's irrelevant.)
So, it's like saying: if your program (in whatever language) is too slow, run it through the profiler, find the routine that uses 90% of the CPU time and optimize it. Anyone who has used a profiler will know that this is not always that easy.
My friend from Belgium (who speaks Dutch as the first language) said once: "oh, so the guy who created python is from the Netherlands? Well, I don't wanna touch it then".
Donald Knuth made this point in 1971, in his Empirical Study of Fortran Programs: in most programs, virtually none of the code has any significant effect on performance.
Not that he was the first to say these things, but that book says them so *well*.
"a bad taste about python"
Coincidentally, "it tastes like python" ['det smakar pyton'] is a Swedish idiom for something that tastes really bad.
I use python a lot to process large string logs (hundreds of megs or a couple of gigs). The problem is it's all super quick until the working set in the garbage collector gets too big and you fall off a performance cliff. I dunno what they're doing in there, but you easily go from a minute or two run time to half an hour to an hour because of the paging.
I'm not familiar enough with other garbage collected languages and such workloads to know if this is inherent or just a problem with the Python GC. Either way, I think it's fair to say that Python is too slow under such circumstances. I'd like to see it fixed, though, rather than abandon it :)
Ah, that is a really good troll. :) I am actually quite impressed. :)
Those scripting languages guys are so stubborn when it comes to recognizing mistakes. That's not only that python guy, it's the perl and ruby community, too.
Instead of pragmatically building fast VMs and new, superior copying generational garbage collectors, they say "use C or C++".
While I have to say that building a binding for your own code is not that hard, distributing it is terror. For example, it takes a lot of effort to get SDL for Perl running on the system. And don't get me started on SDLx.
While SDL is a must-have, you don't want more dependencies in binaries than needed. So in some cases, you just wish for a bytecode interpreter that is a bit faster. And I have those moments daily.
And there is a lot of room for improvement, judging by that Euphoria benchmark:
http://www.rapideuphoria.com/bench.txt
At the least, they should get up to Pike or Lua. And the fact that a one-man project (Gambas) can beat all those languages in performance doesn't make it better.
I hate to say it, but PHP seems to learn from its mistakes. But Python, Perl and Ruby? No way. They can be happy if they don't go extinct within ten years like Tcl did.
For some reason the title makes me think of Monty Python's Holy Grail, the "Not dead yet" line.
I'm not trying to troll, but I hire a compiler to optimize my program. And a profiler to optimize further. What the python fans are saying is python is too slow, so use C++, and do it by hand. I'd much rather have a JIT take care of that for me, and if needed, in rare circumstances, I should intervene to make it even better.
Plus Python manages most of the memory management for you ....
Python! It manages the management for you!
Now if only we could get something like that for meatspace...
I always find the statement that most of the time is spent in just a small part of the code tricky. I am not sure it is even true for many programs. I wrote some software for http://aichallenge.org/ and found that very many somewhat intensive operations accumulate, which made the python version of my software prohibitively slow compared to a C++ version I wrote, with no easy single performance bottleneck to blame.
In other news, Pope claims he isn't very Catholic and bears claim they don't sh*t in the woods that often...
Integrated multi-language solutions are teh suck.
I know that Python is much better than a lot of other languages for integrating C/C++ code. But in the end, if you're doing production systems, you'll end up getting bitten by some unforeseen incompatibility caused by some upgrade somewhere.
It will happen.
In the course of every project, it will become necessary to shoot the scientists and begin production.
This is kind of Guido not getting it. He gets a lot of stuff, very smart guy in fact, but not entirely this.
CPython is not blazing, but it's almost always fast enough.
Yes, it's excellent to write 95% of your program in Python and then optimize the inner-most loop with C or C++ if that proves beneficial.
Other important options include using Pypy for the whole program by changing your #! line, or Cython for just the innermost loop by translating your Cython code to C and building a Python "C Extension Module" from that, which can be imported much like normal Python code.
But more importantly, CPython != Python! It did in the past, but that's really not at all true anymore.
Pypy is really quite fast for uniprocessor tasks, and is a very compatible version of Python, albeit one that like CPython, still has a GIL.
Jython and IronPython are Pythons that don't have a GIL, and hence thread well. I've not done much with IronPython since it doesn't have a Python Standard Library, but I found Jython surprisingly fast for long-running programs. Not as fast as Pypy, but pretty good for a VHLL written in Java.
Nuitka appears to be coming along - it's a Python compiler. I've not yet tried it, but it sounds promising.
For a microbenchmark comparing many different Python runtimes, check out:
http://stromberg.dnsalias.org/~strombrg/backshift/documentation/performance/index.html
The "pyx" rows are Cython. The others are probably pretty apparent.
That's still not a good excuse. People take the "optimize only the hot spots" bit too far sometimes. For an application developer, it's mostly a good strategy to only optimize hotspots. For language implementors this still doesn't let them off the hook!
Think about it like this: your goal with performance optimization is to maximally leverage developer time "wasted" on optimizing. The small inner loop in your application that comprises 5% of the code but uses 95% of the CPU cycles is a clear win: optimizing this leverages your optimization efforts tremendously. Optimizing the 95% of the code that uses 5% of your CPU time does not.
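The leverage argument can be put in numbers with Amdahl's law. This sketch treats the comment's "95% of the CPU cycles" as the hot fraction and assumes, purely for illustration, a 10x speedup of whichever part gets optimized.

```python
def overall_speedup(optimized_fraction, local_speedup):
    """Amdahl's law: whole-program speedup when only part of the run time improves."""
    untouched = 1.0 - optimized_fraction
    return 1.0 / (untouched + optimized_fraction / local_speedup)


# Speed up the hot loop (95% of run time) by a hypothetical factor of 10.
hot_loop = overall_speedup(0.95, 10)

# Speed up everything else (5% of run time) by the same factor instead.
cold_code = overall_speedup(0.05, 10)

print(round(hot_loop, 2), round(cold_code, 2))
```

Under these assumptions the hot-loop rewrite makes the whole program roughly 6.9 times faster, while the same effort spent on the other code buys about 5%.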
However, one also has to consider how often something is executed in total, globally. In other words, the more shared your code is, the more important its optimization is. The fluff bits of your one-off application that will probably be re-written or go away in a few years are never worth optimizing. But that shared library you wrote that 334 other projects will use and might last a decade? That's really worth optimizing the hell out of. A language interpreter is the ultimate case of this. The python interpreter is going to be running untold millions of peoples' applications for many years, in some cases decades before being replaced with a newer version of python. The multiplicative factor here is enormous enough that it's worth *every* effort to optimize any part of the language as much as possible.
I replaced a Perl program with a Python rewrite the other day. This is a pretty simple program where most of the time is spent matching a single regex against a file of data. At first the Python version was literally twice as slow as the Perl. Which is ridiculous; both were spending most of their time performing the same amount of IO (written in C) or running a regex engine (written in C). Why was the Python so much slower?
The answer boiled down to painfully slow method lookups. Python was looking up the "match" method in my compiled regex object every single loop iteration. A simple one-line change that avoided that repeated lookup brought Python up to the same speed as Perl.
In short, until I applied a hideous optimization, my code was too slow even though most of it was already in C! Writing any more of it in C would have basically meant not using Python at all!
Performing a basic micro-optimization by hand literally doubled performance. Why doesn't the compiler do that? If they really aren't going to add any optimizations to the Python compiler then I guess the only hope is to introduce a JIT.
(And I mean "introduce a JIT to the standard Python distribution". Psyco and suchlike are all very well, but if it's not in the standard distribution, it might as well not exist. Most Python code has to make do with the basic installation and never gets any external modules made available. This is the whole reason why "batteries included" counts as an advantage versus Perl's enormous-but-external CPAN, guys!)
((Also: hang on, this doesn't even make sense. Why on earth is looking up __call__() faster than looking up match()? What the fuck, Python?))
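The exact one-liner is not preserved in this comment, so what follows is only a sketch of the general technique it appears to describe: binding the method to a local name once, outside the loop. The pattern and function names are invented, and no particular speedup is claimed.

```python
import re

# Hypothetical pattern standing in for the commenter's regex.
pattern = re.compile(r"ERROR \d+")


def count_matches_plain(lines):
    count = 0
    for line in lines:
        if pattern.match(line):      # attribute lookup on every iteration
            count += 1
    return count


def count_matches_hoisted(lines):
    match = pattern.match            # bound method looked up once
    count = 0
    for line in lines:
        if match(line):
            count += 1
    return count


sample = ["ERROR 42 disk full", "ok", "ERROR x", "ERROR 7"]
print(count_matches_plain(sample), count_matches_hoisted(sample))
```

Both functions do the same work; any difference in run time comes only from where the attribute lookup happens, and it is worth measuring with `timeit` on the interpreter in question before relying on it.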
That's what we used to call it when I was coding smalltalk in the 90s - we would identify sore-spots that couldn't be effectively coded in Smalltalk, rewrite those parts in pure C, and call them via DLL (we used Visual Smalltalk which easily worked with external libraries).
This doesn't invalidate the assertion that Smalltalk+(a bit of C) is more efficient to develop in, but it does stretch it a bit to say that Smalltalk "isn't slow". The tradeoff in being able to pull up a walkback on any data element in the system (unlike Java, Smalltalk allowed you to convert primitive data elements into full-blown objects only when "inspecting") far outweighed the performance aspect, and we could optimize as needed.
The downside to such an optimization scheme is that it does tie you to particular libraries, OS and machine architecture if those assumptions are not independent in the C code. It's also a pain to deal with two code languages if you aren't somewhat fluent. Code-generators in the primary (slow) language are somewhat helpful if you have lots of C to write.
To whoever complains Python is too slow compared to C. If you use a screwdriver as a hammer it's your fault, not the screwdriver's.
Civilization 4. Firaxis wanted to make it nice n' moddable. So they scripted a bunch of it out in Python. Makes it real easy for users to edit in the field. However the AI they couldn't do that with. They wanted the users to be able to edit that too, but it was too slow in Python. So they did that in C++ and made it a DLL, and then distributed the sources. Harder to work on than a Python script, but fast enough that the game didn't bog down. The core of the game engine was C++ too and the Python was integrated with Boost.Python.
Works really, really well. Again, the right tool for the job. If they'd tried to do the whole game in Python, it would have been a disaster performance wise (and I'm not even sure if it would be possible, if Python can call on DirectX in the way C++ can). However a mixture worked brilliantly.
It's nice when the proponents of a piece of technology can be honest about its shortcomings and acknowledge them, rather than trying to sell their technology. It's the honest response vs. the fanboy response.
Problem: "It's too slow."
Honest response: "Yeah, sometimes it is slow. Here are the design compromises that make this slowness necessary. Furthermore nobody has volunteered the manpower to make it faster. If you need something faster, try x."
Fanboy response: "Yeah well, anytime I try to do something, it's fast enough."
Problem: "There's practically no static typechecking."
Honest response: "You're right, there isn't. There are ways you can deal with that, such as writing tests and thinking carefully about your code. The lack of static typechecking allows us to build a language that is better in this way and in that way. If you want static typechecking, try language x. Certainly there are advantages and disadvantages to the way we did things, but it best suited our objectives and if your objectives are aligned with ours, consider using our system."
Fanboy response: "Yeah well, if you're relying on a compiler to catch all your bugs, you're in trouble."
I know there are a lot of C++ haters here but having read some of what Bjarne Stroustrup has written, he typically offers honest responses, not fanboy responses. I was impressed with an article on the Arch Linux wiki once for the same reasons, as it gave an honest response to "it takes too long to install" (which was something like "yeah, it takes awhile, which has advantages and disadvantages. Try Ubuntu for quick installs.") rather than a fanboy response ("quick installs, who wants that crap anyway, it will just give you a junk system.")
To be fair to GVR this was a short informal article; it may not be fair to expect him to expound in this kind of a format.
Right now, all the hype in pythonland is centred around PyPy, the python-in-python jitted implementation of python, which is all the rage because of its performance (5.3x faster than cpython on average).
However, it always surprised me that this project never got the proper attention from Van Rossum himself.
He almost never talks about it, unless someone explicitly asks him. What's more, he always described himself as someone not worried about performance at all.
He said he is not a "speed freak" implying that pypy is far from top on his list of cool things to follow.
And now that Python is getting a lot of attention thanks to pypy's success, he comes up with a comment like this.
I'd like to hear once and for all from him what he really thinks about the project...
So maybe I'm just one of those idiots, but I've written some code that does heavy number-theoretic work in Python, and the performance is basically what you get out of an assembly-optimized piece of code. Why? Because I did exactly what Guido is suggesting here: I used an optimized C/assembly library for the parts where my code spends most of its time. Python has bindings for the GMP library, and my code spends over 99.9% of its time inside the GMP routines, which are a hell of a lot faster than anything I'm ever going to write.
The end result? I have code that does elliptic curve calculations with moduli in the 200-kilobit range, running at nearly the same speed as a C program-- but I only had to write about 1600 lines of code in Python, versus a WHOLE lot more if I had written the code directly in C-- allocating, deallocating, dealing with intermediate values, etc. And it's a hell of a lot more clear and maintainable: I can write x = pow(base + 2, (exponent1 - offset) / 2, modulus) instead of about 4 lines of C code (not counting variable allocation/deallocation). And the performance is basically the same, because 99% of the work is being done in the "pow" operation.
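A sketch of that one-liner using only Python's built-in integers rather than the GMP bindings the comment relies on. All values are small, made-up stand-ins. It uses `//` because three-argument `pow` requires integer arguments and `/` produces a float in Python 3. A hand-written square-and-multiply routine is included to show roughly what the single call replaces.

```python
def modpow_naive(base, exponent, modulus):
    """Square-and-multiply written out by hand, for comparison only."""
    result = 1
    base %= modulus
    while exponent > 0:
        if exponent & 1:
            result = (result * base) % modulus
        base = (base * base) % modulus
        exponent >>= 1
    return result


# Hypothetical inputs; the comment's real moduli are vastly larger.
base, exponent1, offset = 5, 1001, 1
modulus = (1 << 127) - 1

# Built-in three-argument pow performs modular exponentiation in C.
x = pow(base + 2, (exponent1 - offset) // 2, modulus)

print(x == modpow_naive(base + 2, (exponent1 - offset) // 2, modulus))
```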
So my code is indistinguishable from a performance standpoint. It's more easily maintained. How the hell is Guido crazy, here?
This is an oxymoron: "self-resizing arrays and sane strings at bare-metal speeds". That's not bare-metal speed. You get bare-metal speed by using static data structures and static data. C++ doesn't do arrays or strings any faster or slower than a scripting language. That's because the code you need for either of those things is almost identical in all implementations from the perspective of the "bare metal".
Now C++ can do loops and non-dynamic member function invocation faster than scripting languages. But that's because those things rely on less dynamic data than in the scripting engines. It's just a few jumps already cached and pipelined by the processor.
Most C++ apps are insanely slow compared to their C counterparts because the C++ folks rely on all this dynamic stuff. They may as well just write their crap in Javascript or Python. All the C++ code written for benchmarks is basically a gussied-up version of the C code, which just gives C++ programmers permission to lie to themselves about why they use C++.
"Hence what Guido is saying. You can do that "most stuff" part in python, and then write that "some stuff" part as a C module. You can then use that module from python. Thus you get the benefit of both languages."
No you don't. If you had written it in C++ you could natively use C++ objects and libraries fluently.
It takes more skill and system-programming knowledge to deal with the tricky interfaces between the internals of a Python interpreter and an external C++ program. The object structure in Python is obviously alien to the C++ object system. Of course there are always workarounds, but they are inconvenient and require arcane knowledge. Somebody who is an expert at Python internals doesn't think this is hard. Somebody who is an expert at computational algorithms but not compilers does think this is hard, and especially thinks this is something that they really do not have a desire to learn. Me, I have far more interest in spending my time reading a machine learning research paper rather than learning about some crufty programming interface.
The article is a naive cop-out for not specifying a sufficiently good language in combination with implementation techniques (they go together). Guido could try to implement a whole bunch of new libraries in some external Lisp and see how much he likes it versus writing native Python code. It sucks.
I'm writing a very niche CFD code for aircraft design that will advance the state of the art in my particular, tiny, tiny field. It's almost all in Python because I can code up a GUI in a couple of days and use numpy and scipy for some heavy lifting that I would never have the mathematical skills to code myself. The final bit of esoteric number crunching that I do need to code I can write in a couple of hundred lines of Fortran (nothing more than F77 probably) and compile to pyd using the almost trivially easy-to-use f2py.
I realise my app is not mainstream, but I challenge anyone to find me a better route to getting a usable product to market.
Exceptions are great in a garbage-collected language such as Java or Smalltalk or Lisp, because most programs allocate and manage a lot more little blocks of memory than they do other types of resource, and in a garbage-collected runtime environment it doesn't matter if you're half-way through initializing some complex data structure when something odd happens inside the constructor of one of your object's members and it throws an exception. The half-initialized data structure is just dropped on the floor and is cleaned up by the garbage collector. 95% of a program's resources are memory, and don't need special management to avoid leaking them if exceptions are thrown. In Java or Smalltalk you only need try/finally code to clean up actual resources such as synchronization locks, file handles, sockets, etc.
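To illustrate the point in Python, which is also garbage-collected: a minimal sketch, where the HalfBuilt class, the counter and the update function are all made up for the example. The half-built object needs no cleanup code, while the lock does.

```python
import threading

lock = threading.Lock()
counter = {"value": 0}


class HalfBuilt:
    """Hypothetical object whose constructor fails part-way through."""

    def __init__(self):
        self.buffer = [0] * 1000  # plain memory: the runtime reclaims it
        raise ValueError("something odd happened during construction")


def update():
    lock.acquire()
    try:
        counter["value"] += 1
        HalfBuilt()  # raises; the half-built object is simply dropped
    finally:
        lock.release()  # a lock is a real resource, so release it explicitly


try:
    update()
except ValueError:
    pass

lock_was_released = not lock.locked()
print(lock_was_released)
```

In everyday Python the try/finally around the lock would normally be written as "with lock:", which does the same thing.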
But in C++, you don't have the safety net of the garbage collector. So you have to ensure exception safety for ALL of the code you write which calls anything which might possibly throw an exception. This turns out to be really hard, even with RAII techniques. In Java, the availability of exceptions as an error-handling mechanism has small costs and large benefits. For C++ projects with lots of manual memory management, it almost seems like the opposite is true.
I work in the console game industry, and we don't use exceptions, ever, in game engines. We just completely turn them off, and use C-style return codes for error handling. We do a lot of manual memory allocation (and write our own memory allocators, and our own replacements for STL containers, and systems for defragmenting physical memory, and lots of similar trickery) to help us absolutely make the most of the limited amount of RAM that consoles have. In a console game, a memory allocation failure means a hard crash--throwing an exception from operator new would be useless. And the small savings in code size from turning off exception support is nice too. But the biggest benefit is that we don't have to worry about writing exception-safe code. Because let me tell you, I've met some wizardly programmers in the game industry who can hand-optimize large chunks of vector math into unrolled assembly code for these consoles, but I've never met one who can consistently write bug-free, exception-safe C++ code. Some of our target platforms don't even support exceptions properly in their compiler and/or standard libraries.
It's an interesting coincidence that I had a conversation just today with my company's software engineering manager. His observation was that most people including software engineers are wrong most of the time about what's making their code run inefficiently. It's easy to imagine that you have a scrap of slow code deeply embedded in your process that is bogging everything down, or that there's nothing wrong with the method, you're just running Python and Python is not as fast as C -- or assembly.
He said that most of the time, what he found based on the evidence was that slow code was slow because it was doing something the programmer hadn't thought carefully about, like causing unnecessary swaps or having a wait where there shouldn't be one. Rarely did it turn out to be the language at fault.
But I can see how it might be frequently blamed on the language, given that we all *know* Python is slower than C. But if I went into that inner loop and recoded it in C and made it so much faster, I won't bother to mention the extra steps I found inside the loop that should have been outside, one of which was a system call. No, it's the LANGUAGE, not the fact that I could have made it run four times as fast in Python and that would have been fast enough.
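A rough sketch of the kind of fix meant here, with invented function names and os.getpid() standing in for whatever system call was actually sitting inside the loop:

```python
import os


def slow_total(values):
    total = 0
    for v in values:
        scale = os.getpid() % 7 + 1  # system call repeated on every iteration
        total += v * scale
    return total


def fast_total(values):
    scale = os.getpid() % 7 + 1  # loop-invariant work done once, outside the loop
    total = 0
    for v in values:
        total += v * scale
    return total


data = list(range(1000))
same_answer = slow_total(data) == fast_total(data)
print(same_answer)
```

Both versions return the same result; only the second keeps the invariant work out of the loop. How much that saves depends on the call and the loop size, so it is worth measuring before reaching for C.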
There's always the question of how fast is fast enough. If it takes me 40 hours to crunch my code, fix it and retest it and that cost the company $4000, I better be chasing a more-than-$4000 problem. And it better not be keeping me from chasing down a $20000 problem.
Case in point: two months ago I was asked to reengineer a batch process written in Python because it was too slow when processing large[r] amounts of data. It was designed as a nightly script to process several thousand records, but when the number of records grew to 500k it started taking hours instead of seconds.
Surprisingly, profiling showed that the whole slowdown was concentrated in one point where a bunch of SQL statements were appended one to another in a string, and strings in Python are immutable, so each append can copy everything built so far. Changing the string to a list in one place (approximately 10 minutes' worth of work, including diagnostics and changes in the unit test) improved the runtime for 1 million records from a projected 72 hours to 3 minutes.
And I saved myself the trouble of rewriting a Python app in C++ - not the easiest endeavor.
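The original script isn't shown, so this is only a sketch of the pattern described, with made-up function names and SQL text: repeated string appends versus collecting the pieces in a list and joining once.

```python
def build_with_concat(statements):
    sql = ""
    for s in statements:
        sql += s + ";\n"  # may copy the whole growing string on each append
    return sql


def build_with_list(statements):
    parts = []
    for s in statements:
        parts.append(s + ";\n")  # appending to a list is cheap
    return "".join(parts)  # one pass to build the final string


statements = ["INSERT INTO t VALUES (%d)" % i for i in range(5)]
identical = build_with_concat(statements) == build_with_list(statements)
print(identical)
```

CPython can sometimes extend a string in place on +=, so the size of the gap varies between versions and code shapes; the list-and-join form does not depend on that.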
There are some situations where optimizing small sections by linking in C/C++ code doesn't work. Callback functions for integration/numerical methods libraries spring to mind. Writing the library so that non-professional programmers who are more interested in the science can use it generally means having them write their callbacks in Python. Where does the program spend most of its time? In the callback. Python is too slow.
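A toy version of that situation, assuming a plain trapezoid-rule integrator (the integrate and integrand names are invented for the example). Even if integrate were rewritten in C, every step would still call back into the user's Python function:

```python
def integrate(f, a, b, n=100000):
    """Trapezoid rule over [a, b]; calls the user callback f exactly n + 1 times."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h


calls = 0


def integrand(x):
    """User-supplied callback; counts how often the library calls it."""
    global calls
    calls += 1
    return x * x


result = integrate(integrand, 0.0, 1.0, n=1000)
print(result, calls)
```

The call count grows with the number of steps, which is why the time ends up in the callback rather than in the library.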
Also I've found that python has other limitations. I coded a simple recursive depth first search, and I'm happy to let it run for days/weeks on my data and n=128, but python's memory usage and artificial limits cause the python equivalent of stack overflows. I had to rewrite in C++... Not a speed issue, but python isn't great for everything
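For what it's worth, the usual way around the recursion limit without leaving Python is to keep the stack yourself. A minimal sketch, assuming the graph is exposed through a neighbours function (a made-up interface, as is the chain test graph):

```python
def dfs_iterative(start, neighbours):
    """Depth-first search using an explicit stack instead of recursion."""
    visited = set()
    order = []
    stack = [start]
    while stack:
        node = stack.pop()
        if node in visited:
            continue
        visited.add(node)
        order.append(node)
        # Push in reverse so neighbours are visited in their listed order.
        stack.extend(reversed(neighbours(node)))
    return order


# A chain far deeper than the default recursion limit (usually 1000).
DEPTH = 100000


def chain(n):
    return [n + 1] if n < DEPTH else []


order = dfs_iterative(0, chain)
print(len(order))
```

sys.setrecursionlimit can raise the limit too, but a very deep recursion can still overflow the interpreter's C stack, so the explicit stack is the safer route. It does nothing for raw speed or per-object memory overhead.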
It does depend a lot on context. The examples you gave are very simple, and avoid any of the awkward areas.
Consider a different example: calling a C library function that takes an enumeration type for one of its parameters.
Typically, in C, you will have a file something.h that contains the interface for a library, including things like function prototypes and enumeration definitions.
Unfortunately, this is all symbolic metadata that Python ctypes can't just read out of a shared object file. There is no handy way to have Python import all the enumerators from the header file and define them as symbolic constants, for example, or for Python to scan the header for function prototypes. Thus in practice we end up with things like ctypes argtypes and restype hassles, and with some sort of duplication of all the enumerators in Python code that then has to be maintained perfectly in sync with the underlying C interface.
What's more, Python doesn't have a concept of enumerations as such; PEP 354 was rejected. That means the best you're going to do is define a whole bunch of named constants for the enumerators in Python, and have your argtypes indicate the corresponding underlying integer type. Oh, but then the underlying type of an enumeration varies depending on the enumerators themselves, so now the effective prototype of every function that uses your enum as a parameter or return value, and thus the corresponding Python ctypes argtypes and restype data, has to change just because someone added the wrong value to an enumeration.
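To make the duplication concrete, here is a sketch built around a hypothetical something.h. The enum, the function name and the assumption that the enum is int-sized are all invented, and a Python stand-in replaces the C function so the example runs without a shared library:

```python
import ctypes

# Assumed contents of a hypothetical something.h:
#   enum color { COLOR_RED, COLOR_GREEN, COLOR_BLUE };
#   int is_primary(enum color c);

# The enumerators have to be duplicated by hand and kept in sync with the header.
COLOR_RED, COLOR_GREEN, COLOR_BLUE = range(3)

# Hand-written prototype; the enum is ASSUMED to be int-sized here.
IsPrimaryProto = ctypes.CFUNCTYPE(ctypes.c_int, ctypes.c_int)

# With a real library the declarations would look like this instead:
#   lib = ctypes.CDLL("libsomething.so")      # hypothetical library name
#   lib.is_primary.argtypes = [ctypes.c_int]
#   lib.is_primary.restype = ctypes.c_int


@IsPrimaryProto
def is_primary(c):
    """Python stand-in for the C function, called through the ctypes prototype."""
    return 1 if c in (COLOR_RED, COLOR_GREEN, COLOR_BLUE) else 0


print(is_primary(COLOR_GREEN), is_primary(7))
```

Nothing checks the constants or the c_int choice against the header, which is exactly the maintenance burden described above.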
In short, Python/ctypes and C/enumerations are not a very happy mix. There are other techniques, as you mentioned, and tools for helping with automatic parsing of header files, and each approach has its own pros and cons. But there are plenty of cases where the overhead of calling from Python down to C code is unpleasant at best. As you can see from the Cython example you linked to, it's delusional to pretend that you can really just compile almost-unchanged Python code down to C, too. It doesn't take much of this hassle before you've lost most of the productivity benefit from writing in a higher-level language like Python, and you start thinking about writing a much wider chunk of your project in C just so you can move the Python-C bridge to a place in your architecture where the interface can be much simpler.
This isn't to say that I disagree with writing high-level code in Python and calling down to C in specific cases. Sometimes it really does work almost transparently. Most of the time it's not too bad, and if each language is much better for the kind of code you want to write in it (as IME it often will be) then the overheads in both developer time and run-time performance are a modest price to pay. But it's not always as easy as the introductory examples might suggest.
If you disagree, post your argument. (-1, Overrated) isn't your personal censorship tool for views you don't like.
It turns out that developer productivity is actually more important than almost anything.
Developer productivity might be more noticeable than almost anything. However, I shudder to think how much safer and more productive the world would be, and what impressive things we might be doing with the computing power available to us today, if we hadn't trained the general population to believe that slow, bug-ridden, insecure, barely usable software is somehow acceptable or even unavoidable.
These things each contribute a staggering drain on the global economy and the quality of human life. They just aren't as visible as a product that ships late or is missing a feature, because it's accepted as the norm that we wait half a minute each time our software loads a document, or we go and make a coffee while today's security patches are installed and the system reboots, or no-one really understands how to use $FEATURE in our business software anyway so we'll just file a support ticket and get back to it later, or the network is down this morning while MIS contain a virus outbreak (caused because Bob the CEO was too important to read the IT policy and didn't know how to secure his laptop wireless connection anyway, which let someone install a backdoor, which in turn was exploited when dear old Susan in Accounts brought in the baby photos of her grandkids plus a bonus virus on a USB stick).
It isn't even clear that doing better would cost much more in either developer time or money. It would, however, require a change in mindset, a demand for higher standards, and much better training, collaboration and support for developers, across the whole industry or at least enough of it to act as a catalyst. Unfortunately, a lot of people don't like change, particularly change that requires questioning ideas they've taken for granted for a long time and acknowledging that they might have been doing a bad job as a result of those ideas. Also, the people most qualified to inspire and set a good example are probably too busy doing real work in niches where their skills and experience are genuinely valued, leaving the mainstream to soak up consultant-driven "best practices" that have an engineering foundation roughly comparable to an inch of sand under a skyscraper.
So, I think it turns out that developer productivity is more visible than almost anything. The jury is very much still out on whether it's more important.
Perhaps that's because Python loops suck? I mean, literally, try writing a version of this in various languages and time them:
value = 0
for i in xrange(1000000000):
    value = i
At least the last time I tried it, just about every language but Python had a sensible execution time (read: within 20 seconds). Python? It takes minutes to execute. :/ And I don't think it's the result of some optimization either, as you can look at gcc's assembly to see that at a sufficiently low optimization level it actually produces the one-billion-iteration loop, and it still executes fast.
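If anyone wants to repeat the measurement without waiting for the full run, here is a sketch using the standard timeit module on a smaller count and scaling up. The assign_loop name and the choice of one million iterations are arbitrary, and the projection is only a rough linear estimate:

```python
import timeit


def assign_loop(n):
    value = 0
    for i in range(n):  # xrange in Python 2
        value = i
    return value


N = 1000000
seconds = timeit.timeit(lambda: assign_loop(N), number=1)
projected = seconds * (1000000000 / N)  # rough linear extrapolation to 10**9
print("%.3f s for %d iterations, roughly %.0f s projected for 10**9"
      % (seconds, N, projected))
```

The numbers depend heavily on the interpreter version and the machine, so treat any single result as a data point and not a verdict.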
Well of course Python is too slow. Guido clearly has no understanding of performance if he thinks you can simply replace tiny hotspots with C code, and no idea about interfacing either, if he thinks that's an effective way for an application programmer to work. If you want most of the convenience of a scripting language and performance like C you should join the Felix project. It's a scripting language, but it compiles down to machine code, not bytecode. It's statically typed, though most of the time you'd never notice.
Amen to this.
Python is great until you have to face the bundling/packaging issues.
Distribution of compiled code (where you can often get away with a single executable or library file) is an order of magnitude easier than distributing a Python app, where you essentially need to ship the Python distribution along with your Python code (and any wrapped libraries you add).
I use pyinstaller for this purpose, but maintaining the packaging over years has been more complex than maintaining the code itself.
Where does this idiot idea that Python is 'slow' come from, apart from whingeing Java ingenues?
Python is not slow - it's fast, literally about ten (10) times faster than Java and with a much lower memory footprint.
I don't think a lot of people understand that it's a 'callout model', not a full 'virtual machine' like Java, and that has both advantages and disadvantages - mainly advantages.
It is the most perfect prototyping language ever developed, and yes, if during profiling you find a chunk that needs to be superoptimised then DO it. It's how the language is designed - break that module out and do it in C/C++.
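For anyone who hasn't done the profiling step, a minimal sketch with the standard cProfile and pstats modules. The three functions are invented stand-ins for a real workload:

```python
import cProfile
import io
import pstats


def hot_spot(n):
    """Deliberately heavy step: the candidate for a C/C++ rewrite."""
    return sum(i * i for i in range(n))


def cheap_step(n):
    return n + 1


def pipeline():
    total = 0
    for k in range(50):
        total += hot_spot(2000)
        total += cheap_step(k)
    return total


profiler = cProfile.Profile()
profiler.enable()
pipeline()
profiler.disable()

# Collect the report in a string, sorted by cumulative time, top 5 entries.
report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(5)
print(report.getvalue())
```

The report lists where the time actually went, which is the evidence to look at before deciding which module, if any, to break out.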
And PLEASE, stop criticising Psyco. If you are criticising it you are not understanding it - NO, you can't pre-optimise code, the profile will always show you stuff you didn't expect. Psyco is a very successful attempt at writing an auto-inline-optimizer, and I don't think any other language environment has such a thing.
This is a false dichotomy. You don't have to choose between doing the project in C, or doing it in python + C. You could just use a high level language that offers good performance instead. Write it in ocaml or haskell and your code is just as short (or shorter) than python, but with an order of magnitude (or more) better performance.