Python-to-C++ Compiler
Mark Dufour writes "Shed Skin is an experimental Python-to-C++ compiler. It accepts pure, but implicitly statically typed, Python programs, and generates optimized C++ code. This means that, in combination with a C++ compiler, it allows for translation of pure Python programs into highly efficient machine language. For a set of 16 non-trivial test programs, measurements show a typical speedup of 2-40 over Psyco, about 12 on average, and 2-220 over CPython, about 45 on average. Shed Skin also outputs annotated source code."
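To make "pure, but implicitly statically typed" concrete, here is a minimal sketch of the kind of program the summary seems to describe: there are no type declarations, yet every variable holds only one type for its whole life. The function name and the sample data are made up for illustration and are not taken from the Shed Skin test programs.

```python
def mean_word_length(words):
    # 'total' is always an int and 'words' is always a list of str,
    # so fixed types could be inferred without any declarations.
    total = 0
    for word in words:
        total += len(word)
    return total / float(len(words))


print(mean_word_length(["shed", "skin", "python"]))
```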
Until he addresses mixed types in n-tuples, this won't be useful for very many people.
"Who is the Journal of Quantum Physics going to believe?" --Stephen Hawking
As a UNIX admin, I was saddled with one of these kinds of things years ago, a DEC-BASIC to C compiler for UNIX. The output code quality was incredibly bad: machine generated variable and function names, bizarro nested struct/union/struct data structures, 400-line functions peppered with calls to 1-line functions. Completely unreadable. Thank $DEITY that project died quickly.
No folly is more costly than the folly of intolerant idealism. - Winston Churchill
Why not pure assembler ?
Well, personally, I like Python more. I'm not against C++, but I find Python works better and is easier to use for what I want to do.
http://www.caretoicedance.com
See, it's all well and good to compile python to speed it up. The problem is, people are now saying that they can write efficient code in python just because it magically translates to C++, and because this translator is faster than other python compilers.
This won't be meaningful until a converted python script is compared to efficient code written natively in C++ in the first place.
StoneCypher is Full of BS
Since Python is about the only language I know very well, I find this fascinating. But it also reminds me of the .NET development suite, where the way you write your code in any language doesn't matter, since it all becomes one machine code. So if you think you can do something more memory efficient in C# than in VB.NET - well, no.
So the bottom line is, the quality will depend on the quality of the converter, and that's not so cool. Adding layers between code and machine code is not the best way IMHO.
This is a good step to make Python run a bit faster, but I don't think it'll really make a huge difference.
The best way to get some speed and still keep the nice Python functions and layout is just to export the most heavily used functions to native code (C/C++).
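A rough sketch of that idea, using only the standard library. It assumes CPython, where the built-in sum() is implemented in C, so sum() stands in here for "the hot function after it has been moved to native code". The py_sum helper and the data size are invented for the example, and the timings will vary by machine.

```python
import timeit


def py_sum(values):
    # Hot loop in pure Python: every addition goes through the interpreter.
    total = 0
    for v in values:
        total += v
    return total


data = list(range(100000))

# Both versions must agree before the timings mean anything.
assert py_sum(data) == sum(data)

print("pure Python loop:", timeit.timeit(lambda: py_sum(data), number=20))
print("built-in sum():  ", timeit.timeit(lambda: sum(data), number=20))
```

The rest of the program stays ordinary Python; only the function that dominates the run time changes.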
I don't know if it's possible to take the C++ output and optimize it separately, but if it is, that would give you a good start toward native code.
In short: Better, fast and easy, but not the best (if you can write native code)
My blog: http://www.redcode.nl
Among python programmers, I'm curious - how many use psyco (another python performance enhancement tool) for their projects? I fiddled with it a while ago (it didn't work because of a C module that it didn't like), but never had a compelling reason to go back to it. Performance optimization has never been important enough for my applications to merit the effort.
It's not wasting time, I'm educating myself.
I will have to explore it more, but it will be intriguing to see how they handle things like pointers and structs that are not in python.
Uh, why would they have to? This goes from Python to C++, not vice versa. If there are no pointers or structs in the Python code, why would they have to handle them? Certainly, it's quite possible that some Python variable types will be converted to pointers or structs in the output code, but that's orthogonal to the issue of Python not having them natively.
If you were trying to go from C++ to Python, then you'd have to convert C++ pointers and structs to some sort of Python data type, and your comment would make sense. As it is, I'm not sure what you were trying to say.
"The legitimate powers of government extend only to such acts as are injurious to others." Thomas Jefferson.
No, not really. A large number of people, including myself, just use python as a nicer C. Futzing with pointers and other such things can be ignored while making a prototype and, after finishing the prototype, the bits that need to be faster can then be rewritten.
I recently wrote a largish simulation in python for a Biology course. The goal was to watch how a species spread over a planet given other competing species, natural disasters and the like. It took four in deep hack mode to write the whole thing, all of it implicitly typed due to the equations at the base of the simulation. Implicit static typing is used a lot in large applications. So much so that in fact, if I recall, python 2.5 is supposed to have optional type declaration.
What if the entire Universe were a chrooted environment with everything symlinked from the host?
...kind of reminds me of the Google Web Toolkit which is more or less a Java to Javascript/HTML compiler. It's not an optimization thing like ShedSkin, instead it lets folks use the Java skills they already have to write better web apps. I wonder what they use to parse the Java code? I don't see any mention of JavaCC on their site, or ANTLR either for that matter...
The Army reading list
Portability. Besides, why not leverage the decent optimization that already exists in the gnu compiler collection? I guess for me, a better question might be, why not a python frontend to gcc.
PyPy
Well, it's not quite as bad as it sounds. He's seemingly only really forbidding incompatible mixed types in the same variable, a usage that isn't exactly extremely common.
A more significant roadblock, IMO, is that he can't handle mixed types in 3+-tuples, which is very common.
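For anyone unsure what the two cases look like, here is a small sketch. It only illustrates the Python patterns being discussed; it runs under plain CPython and makes no claim about what Shed Skin accepts beyond what is said above. The names and values are made up.

```python
# Same variable, incompatible types: the usage described as forbidden.
value = 42
value = "forty-two"  # fine in CPython, but no single static type fits 'value'

# A mixed-type 3-tuple: very common, e.g. one record per row.
record = ("Alice", 30, 1.75)  # (str, int, float)
name, age, height = record

# A homogeneous tuple, by contrast, has one element type throughout.
point = (1.0, 2.5, -3.0)

print(name, age, height, sum(point))
```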
"Who is the Journal of Quantum Physics going to believe?" --Stephen Hawking
Why would one ever need to do that? The goal is not to write C++ in Python, it's to compile Python to machine code via an intermediate Python -> C++ compilation.
"The way we can tell it's C# instead of Haskell is because it's nine lines instead of two." -- wadler
That's pretty much what he's doing, ShedSkin is a Python to C++ compiler, then you need to compile the C++ code ShedSkin yields to machine code, you can do that with gcc.
The goal (for the author) at the moment is to get a fairly complete Python to C++ compiler (ShedSkin is already very good if you're mostly doing simple operations such as crunching numbers, but if your program is really complex or uses libraries then you're out of luck)
"The way we can tell it's C# instead of Haskell is because it's nine lines instead of two." -- wadler
bzerodi's point, made with Zen-like simplicity, is that language choice should be made to minimize programmer time, not machine time. I am at least a factor of ten more productive with Python than with C or C++. I am also far more confident in the correctness of what I write per line of Python than with what I write per line of C/C++.
Yes, I have wasted some time staring at the shell waiting and waiting for it to return from some complicated Python routine. I know that compiled C would be faster, and hand-rolled assembler would be faster still. But I say to myself: hey, I wrote this code in a single afternoon, how many weeks of hair-pulling would it take to re-engineer this - and make it bug-free - in C? When I put it that way, I don't mind waiting the extra minutes for Python to do my dirty work.
As a previous poster mentioned, the ability to handle tuples of mixed-types is critical. I look forward to seeing great things from Shed Skin in the future.
Dictionaries are for loosers.
"The way we can tell it's C# instead of Haskell is because it's nine lines instead of two." -- wadler
surely the best way to speed it up is to compile it straight to object code... c++ has to be compiled and just adds an intermediate step which will make things harder to debug...
Donald 'Duck' Dunn: We had a band powerful enough to turn goat piss into gasoline.
I meant frontend in a specific gcc sense (as in, something that parses the language and outputs rtl which can be consumed by gcc backends), not as a generic intermediate step that will eventually be compiled via gcc (or whatever), as that's generally true of most things that output c or c++ code these days, yeah?
...why not make it into a GCC frontend so Python can be compiled directly?
If you are just making a prototype, why is squeezing out extra perf so important? Prototyping is the sort of situation where you should be fine with just using straight python.
I'd rather be lucky than good.
After four hours of tweaking, our expert C++ programmer was finally able to write something that beat our ten lines of Python code that took under five minutes to write. And it didn't beat it by much, whereas the first pass at a C++ version was an order of magnitude slower.
Which is why languages like python were written in the first place. They pretty much just make the underlying C calls anyways, but do so in a way that handles buffer overflows, pointers, etc., that pretty much make C/C++ so troublesome, hazardous, and hard to learn. I like java (a lot really), but nothing beats a good scripting language, like perl or python, to handle tasks like text manipulation. Python is especially good at using libraries, such as the imaging library, which are written in C anyways. How much faster can you get calling a C library from C than from python? I honestly don't know, but I can't imagine it's that much more. But when you add in speed of development, safety, and even portability, it's powerful.
Python's OOP is also a feature that makes it far more attractive than perl for me. Perl does OOP, but it's not as clean as python's, and I don't think it supports all the OOP features either. Doing GUI's is not the strength of any scripting language, but it depends on what you need to do. You can write a native frontend and embed python into a C or even a java application.
My problem? I was perfectly gruntled, until some numbnuts came by and dissed me.
Why? Read the linked page? Says it all. Violates most any Python code of any complexity out there. So if it doesn't convert Python code from the real world, what is it for? Making Python coders learn enough about C++ to remember the limitations and write/rewrite Python code to use it?
What the Python C/C++ interested people REALLY need is a book written by a group of Python AND C/C++ masters which teaches the two simultaneously, showing complementary methods of doing any given thing, working from beginner to advanced, and I DON'T mean a "How to turn your n00b Python code into C/C++ hotness" sort of viewpoint. I mean both taught simultaneously in sync, showing how they can interchange and complement each other.
Software tricks for converting? Ultimately worse than not having them because it leads to horrible obfuscation because we don't know exactly what is going on when 13,412 lines of Python is turned into C++ because WE DIDN'T WRITE IT AND WE NEVER LEARNED C/C++. "Say Mike, that's great but you're the company code cowboy and you don't do C++ natively and I sure as hell don't read it being management so exactly what happens if this needs to be fixed? We've gone from importing open source code you couldn't read to writing our own open source code you can't read."
If my grammar and spelling are off, I am [distracted/tired/careless] (take your pick)
The programming language can encourage the programmer to write an application faster. People have pointed this out before. But also, a programming language can encourage better programming principles. C++ makes it difficult to use complex data structures, while scripting languages like Perl and Python make it a breeze. Python also has the advantage of making object oriented programming simple, whereas the convoluted structure of C++ struct-based objects discourages programmers from taking advantage of them.
"Scientists don't change their minds, they just die." -- Max Planck
So much so that in fact, if I recall, python 2.5 is supposed to have optional type declaration.
No, it doesn't. AFAIK all the Type-SIG and other groups looking at it decided against it and the issue is dead.
rage, rage against the dying of the light
I love Python, but I hate the dynamic typing. It can be handy at times, but 99% of the time you make a variable to hold one kind of thing. Having the static typing would both improve performance (because the interpreter knew what you were up to) but would also eliminate bugs (because it would complain when I tried to set a double to "And now press...").
I'd love to see Python get optional static typing.
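A minimal sketch of the kind of bug meant here, assuming a made-up apply_discount function that is only ever supposed to see floats. Nothing complains when the variable is rebound to a string; the error only appears when the value is finally used.

```python
def apply_discount(price):
    # Intended for floats only, but nothing in the language says so.
    return price * 0.9


def fails_at_runtime(value):
    # Returns True if using 'value' as a price blows up with a TypeError.
    try:
        apply_discount(value)
    except TypeError:
        return True
    return False


price = 19.99
price = "And now press..."  # accidental rebinding to a str; no complaint here

print(fails_at_runtime(19.99))
print(fails_at_runtime(price))
```

With a static type on price, the rebinding itself would be rejected instead of the later multiplication.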
Comment forecast: Bits of genius surrounded by a sea of mediocrity.
"Doing GUI's is not the strength of any scripting language..."
;)
This is why projects like pyGTK exist
"A truly wise man realizes he knows nothing."
Sorry, but without more details it would seem to me that your "expert" C++ guy wasn't an expert. Can you describe the problem a little better? If what you say is true, I as a long term C++ programmer would consider switching, but I've looked at python, and I simply don't believe you. I'll grant that C++ is a nightmare for beginners with more pitfalls than an Indiana Jones movie, but once you know them, writing poorly performing code is unlikely.
http://rareformnewmedia.com/
I do understand that Python does not have structs or pointers. But pointers are fundamental to C++. I was very unclear in my first post. I was wondering if they would be converting certain types to pointers and I was curious how that was going to happen.
My
I'd love to see Python get *required* static typing.
But then, I guess it wouldn't be Python....
Why would you even try to compete with file I/O? Disk access is slow enough that an interpreted language is going to have no problem keeping up with a compiled one. As for the 4 hours, yeah, there's some stuff that Python just does better, but I'm inclined to say they just didn't know what they were doing, mostly because they even attempted competing on the I/O front.
<xml><I><am><so><damn>Web 2.0</damn></so></am></I></xml>
Not to mention that you can speed up execution time by throwing more hardware at it. If you try that with programmer time you just end up with a bruised programmer, and nobody wants that!
"[Regarding the 'cloud,'] ownership was what made America different than Russia." -- Woz
C++ makes it difficult to use complex data structures, while scripting languages like Perl and Python make it a breeze.
Complex data structures in Perl? Such pain I wish to never endure.
There are 0x40000000 types of people: those who understand 32-bit IEEE 754 floating point, and those who don't.
They could be experts in C++, and not be already familiar with the highly unusual case of manipulating insanely large text files efficiently. Like me.
Attention zealots and haters: 00100 00100
Delta-Mike November Bravo Tango
As another poster already said, file I/O is a bottleneck regardless of ANY language. So, try something different. Real-time h264 decoding for example.
This sig does not contain any SCO code.
C++ makes it difficult to use complex data structures...
It does? I've always managed, somehow. If you're referring to things like structures of pointers to structures, smart pointer techniques have been around for ages that manage cleanup. Maybe not as trivial as a scripting language, but not difficult.
Python also has the advantage of making object oriented programming simple, whereas the convoluted structure of C++ struct-based objects discourages programmers from taking advantage of them.
Nice talking out your ass there. Ya, C++ programmers everywhere refuse to use struct-based objects (i.e. C++'s classes) and OOP because they're so discouraging.
Attention zealots and haters: 00100 00100
Programmatically creating GUI's? Ewww, why not just take another leap and go with the python bindings for libglade?
I think the Python Glade bindings require the GTK bindings.
http://outcampaign.org/
"Times faster" is a unitless quantity.
http://outcampaign.org/
I like java (a lot really), but nothing beats a good scripting language, like perl or python, to handle tasks like text manipulation.
You say that like you think Java isn't a scripting language, but an analysis of language features, like anonymous inner classes (which encourage on-the-fly, non-designed extensions to applications) clearly shows that it is more appropriate to scripting than to applications development, particularly if you care about run-time performance (yeah, I know, with the right JVM and stuff Java can go fast--so prove to me that my customers will be running that JVM and that it will support all the language features I need for the specific application I'm building--"can go fast" != "does go fast".)
Blasphemy is a human right. Blasphemophobia kills.
Python is by far the best language for proto-typing code intended to be written in C++ later. ShedSkin facilitates this process.
Not enough people proto-type their code, which is why hardly anybody talks about how to do it.
Python is a terrific prototyping language (and lots of other things besides.) As a C++ coder I've been using it for prototyping stuff that will eventually be integrated into a larger application and therefore MUST be translated to C++. So what I'd like to see is a tool (written in Perl, just for the fun of having a linguistic threesome) that just does a light gloss on Python syntax to get me most of the way to human-readable C++. That would be far more useful (to me) than this thing, which sounds more like f2c, whose output could cause brain damage in humans and cancer in rats, or possibly the other way around.
Blasphemy is a human right. Blasphemophobia kills.
I'm amused that the same mechanism that was originally used to implement C++ (a precompiler that, in that case, generated C code) is now being used with C++ as the "low-level" language with a readily-available compiler.
Surely some of you remember cfront? Generated some truly bloated, completely unreadable stuff, but humans weren't supposed to read its output - cc was.
The preferred solution is to not have a problem.
Why not just use pure 0's?
After all, if you can express a program in binary, you can convert that binary string to a number. Just count off that many 0's (like they were tally marks).
"boo", a .NET language, allows dynamic typing by specifying 'duck' type. It achieves near-c# speed because all other data are statically typed.
It's a great language -- combining the benefits of Python, Ruby, and C# -- and it's wonderful for proto-typing in the .NET world.
Yes they do, hence a leap further and not a leap in another direction :P
Assume that it takes:
- 4 hours to write a given program in python, 32 hours to write same program in C++
- 10 seconds to run the python program, but just 2 seconds to run the faster C++ program
- the program is run 20 times a day
- assume the developer time costs as much as the time of the person that runs it
Ok, so it'll take 630 days of running this program for the faster C++ program to make up for the extra time to develop it. So, if you can wait two years for a payback then C++ is the way to go, otherwise code it in python.
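The arithmetic behind that figure, spelled out as a short sketch using only the numbers assumed above (the variable names are mine):

```python
SECONDS_PER_HOUR = 3600

extra_dev_hours = 32 - 4  # extra time to write the C++ version
saved_per_run = 10 - 2    # seconds saved on each run
runs_per_day = 20

extra_dev_seconds = extra_dev_hours * SECONDS_PER_HOUR  # 28 h = 100800 s
saved_per_day = saved_per_run * runs_per_day            # 160 s per day

# Days of running before the saved seconds repay the extra development time.
break_even_days = extra_dev_seconds / float(saved_per_day)
print(break_even_days)  # 630 days
```

Change any one assumption (say, 200 runs a day instead of 20) and the payback period scales accordingly.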
There that was easy. Ok, any other simple problems out there? Which editor you should use? What's just the right amount of comments per program? Which is better - cvs or subversion?
Didn't you know? Common Lisp had optional static typing in 1857. Then the Civil War came and people COMPLETELY forgot about it, babbling on about their Gatling guns and submarines and stuff that Lisp had had for years.
(I'm not really sure if Lisp's static typing guarantees compile errors. I'm just a beginner Lisp Weenie. I have potential, though, right?)
I don't get why parent recommended pyGTK, but for wxPython or pyQT, you have gui builders.
WxPython has some commercial offerings, plus wxglade (it can output pure python code or xml) and boa constructor.
pyQt has Qt designer and pyuic (py user interface compiler, compiles the designer's .ui file to python code).
One great example is text manipulation. A few lines of regular expression code in Perl/Python/Ruby can save hours of coding in C++ (well without a third party regular expression library). I'm not saying that this is the problem, but for complicated text parsing, the power of regular expressions is one big advantage of a scripting language over C++.
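As a small illustration of the point, here is a sketch using Python's standard re module. The log format and its contents are invented for the example.

```python
import re

log = """\
2006-07-01 ERROR disk full on /dev/sda1
2006-07-01 INFO backup finished
2006-07-02 ERROR timeout talking to db01
"""

# One pattern pulls the date and the message out of every ERROR line.
pattern = re.compile(r"^(\d{4}-\d{2}-\d{2}) ERROR (.+)$", re.MULTILINE)

errors = pattern.findall(log)
for date, message in errors:
    print(date, message)
```

Doing the same by hand means writing the line splitting, the field checks and the error handling yourself.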
If there are no pointers or structs in the Python code, why would they have to handle them?
Interfacing between functions written in Python and functions written in C++, perhaps?
I don't think this will help with "rapid prototyping," since it has to add an additional compilation step to your change, test, change cycle.
Nor do I think most projects will be able to depend entirely upon it. Looking at the limitations listed in the announcement, it seems like it takes away many of the things that make Python engaging as a language. What seems to remain? Imagine taking a compiling C program, translating it (as literally as possible) into python, and then letting this program translate it into machine code for you.
I'm probably overstating the case. There will be certain subsets of code that might benefit from this (where people need the performance, and are willing to give up architecture independence to get it). But even if this thing matures greatly (and I'd like to see that happen), I don't think it will become ubiquitous.
You want the truthiness? You can't handle the truthiness!
Maybe you should look into Pyrex, which is essentially Python with static, C primitive data types. I've never used it myself, and it's mainly intended for writing Python modules, but I see no reason why you couldn't write essentially an entire app as a Python module and call that from a very short Python script.
The ultimate plays for Madden 2006
As have I, but I'd certainly rather manage in languages that support first order data structures, "for each" loops for iterations, proper disjunctive types, pattern matching, and so on. C++ is better than it used to be, but all the data structures and algorithms in the standard library barely hold a candle to the expressive power of many functional programming and "scripting" languages.
If you disagree, post your argument. (-1, Overrated) isn't your personal censorship tool for views you don't like.
Expressive power, or syntactic sugar? To me the former means all the things I can accomplish, not all the ways I can accomplish the same things.
Attention zealots and haters: 00100 00100
Before boost::regex was stable, ie 3 or 4 years ago, I would have agreed with you.
http://rareformnewmedia.com/
It depends upon the compiler.
CLISP ignores such statements, because it only compiles to bytecode anyway.
CMUCL and SBCL do generate compiler warnings, and sometimes even errors. Your static typechecking will not go unnoticed.
I think that to be good, the Python compiler should do the same, but that may mean introducing new statements in Python, like in Lisp :
(declaim (optimize (compilation-speed 0) (speed 3)))
I don't think expressive power and syntactic sugar are independent concepts in practice. (C.f. any language is syntactic sugar, equivalence of Turing-complete languages, everything gets turned into machine code eventually, etc.)
But the original comment was that C++ made it more difficult to use complex data structures, and given that we can express several useful concepts much more concisely using language features such as those I mentioned than is possible with C++, I think that's a fair claim.
If you disagree, post your argument. (-1, Overrated) isn't your personal censorship tool for views you don't like.
Many language implementations for less mainstream languages compile through C, treating it as a "portable assembler", and leveraging all the work that's been done to optimize C compilation. This is even done for some high-end languages used e.g. in aerospace - Airbus jets run on a lot of generated C code, for example. C++ is less commonly used for this purpose, but if you know what you're doing, it's no slower than C.
Quite the opposite, having the intermediate step be in a higher level language tends to be very useful for debugging. You don't need to debug your C++ compiler these days, so if there's something that requires you to look at the generated output, you'd usually have to look no further than the C++.
The main disadvantage of using C or C++ as compile targets is that your compiler then depends on some other compiler, and can't work all by itself. Also, for some less standard languages, such as functional languages like Haskell and Scheme, C and C++ aren't such a good fit - but people still work around that and compile them through C in many cases anyway, for the reasons I've mentioned.
Except I have to convince my boss that Boost (sub-)libraries are (generally) solid. Normally you'd think that would be easy seeing as many Boost (sub-)libraries are getting into TR1 and are being regression tested on many platforms, and are written by some of the most brilliant C++ programmers on the planet. Unfortunately my boss is a dumbass oldskooler who wouldn't recognize modern C++ if it raped his youngest daughter and who suffers from chronic NIH syndrome, meaning that all we get is shitty homegrown C++ libraries or glibc. With an all-batteries-included language we'd at least be spared the crappy home-grown reimplementations of the most basic stuff like regexes, shared_ptr, etc.
HAND.
So, you're suggesting that it be reworded: "a typical speedup of 2-40 times faster"? Brilliant.
http://outcampaign.org/
This won't be meaningful until a converted python script is compared to efficient code written natively in C++ in the first place.
That's the wrong comparison to make, because it assumes that the C++ programmer has unlimited time to make his C++ code efficient and correct. In real life, programmers have time constraints, and under given time constraints, the Python program will often be faster than the C++ program.
In fact, even without time constraints, C++ code often ends up far less efficient than the optimum possible, simply because using the optimal algorithm or memory management strategy is so hard in C++ that programmers can't do it.
They don't just require them, libglade bindings are part of pygtk, and as such, it's not a leap at all, and your original objection is pretty much pointless. The grand-grand-parent could've just as well meant using libglade as "programmatically creating guis".
Although if you do need to do the latter, python, with very little boilerplate required, is one of the best languages to do it in...
Doing GUI's is not the strength of any scripting language, but it depends on what you need to do.
Why is that? GUI's are, if possible, even more about using existing libraries and "backend" applications than any other software, which, as you stated, is what "scripting" languages do so well.
And to boot, GUI applications spend 99.9% of their time idle, waiting for user input, so the relative slowness also matters less than usually.
The time when Lisp dialects started using optional static typing, performance was still a big deal. Today it isn't. Plus Python has a large number of ways available to access native code with minimal fuss (mostly zero). There is simply no motivation to include that information and clutter the language anymore. In my 5 years of Python use, I never said - "Darn. This is too slow. I should have used a faster language". And I processed multi-gigabyte datasets.
How many such good VMs do you know? Java and .NET are the dominant VMs now and currently only .NET CLR has builtin support for dynamic typing.
There was an experimental feature in GCC called 'signatures' that was like "duck typing". You declared a function to take a signature, which was like a Java interface. You could then pass any type to that function provided the type provided methods of the same name and C prototype. The compiler would construct a simple wrapper class that translated calls to the signature methods to calls to the actual object. If a type was an "almost" fit, you could provide manual translations of the signature methods as C++ code.
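For comparison, here is roughly how the same idea looks in Python itself. This is not the GCC extension, only a sketch of duck typing plus a hand-written wrapper for an "almost" fit; all class and function names are invented.

```python
class Duck(object):
    def speak(self):
        return "Quack"


class Robot(object):
    # Almost fits: it can make a noise, but under a different method name.
    def beep(self):
        return "Beep"


class RobotAsSpeaker(object):
    # Manual translation layer: maps speak() onto the robot's beep().
    def __init__(self, robot):
        self._robot = robot

    def speak(self):
        return self._robot.beep()


def announce(speaker):
    # Accepts anything with a speak() method; no shared base class needed.
    return speaker.speak()


print(announce(Duck()))
print(announce(RobotAsSpeaker(Robot())))
```

The difference is that Python checks for speak() only when the call happens, whereas the GCC feature as described did the matching at compile time.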
i can't figure out how to get python to do anything but figure pi... :( oh! but that cool calculator feature is so cool!!! thanks to python i know that 2+2/2*6+8-4+23/432 in fact equals 12 :)
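For the record, the 12 comes from operator precedence plus integer division, which is what "/" does on two ints in the Python 2 interpreters of the time. A quick sketch that makes both readings explicit:

```python
# Integer (floor) division: 2//2 is 1 and 23//432 is 0,
# so the sum is 2 + 6 + 8 - 4 + 0.
expr_floor = 2 + 2 // 2 * 6 + 8 - 4 + 23 // 432

# True division keeps the fractional part of 23/432.
expr_true = 2 + 2.0 / 2 * 6 + 8 - 4 + 23.0 / 432

print(expr_floor)
print(expr_true)
```

Multiplication and division bind tighter than addition and subtraction and are evaluated left to right, so 2/2*6 is (2/2)*6.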
Help Me! I'm trapped in the tubes! Oh noes! Here comes a internet!