Python-to-C++ Compiler
Mark Dufour writes "Shed Skin is an experimental Python-to-C++ compiler. It accepts pure, but implicitly statically typed, Python programs, and generates optimized C++ code. This means that, in combination with a C++ compiler, it allows for translation of pure Python programs into highly efficient machine language. For a set of 16 non-trivial test programs, measurements show a typical speedup of 2-40 over Psyco, about 12 on average, and 2-220 over CPython, about 45 on average. Shed Skin also outputs annotated source code."
This is an interesting development. Programmers can now use the simple syntax of Python and create faster machine code. This may make rapid prototyping even more rapid, and allow programmers with little or no C++ experience to create code that will run faster and will not require someone to install Python to run it. I will have to explore it more, but it will be intriguing to see how they handle things like pointers and structs that are not in Python.
Until he addresses mixed types in n-tuples, this won't be useful for very many people.
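To make "implicitly statically typed" concrete, here is a small sketch in plain Python (both functions run fine under CPython). The assumption, taken from the description quoted above, is that a program qualifies when every variable and container can be given one inferred type. The function names are made up for illustration, and neither function has been checked against a particular Shed Skin release.

```python
# Implicitly statically typed: no declarations, yet each name keeps one type.
def mean(values):
    total = 0.0              # always a float
    for v in values:         # values: always a list of floats
        total += v
    return total / len(values)


# Harder for a static translator: `result` is an int on one path and a str
# on the other, and `pair` mixes an int and a str in one tuple.
def describe(flag):
    result = 42
    if flag:
        result = "forty-two"
    pair = (len("abc"), "abc")
    return result, pair


print(mean([1.0, 2.0, 3.0]))  # 2.0
print(describe(True))         # ('forty-two', (3, 'abc'))
```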
"Who is the Journal of Quantum Physics going to believe?" --Stephen Hawking
Why not just use pure C++?
2x to 40x speedup? 2% to 40%? 2 to 40 seconds?
Standardised units are your friend.
Will program for karma.
As a UNIX admin, I was saddled with one of these kinds of things years ago, a DEC-BASIC to C compiler for UNIX. The output code quality was incredibly bad: machine generated variable and function names, bizarro nested struct/union/struct data structures, 400-line functions peppered with calls to 1-line functions. Completely unreadable. Thank $DEITY that project died quickly.
No folly is more costly than the folly of intolerant idealism. - Winston Churchill
If you're only allowed to use static typing, doesn't that defeat the purpose of coding in python in the first place?
See, it's all well and good to compile python to speed it up. The problem is, people are now saying that they can write efficient code in python just because it magically translates to C++, and because this translator is faster than other python compilers.
This won't be meaningful until a converted python script is compared to efficient code written natively in C++ in the first place.
StoneCypher is Full of BS
Since Python is about the only language I know very well, I find this fascinating. But it also reminds me of the .NET development suite, where the language you write your code in doesn't matter much, since it all compiles to the same intermediate language. So if you think you can do something more memory-efficient in C# than in VB.NET - well, no.
So the bottom line is, the quality will depend on the quality of the converter, and that's not so cool. Adding layers between code and machine code is not the best way IMHO.
This is a good step to make Python run a bit faster, but I don't think it'll really make a huge difference.
The best way to get some speed and still keep the nice Python functions and layout is just to export the most heavily used functions to native code (C/C++).
I don't know if it's possible to take the C++ output and optimize it separately; that way you would at least have a good starting point for native code.
In short: Better, fast and easy, but not the best (if you can write native code)
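A minimal sketch of the Python side of that approach, using the standard-library `ctypes` module to call a C function. It assumes a POSIX system and uses libc's `strlen` as a stand-in; for your own hot spot you would compile it into a shared library and load that instead (that library would be your own, not something shown here).

```python
import ctypes
import ctypes.util

# Load the C standard library. find_library may return None on some systems;
# on POSIX, CDLL(None) then falls back to symbols already loaded into the
# process, which include libc.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C prototype: size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

print(libc.strlen(b"shed skin"))  # 9
```

Declaring `argtypes` and `restype` matters: without them ctypes assumes an `int` return value, which can truncate `size_t` results.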
My blog: http://www.redcode.nl
Among python programmers, I'm curious - how many use psyco (another python performance enhancement tool) for their projects? I fiddled with it a while ago (it didn't work because of a C module that it didn't like), but never had a compelling reason to go back to it. Performance optimization has never been important enough for my applications to merit the effort.
It's not wasting time, I'm educating myself.
...kind of reminds me of the Google Web Toolkit which is more or less a Java to Javascript/HTML compiler. It's not an optimization thing like ShedSkin, instead it lets folks use the Java skills they already have to write better web apps. I wonder what they use to parse the Java code? I don't see any mention of JavaCC on their site, or ANTLR either for that matter...
The Army reading list
PyPy
Surely the best way to speed it up is to compile it straight to object code... C++ has to be compiled and just adds an intermediate step, which will make things harder to debug...
Donald 'Duck' Dunn: We had a band powerful enough to turn goat piss into gasoline.
...why not make it into a GCC frontend so Python can be compiled directly?
Why? Read the linked page; it says it all. Its restrictions are violated by most Python code of any complexity out there. So if it doesn't convert real-world Python code, what is it for? Making Python coders learn enough about C++ to remember the limitations and write (or rewrite) Python code to fit them?
What the Python C/C++ interested people REALLY need is a book written by a group of Python AND C/C++ masters which teaches the two simultaneously, showing complementary methods of doing any given thing, working from beginner to advanced. And I DON'T mean a "How to turn your n00b Python code into C/C++ hotness" sort of viewpoint. I mean both taught simultaneously, in sync, showing how they can interchange and complement each other.
Software tricks for converting? Ultimately worse than not having them because it leads to horrible obfuscation because we don't know exactly what is going on when 13,412 lines of Python is turned into C++ because WE DIDN'T WRITE IT AND WE NEVER LEARNED C/C++. "Say Mike, that's great but you're the company code cowboy and you don't do C++ natively and I sure as hell don't read it being management so exactly what happens if this needs to be fixed? We've gone from importing open source code you couldn't read to writing our own open source code you can't read."
If my grammar and spelling are off, I am [distracted/tired/careless] (take your pick)
Somebody had to say it.
Writing a program in C++ and compiling it with a C++ compiler is like writing poetry in German and using babelfish to convert it to English.
Writing a program in Python, then converting it to C++, then compiling with a C++ compiler is like writing poetry in French, then using babelfish to convert it to German, then using babelfish again to convert it to English.
If it's not obvious, then the fewer levels of translation you have, the better your output will be: "The vodka is good, but the meat is terrible."
As another poster already said, file I/O is a bottleneck regardless of ANY language. So, try something different. Real-time h264 decoding for example.
This sig does not contain any SCO code.
Python is by far the best language for prototyping code intended to be written in C++ later. ShedSkin facilitates this process.
Not enough people prototype their code, which is why hardly anybody talks about how to do it.
Writing an interpreted-to-compiled-language converter is a fairly simple step that can yield an impressive speedup with very little effort. However, doing so makes you dependent on a separate compiler, and it can also force you to make unwise decisions when you attempt to coerce dynamic language features into a static language. Clearly the "Shed Skin" compiler is in its early stages, and its author is currently favoring the removal of Python features in order to accomplish his goal. That's a shame, because it need not be so. Hopefully his later attempts will support the full range of Python syntax and semantics.
A couple of years ago I wrote a specification for a dynamically typed language with first-class functions. Over a period of 3-4 months, I wrote the interpreter in C++, and then, as an experiment, I made a bytecode-to-C++ converter. That first prototype was about 3x faster than the already blazing-fast interpreter, but I didn't want to depend on gcc, so my next step was to write a bytecode-to-x86 converter. That allowed the interpreter to compile down to raw machine code in the same pass as the bytecode verification. With the JIT in place, its performance compared favorably to an unoptimized C compiler, even for tight loops and recursive function-call benchmarks like the Ackermann function. Keep in mind that we're talking about dynamically resolved types as loop counters and first-class function objects that carry writable closure state from their parent functions, so this is quite a bit more impressive than it might seem at first glance.
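The difference between interpreting bytecode and translating it ahead of time can be sketched in a few lines. This toy is written in Python rather than C++ or x86 for brevity, and the opcode set and names (`PROGRAM`, `interpret`, `translate`) are invented for illustration; it is not the commenter's design. The interpreter dispatches on every opcode each time it runs, while the translator walks the bytecode once, turns each stack slot into a named temporary, and hands the resulting straight-line source to Python's own compiler, much as a bytecode-to-C++ converter hands its output to gcc.

```python
# Toy stack-machine bytecode: (opcode, argument) pairs computing x * 3 + 1.
PROGRAM = [("LOAD", "x"), ("PUSH", 3), ("MUL", None), ("PUSH", 1), ("ADD", None)]


def interpret(program, x):
    """Dispatch on every opcode, every time the program runs."""
    stack = []
    for op, arg in program:
        if op == "PUSH":
            stack.append(arg)
        elif op == "LOAD":
            stack.append(x)  # the toy language has a single variable, x
        elif op in ("ADD", "MUL"):
            b, a = stack.pop(), stack.pop()
            stack.append(a + b if op == "ADD" else a * b)
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return stack.pop()


def translate(program):
    """Translate the bytecode once into straight-line Python source, then compile it.

    Each stack slot becomes a named temporary, so no stack and no dispatch
    loop remain in the generated function.
    """
    lines, stack = [], []
    for i, (op, arg) in enumerate(program):
        temp = f"t{i}"
        if op == "PUSH":
            lines.append(f"    {temp} = {arg!r}")
        elif op == "LOAD":
            lines.append(f"    {temp} = x")
        elif op in ("ADD", "MUL"):
            b, a = stack.pop(), stack.pop()
            lines.append(f"    {temp} = {a} {'+' if op == 'ADD' else '*'} {b}")
        else:
            raise ValueError(f"unknown opcode {op!r}")
        stack.append(temp)
    source = "def compiled(x):\n" + "\n".join(lines) + f"\n    return {stack.pop()}\n"
    namespace = {}
    exec(compile(source, "<translated>", "exec"), namespace)
    return namespace["compiled"]


compiled = translate(PROGRAM)
print(interpret(PROGRAM, 5), compiled(5))  # 16 16
```

The translation cost is paid once, while the interpreter pays for dispatch on every run. The commenter's bytecode-to-x86 step removes the dependence on an external compiler, which this sketch sidesteps by reusing Python's.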
At the time, my little unoptimized JIT compiler would have easily made the top 10 on the language shootout page. With the benchmarks normalized to show C, C++ and OCaml as 1.0, my simple JIT compiler scored about 4-5, considerably faster than my interpreter, which ran at about 30-40. Meanwhile, all the other interpreters, including Perl, Python, Ruby and Lua, came in at about 60-200 (note: my first attempt at the interpreter ran in the 150 range, about the same as Perl).
My experience convinced me that it should be fairly easy (read: only a couple of months of work) to speed up most interpreted languages by at least a factor of 10. For some of the dogs out there, you could probably get closer to a 40x speedup. Granted, it would take a few days to a week to write the machine-code driver for each new platform; however, the benefit of not having to run a separate (slow) compilation phase would outweigh that drawback, IMHO.
Unfortunately, though, some language constructs are just inherently slow. For example, Perl and Python both treat their objects as string-based lookup tables, so any code that uses that functionality will be limited by the access speed of the lookup table: potentially close to O(1) in the average case but no better than O(lgN + length of string) in the worst case. As another example, Perl uses a list construction for its function arguments instead of a vector. List allocations require significantly more memory than arrays, and their indexed lookup time is O(N) instead of O(1).
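The "string-based lookup table" point is easy to see in Python: an ordinary instance stores its attributes in a per-instance `dict` keyed by attribute names. The sketch below contrasts that with `__slots__`, which removes the per-instance dict. CPython still resolves the attribute name through the class at run time, so this is not the same as a struct field offset fixed at compile time. The class names are illustrative.

```python
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y


class SlotPoint:
    __slots__ = ("x", "y")  # fixed attribute set, no per-instance __dict__

    def __init__(self, x, y):
        self.x = x
        self.y = y


p = Point(1, 2)
print(p.__dict__)        # {'x': 1, 'y': 2}
p.__dict__["z"] = 3      # attributes really are string keys in a dict
print(p.z)               # 3

s = SlotPoint(1, 2)
print(hasattr(s, "__dict__"))  # False
```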
Finally, as the "Shed Skin" compiler author realized (and dutifully noted that his implementation ignores), there are serious performance penalties that come with allowing container objects to hold dynamic types. These penalties really can't be worked around unless you add optional static typing or simply ignore the feature. I prefer optional static typing, because dynamic typing is way too valuable a tool to remove from a language.
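One way to see the cost of containers that can hold anything: a Python `list` stores references to boxed objects of any type, while the standard-library `array` module stores a single C type unboxed and contiguously, and rejects anything else. This is only an illustration of the trade-off described above, not of how Shed Skin or any optional-typing scheme is implemented.

```python
from array import array

# A list holds references to arbitrary objects, so code using an element
# must discover its type at run time.
mixed = [1, 2.5, "three", (4, 5)]

# Typecode "d": every element is a C double stored directly in the array's buffer.
nums = array("d", [1.0, 2.5, 3.0])

try:
    nums.append("three")  # a str is not a double
    rejected = False
except TypeError:
    rejected = True

print(rejected)        # True
print(sum(nums))       # 6.5
print(type(mixed[2]))  # <class 'str'>
```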
Python is a terrific prototyping language (and lots of other things besides). As a C++ coder, I've been using it for prototyping stuff that will eventually be integrated into a larger application and therefore MUST be translated to C++. So what I'd like to see is a tool (written in Perl, just for the fun of having a linguistic threesome) that does a light gloss on Python syntax to get me most of the way to human-readable C++. That would be far more useful (to me) than this thing, which sounds more like f2c, whose output could cause brain damage in humans and cancer in rats, or possibly the other way around.
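The kind of "light gloss" tool wished for above can be sketched with Python's standard `ast` module (in Python rather than Perl). This toy handles only a one-statement `def f(...): return <expr>` using `+`, `-` and `*`, and simply assumes every value is an `int`; a real tool would need type inference, which is where the actual difficulty lies. All names here are hypothetical.

```python
import ast

OPS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*"}


def expr_to_cpp(node):
    """Render a tiny subset of Python expressions as C++ source."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Constant) and isinstance(node.value, int):
        return str(node.value)
    if isinstance(node, ast.BinOp) and type(node.op) in OPS:
        left, right = expr_to_cpp(node.left), expr_to_cpp(node.right)
        return f"({left} {OPS[type(node.op)]} {right})"
    raise NotImplementedError(ast.dump(node))


def function_to_cpp(source):
    """Gloss `def f(a, b): return <expr>` into C++, assuming all values are ints."""
    func = ast.parse(source).body[0]
    if not (isinstance(func, ast.FunctionDef)
            and len(func.body) == 1
            and isinstance(func.body[0], ast.Return)):
        raise NotImplementedError("only `def f(...): return <expr>` is handled")
    params = ", ".join(f"int {a.arg}" for a in func.args.args)
    body = expr_to_cpp(func.body[0].value)
    return f"int {func.name}({params}) {{\n    return {body};\n}}"


print(function_to_cpp("def area(w, h):\n    return w * h + 1"))
```

For the example input this prints a three-line C++ function, `int area(int w, int h)` returning `((w * h) + 1)`; the extra parentheses are ugly but keep the translation correct regardless of operator precedence.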
Blasphemy is a human right. Blasphemophobia kills.
1) Download BitTorrent
2) Download Shed Skin
3) ???
4) Profit!
-jX
Don't you just love politics? It's like a comedy of errors.
I'm amused that the same mechanism that was originally used to implement C++ (a precompiler that, in that case, generated C code) is now being used with C++ as the "low-level" language with a readily-available compiler.
Surely some of you remember cfront? Generated some truly bloated, completely unreadable stuff, but humans weren't supposed to read its output - cc was.
The preferred solution is to not have a problem.
Assume that it takes:
- 4 hours to write a given program in python, 32 hours to write same program in C++
- 10 seconds to run the python program, but just 2 seconds to run the faster C++ program
- the program is run 20 times a day
- assume the developer time costs as much as the time of the person that runs it
Ok, so it'll take 630 days of running this program for the faster C++ program to make up for the extra time to develop it. So, if you can wait nearly two years for a payback, then C++ is the way to go; otherwise, code it in Python.
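The break-even arithmetic can be checked with a few lines, using the comment's own assumed numbers (developer time and user time valued equally). The function name and parameters are just for illustration.

```python
SECONDS_PER_HOUR = 3600


def break_even_days(python_dev_hours, cpp_dev_hours,
                    python_run_s, cpp_run_s, runs_per_day):
    """Days of use before the faster program repays its extra development time.

    Assumes developer time and user time are worth the same, as the comment does.
    """
    extra_dev_seconds = (cpp_dev_hours - python_dev_hours) * SECONDS_PER_HOUR
    saved_per_day = (python_run_s - cpp_run_s) * runs_per_day
    return extra_dev_seconds / saved_per_day


print(break_even_days(python_dev_hours=4, cpp_dev_hours=32,
                      python_run_s=10, cpp_run_s=2, runs_per_day=20))  # 630.0
```

The extra 28 hours of development is 100,800 seconds; saving 8 seconds on each of 20 runs recovers 160 seconds a day, hence 630 days.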
There, that was easy. Ok, any other simple problems out there? Which editor should you use? What's just the right amount of comments per program? Which is better - CVS or Subversion?
If there are no pointers or structs in the Python code, why would they have to handle them?
Interfacing between functions written in Python and functions written in C++, perhaps?
Many language implementations for less mainstream languages compile through C, treating it as a "portable assembler", and leveraging all the work that's been done to optimize C compilation. This is even done for some high-end languages used e.g. in aerospace - Airbus jets run on a lot of generated C code, for example. C++ is less commonly used for this purpose, but if you know what you're doing, it's no slower than C.
Quite the opposite, having the intermediate step be in a higher level language tends to be very useful for debugging. You don't need to debug your C++ compiler these days, so if there's something that requires you to look at the generated output, you'd usually have to look no further than the C++.
The main disadvantage of using C or C++ as compile targets is that your compiler then depends on some other compiler, and can't work all by itself. Also, for some less standard languages, such as functional languages like Haskell and Scheme, C and C++ aren't such a good fit - but people still work around that and compile them through C in many cases anyway, for the reasons I've mentioned.
Except I have to convince my boss that Boost (sub-)libraries are (generally) solid. Normally you'd think that would be easy seeing as many Boost (sub-)libraries are getting into TR1 and are being regression tested on many platforms, and are written by some of the most brilliant C++ programmers on the planet. Unfortunately, my boss is a dumbass oldskooler who wouldn't recognize modern C++ if it raped his youngest daughter and who suffers from chronic NIH syndrome, meaning that all we get is shitty homegrown C++ libraries or glibc. With an all-batteries-included language we'd at least be spared the crappy home-grown reimplementations of the most basic stuff like regexes, shared_ptr, etc.
HAND.
This won't be meaningful until a converted python script is compared to efficient code written natively in C++ in the first place.
That's the wrong comparison to make, because it assumes that the C++ programmer has unlimited time to make his C++ code efficient and correct. In real life, programmers have time constraints, and under given time constraints, the Python program will often be faster than the C++ program.
In fact, even without time constraints, C++ code often ends up far less efficient than the optimum possible, simply because using the optimal algorithm or memory management strategy is so hard in C++ that programmers can't do it.
There was an experimental feature in GCC called 'signatures' that was like "duck typing". You declared a function to take a signature, which was like a Java interface. You could then pass any type to that function as long as the type provided methods of the same name and C prototype. The compiler would construct a simple wrapper class that translated calls to the signature methods into calls to the actual object. If a type was an "almost" fit, you could provide manual translations of the signature methods as C++ code.
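Python's closest modern analogue to those GCC signatures is `typing.Protocol` (Python 3.8+): any class whose methods match the protocol satisfies it without declaring it. Unlike the GCC feature, no wrapper class is generated; the check happens only in a static type checker such as mypy, and at run time it is plain duck typing. The class names are illustrative.

```python
from typing import Iterable, Protocol


class Shape(Protocol):
    """Structural interface: anything with an area() -> float method fits."""

    def area(self) -> float: ...


class Square:  # never mentions Shape
    def __init__(self, side: float) -> None:
        self.side = side

    def area(self) -> float:
        return self.side * self.side


class Rectangle:  # never mentions Shape either
    def __init__(self, w: float, h: float) -> None:
        self.w = w
        self.h = h

    def area(self) -> float:
        return self.w * self.h


def total_area(shapes: Iterable[Shape]) -> float:
    # A type checker accepts Square and Rectangle here because their methods match Shape.
    return sum(s.area() for s in shapes)


print(total_area([Square(2), Rectangle(2, 3)]))  # 10
```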
I can't figure out how to get Python to do anything but figure pi... :( Oh! But that cool calculator feature is so cool!!! Thanks to Python I know that 2+2/2*6+8-4+23/432 in fact equals 12 :)
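The 12 is a Python 2 artifact: there, `/` between two ints did floor division, so `23/432` evaluated to 0. In Python 3, `/` always does true division and `//` is the floor-division operator, as this sketch shows.

```python
# Python 2's int "/" behaved like "//" here: 2//2*6 == 6 and 23//432 == 0.
expr_floor = 2 + 2 // 2 * 6 + 8 - 4 + 23 // 432

# Python 3's "/" is true division, so 23/432 contributes a small fraction.
expr_true = 2 + 2 / 2 * 6 + 8 - 4 + 23 / 432

print(expr_floor)            # 12
print(round(expr_true, 4))   # 12.0532
```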
Help Me! I'm trapped in the tubes! Oh noes! Here comes a internet!