Slashdot Mirror


Famous Last Words: You can't decompile a C++ program

The Great Jack Schitt writes "I've always heard that you couldn't decompile a program written with C++. This article describes how to do it. It's a bit lengthy and it doesn't seem like the author usually writes in English, but it might just work (haven't tried it, but will when I have time)."

34 of 479 comments

  1. You can't by Anonymous Coward · · Score: 5, Insightful

    Information is lost in compilation. You can never reconstruct the exact original source. You end up with valid C++ that has no more human-understandable information than the equivalent machine code.

    Like turning hamburgers into cows...

    1. Re:You can't by Morologous · · Score: 5, Funny

      Like turning hamburgers into cows...

      I'm going to use that line.
    2. Re:You can't by NewbieProgrammerMan · · Score: 5, Funny

      Heh. You're assuming that you're attempting to decompile something that had human-understandable source to start with. :)

      --
      [b.belong('us') for b in bases if b.owner() == 'you']
    3. Re:You can't by cperciva · · Score: 5, Funny

      We're talking about C++ here, not perl.

      Compiled C++ code can't be decompiled into anything approximating the readability of the original; compiled perl code can.

    4. Re:You can't by antis0c · · Score: 4, Informative

      What's to say you need something as readable as the original? I worked at InterAct Accessories/GameShark for a few years before they went under, essentially as a 'reverse engineer'. Without getting yet another C&D from them in the mail due to a post on Slashdot (I don't even think they could send one now that they're out of business?), all I can say is that sometimes when hacking a game it benefits an engineer to decompile the application and be able to set breakpoints and watch execution flow while the game is running on, for example, a PlayStation 2. Sure, it's going to be a lot of nearly unreadable C++ mixed with assembly, but if you can watch the execution flow as you do something, it can be useful.

      Of course, a lot of naive people think decompiling would allow you to take an application and start writing patches for it; in that case you are right, it's going to be pretty useless. However, it's not useless in all situations. I'm sure the WINE guys might get some use out of it.

      --

      ..There's a-dooin's a-transpirin'
    5. Re:You can't by capnjack41 · · Score: 4, Insightful
      And then on top of that, the compiler optimizes that code, so calculations are no longer the straightforward and intuitive things they used to be; now they're a series of out-of-order, smaller calculations that are harder to recognize. They're efficient as hell but barely reversible.

      I'll RTFA when it comes back to life :).
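A small illustration of the point about optimized arithmetic, as a sketch rather than any particular compiler's output: division of a 32-bit unsigned value by 3 is commonly compiled into a multiply by a "magic" constant followed by a shift. The function names below are made up for the example.

```cpp
#include <cstdint>

// What the programmer wrote: a plain division by a constant.
std::uint32_t divide_by_three_source(std::uint32_t x) {
    return x / 3;
}

// The shape an optimizer commonly emits instead: multiply by a "magic"
// reciprocal and shift. 0xAAAAAAAB is ceil(2^33 / 3), so the 64-bit
// product shifted right by 33 equals x / 3 for every 32-bit x.
std::uint32_t divide_by_three_compiled(std::uint32_t x) {
    std::uint64_t product = static_cast<std::uint64_t>(x) * 0xAAAAAAABull;
    return static_cast<std::uint32_t>(product >> 33);
}
```

Both functions return the same value for every input, but a decompiler that sees only the second form has to recognize the constant before it can print "x / 3" again.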

    6. Re:You can't by Paradise+Pete · · Score: 5, Funny
      I'd prefer... Like turning shit back into pizza.

      Clearly you haven't tried Domino's.

    7. Re:You can't by Waffle+Iron · · Score: 4, Funny
      Like turning shit back into pizza

      Here's how:

      Flush shit down toilet -> let shit mellow at sewage plant -> strain shit residue out of bottom of sewage vat -> haul to field -> spread on grass -> grass grows -> cow eats grass -> pull cow's udder, direct milk into bucket -> ferment milk to cheese -> shred cheese -> spread on dough -> Pizza!

    8. Re:You can't by jkorty · · Score: 4, Insightful
      Information is lost in compilation. You can never reconstruct the exact original source

      So what? Doing reasonable interpolations in context is what brains are for. Example: IIRC, when the Morris Worm appeared in 1988, Gene Spafford examined the binary and reverse-engineered the C code, sprinkling it with meaningful comments and good variable and function names. When the original source became available, his turned out to be a cleaner program than the original. That is, he not only recreated the original in every way that counts, he overshot and did better than the original.

  2. Oop by Suffering+Bastard · · Score: 5, Funny

    it doesn't seem like the author usually writes in English

    Surely he now understands the English infinitive "to be Slashdotted".

    --
    "Molest me not with this pocket calculator stuff."
    - Deep Thought
  3. Why not? by bazik · · Score: 5, Insightful

    I've always heard that you couldn't decompile a program written with C++.

    Well, you can decompile every binary program at least to assembler code, so why shouldn't it be possible with C++?

    Maybe he meant "you can't decipher the source of a C++ program" ;)

    --
    One by one the penguins steal my sanity...
  4. hmm by Graspee_Leemoor · · Score: 5, Informative

    A c/c++ decompiler that totally worked would be the Holy Grail of crackers. Unfortunately it is actually impossible to get everything back because lots of info is lost on compilation.

    Nevertheless there are tools out there that attempt to decompile programs; I think of them more as ways of making assembly more readable.

    Note, a lot of them wouldn't work on hand-written assembly, because they rely on knowledge of how certain compilers compile various things, e.g. there was a Delphi decompiler available.

    graspee

    1. Re:hmm by jackb_guppy · · Score: 5, Interesting

      I wrote reverse compilers on IBM midrange equipment, where there are no stacks and self-modifying code is VERY commonplace. It is easy to do:

      Create a program that performs/understands the opcodes and addressing of the processor, and that follows both sides of every branch.

      Now "run" the program; that maps out all the opcode and data areas.

      Once done, look at the assembler equivalent and map out common subroutines and function calls. Data storage becomes very clear. Lastly, common storage will show external and internal common structures, so fields can be named and visualized.

      It is straightforward, can be time-consuming, and is very helpful in understanding hidden or lost information.
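The "follow both sides of a branch" step can be sketched roughly like this. Everything here is an assumption for illustration: the two-byte toy instruction format, the opcode names, and the function `map_code` are invented, and unlike the real thing this sketch does not cope with self-modifying code.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical toy machine: every instruction is two bytes, {opcode, operand}.
// Any opcode other than JMP, BRANCH, or HALT simply falls through.
enum Opcode : std::uint8_t { OP = 0, JMP = 1, BRANCH = 2, HALT = 3 };

// Returns one flag per byte of the image: true if the byte was reached as
// part of an instruction, false if it was never reached (probably data).
std::vector<bool> map_code(const std::vector<std::uint8_t>& image, std::size_t entry) {
    std::vector<bool> is_code(image.size(), false);
    std::vector<std::size_t> worklist{entry};

    while (!worklist.empty()) {
        std::size_t pc = worklist.back();
        worklist.pop_back();

        // Follow straight-line code until it ends or joins code already mapped.
        while (pc + 1 < image.size() && !is_code[pc]) {
            is_code[pc] = is_code[pc + 1] = true;
            std::uint8_t opcode = image[pc];
            std::size_t target = image[pc + 1];

            if (opcode == HALT) break;
            if (opcode == JMP) { pc = target; continue; }
            if (opcode == BRANCH) worklist.push_back(target);  // taken side, visited later
            pc += 2;                                           // fall-through side
        }
    }
    return is_code;
}
```

Bytes that stay unmarked after the walk are the candidates for data areas, which is where the "data storage becomes very clear" part comes from.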

  5. sure you can go from asm - c++ by Anonymous Coward · · Score: 5, Informative

    but it'll look like this

    class a
    {
    public:
        void b(int c) { g = c; }
        void d(int e) { h = e; }
    private:
        int g;
        int h;
    };

    int main()
    {
        a f;
        f.b(23);

        int x; x=0; x++;
        if(x > 3) goto j;
        f.d(x); x++;
        if(x > 3) goto j;
        f.d(x); x++;
        if(x > 3) goto j;
        f.d(x);
    j:  f.b(42);

        return 0;
    }
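For comparison, here is a sketch of how the goto-laden body above lines up with a loop a human might plausibly have written. The `Recorder` struct and both function names are stand-ins invented for this example; the original source is of course unknown.

```cpp
#include <vector>

// Stand-in for the class in the post: it just records the arguments to d().
struct Recorder {
    std::vector<int> calls;
    void d(int e) { calls.push_back(e); }
};

// The unrolled, goto-based shape from the post.
std::vector<int> decompiled_shape() {
    Recorder f;
    int x; x = 0; x++;
    if (x > 3) goto j;
    f.d(x); x++;
    if (x > 3) goto j;
    f.d(x); x++;
    if (x > 3) goto j;
    f.d(x);
j:
    return f.calls;
}

// One loop that produces exactly the same sequence of calls.
std::vector<int> plausible_original() {
    Recorder f;
    for (int x = 1; x <= 3; x++) f.d(x);
    return f.calls;
}
```

Both functions record the calls d(1), d(2), d(3); getting from the first form back to the second is the part a decompiler has to guess at.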

    1. Re:sure you can go from asm - c++ by rsheridan6 · · Score: 4, Funny

      My girlfriend just read that over my shoulder and said "Is that a poem?"

      --
      Don't drop the soap, Tommy!
  6. Re:Why by Anonymous Coward · · Score: 5, Insightful

    You need reasons?

    1) Finding backdoors
    2) Testing security
    3) Fixing bugs
    4) Adding features
    5) Discovering copyright violations
    6) Interfacing to non-supported clients

    Pretty much anything and everything you would do if you had the source.

  7. Re:Why by p4ul13 · · Score: 4, Insightful

    You could be updating a program for your company for which the source is lost.

    --
    Paul Lenhart writes words!
  8. Inline functions, templates and decompilation by truth_revealed · · Score: 4, Insightful

    Sure, you can decompile an optimized and symbol-stripped C++ program, but you'd never get it back in the original compact form of the source, as you do with the Java class file decompilers, due to the heavy use of inline functions and templates in C++. A C program, sure, but decompiling C++ is not terribly useful.

  9. let's get back to basics by 1nv4d3r · · Score: 5, Funny

    Hell, I'd be happy if the people working for me could consistently compile their c/c++. I need a new job...

  10. Speculation Code by Davak · · Score: 5, Informative
    Considering the entire post is evidently based on speculation...

    Here is some code that supposedly decompiles... not that I've tried it.

    Quote from the FAQ:


    [35.4] How can I decompile an executable program back into C++ source code?

    You gotta be kidding, right?

    Here are a few of the many reasons this is not even remotely feasible:
    * What makes you think the program was written in C++ to begin with?
    * Even if you are sure it was originally written (at least partially) in C++,
    which one of the gazillion C++ compilers produced it?
    * Even if you know the compiler, which particular version of the compiler was
    used?
    * Even if you know the compiler's manufacturer and version number, what
    compile-time options were used?
    * Even if you know the compiler's manufacturer and version number and
    compile-time options, what third party libraries were linked-in, and what
    was their version?
    * Even if you know all that stuff, most executables have had their debugging
    information stripped out, so the resulting decompiled code will be totally
    unreadable.
    * Even if you know everything about the compiler, manufacturer, version
    number, compile-time options, third party libraries, and debugging
    information, the cost of writing a decompiler that works with even one
    particular compiler and has even a modest success rate at generating code
    would be significant -- on the par with writing the compiler itself from
    scratch.

    But the biggest question is not how you can decompile someone's code, but why
    do you want to do this? If you're trying to reverse-engineer someone else's
    code, shame on you; go find honest work. If you're trying to recover from
    losing your own source, the best suggestion I have is to make better backups
    next time.

    I would have posted AC but they have me blocked out for some reason...


    Davak

  11. To all those, who think it's useless... by SharpFang · · Score: 4, Interesting

    Well, it isn't. Sure, if you're so lazy you want to have the source rebuilt from binaries with one click, complete with comments, makefile and documentation, that's of no use. But imagine the program does some very clever trick. Something you ooh about: "How the hell does he do that? It's impossible!" You want to include that trick in your code. You need it. So you have three options:

    1) Try to design it from scratch. Helluva work, you don't know where to start.
    2) Look into the binary. If you're an ASM guru, you MAY succeed. But ASM from high-level languages is hell to read.
    3) Decompile the puppy, look for that piece through what looks like piles of junk but is way more readable than ASM, and find it. Then just rewrite it in pretty fashion, changing variable names and functions to your needs, and include it in your own software.

    It's "the best of the worst", a last resort for finding a solution to a small problem. Not a way to edit the source and add a single feature to the original program, like removing print protection from Acrobat Reader. The decompiled program most probably won't be possible to compile. You won't make a cow from hamburgers. But with some luck you may find out the cow was a bull and got killed by a truck.

    --
    45 5F E1 04 22 CA 29 C4 93 3F 95 05 2B 79 2A B2
  12. You're right, that is nonsense. by Anonymous Coward · · Score: 5, Funny

    I damn well know computers. I have been working with them since 1904, when the Black Man made the first computer out of a peanut. I now work for Cray research making 18 figures.

    I can scratch a superscalar CPU out of silicon with a pocket knife. I even have friends who can write major programs in binary code (yes, just 1s and 0s)... even though writing a simple "hello world" program can amount to 92,752 bits. I fail to realize that this ability does not a good computer scientist make. Things like intelligent design and research make a CS good.

    The parent post is fluff. It's stupid, the man is flamboyant and exaggerating. He clearly has no real education in computer engineering and does not recognize that any executable code can be reverse-engineered or decompiled. Especially since every language (save interpreted languages like Java) is compiled to machine code -- specific, unambiguous, structured code. "Decompiling" this is only really a matter of translating it into your language of choice.

    So, Mr. Proud American, please get off your imaginary high horse. You're not fooling anyone.

  13. Templates by ucblockhead · · Score: 4, Informative
    He won't be able to regenerate any templates. If a program makes heavy use of templates, the "C++" he "decompiles" to is going to be hideously ugly.

    [insert joke about it being hideously ugly with templates here.]

    (I did not read the article itself because it is, of course, slashdotted)

    --
    The cake is a pie
  14. Reverse engineering has its uses... by sheetsda · · Score: 4, Insightful

    There seem to be a lot of people in this story saying "shame on you for reverse engineering". It has its uses: how else would viruses, worms, and trojans be analyzed to figure out what they do and how they do it?

  15. Re:Why by Call+Me+Black+Cloud · · Score: 4, Informative
    As a Java programmer I find it very useful to decompile class files from time to time. Reasons I've done so:

    A library we were basing a major portion of our code on had a bug in it (a Listener class failed to implement EventListener if I remember correctly) which kept our code from working. Removed offending classes from archive, decompiled, fixed, and recompiled.

    It's educational...the ol' "how'd they do that?". I've never taken code and used it but I found it instructional to look at how someone made a Swing text area from scratch, e.g.

    The challenge...one program I installed had an "enter registration key" prompt and I was curious how that was handled (turned out to be a static string). Then there was this applet that was the core of a company's business. Free, or pay and get more features. As it turns out the control of the features all resided in the applet, so change a couple of switch and if/then statements and voila, administrative privileges. Didn't use it for evil, much... :) They've since come out with a new version and I've been too busy using my mad java skillz on contract work to take a look at their code.

    Looking at security was instructional too, though, for when I was project lead on a commercial Java app I knew what worked and what didn't (we ended up using the Wibu key).

  16. A good decompiler shows you what was written by crovira · · Score: 4, Interesting

    not the source's lies.

    Losing source code and var names (name spaced globals aka statics and scoped locals) allows the cracker (these are rarely hacking tools, they're mostly cracking tools,) to focus on what the machine actually was told to do instead of smothering it with shades of meaning which interfere with understanding the code.

    C++ or Java or Smalltalk, or almost any highly structured language using machine code libraries or virtual machines, results in structured blocks of code and heap and stack allocation.

    A good decompiler can take the machine code, peel away the name spaces and code calls, extract the patterns in the code and the hacker/cracker can read the patterns instead of wasting time on the code.

    Forensic analysis work is extremely useful at telling you what happened when something dies but it is no good at telling you how something worked. For that you need code traces.

    Map those code traces onto the structure the decompiler reveals and you understand the program better than the authors/coders.

    --
    MSBPodcast.com The opinions expressed here are my own. If you don't like 'em... Think up your own stuff.
  17. Anyone want to decompile SCO? by pchown · · Score: 4, Funny
    You might decompile one file and find a comment like this at the top:

    * This program is free software; you can redistribute it and/or
    * modify it under the terms of the GNU General Public License
    * as published by the Free Software Foundation; either version
    * 2 of the License, or (at your option) any later version.
    ;-)
  18. misleading... by bismarck2 · · Score: 4, Informative

    Even with complete original source code, understanding a non-trivial C++ application is very difficult. Source derived from an optimized executable is going to be a LOT rougher. No real function names, module names, variable names, or comments. Use of standard libraries (STL, MFC, Boost) is likely highly obscured as well. A tool like this would probably produce source that looks more like a C/machine language hybrid rather than normal C++. The primary use of something like this is if you are looking for a very specific piece of logic such as a password check or an encryption operation or protocol details. When were these famous last words anyway?

  19. Re:Why by Lumpy · · Score: 4, Insightful

    Why would you want to do this unless you were stealing source?


    nice try.

    You must be either Bill Gates, Steve Ballmer or someone who works for the BSA.

    How am I to tell if your closed source program isn't full of my GPL code that you blatantly stole and are trying to rob me blind by STEALING my IP? Being a closed source advocate as you seem to be you are for me trying to detect IP theft and the illegal STEALING of my code by PIRATES right?

    Ok, I'm going overboard to make my point... I have EVERY right to use tools in a good and legal way. Why not outlaw hammers as anyone can perform a very grisly and horrible murder with one... Or better yet only allow licensed contractors to have hammers! as we know that the unlicensed public is only going to do very evil things with tools!

    see my point now? A tool is exactly what it looks like.... a tool. it can be used for good and evil. and I don't have any respect for the self righteous like you condemning what I do before I even do it.

    people with attitudes like you are what cause all the pain and suffering in this world...... STOP IT!

    --
    Do not look at laser with remaining good eye.
  20. I couldn't help it by Fnkmaster · · Score: 4, Funny
    Neo: Do you always look at it in binary?


    Cypher: Well you have to. The compilers work for the construct program. But there's way too much information to decode the Matrix. You get used to it. I...I don't even see the code. All I see is an array, function pointer, integer. Hey, you uh... want a drink?


    Neo: Sure.


    Cypher: You know, I know what you're thinking, because right now I'm thinking the same thing. Actually, I've been thinking it ever since I got here. Why, oh why didn't I sell my VA Linux stock?... Good shit, huh? Cowboy Neal makes it. It's good for two things, degreasing Perl code and killing brain cells.

  21. Decompiling is possible, but hard by Animats · · Score: 4, Interesting
    Decompilers are rare, but possible. The first good one, decades ago, decompiled IBM 1401 assembler programs into COBOL. There's a commercial business, The Source Recovery Company, still doing that for legacy mainframe programs.

    C decompilers exist; here's one. There are others. Most aren't very good. It's a hard problem.

    Without debugging information, decompilation tends to result in code with arbitrary variable and function names, of course. But you get names when a DLL or .so is entered, so at least you get the program's major interfaces. Minimal C++ decompilation could be done by adding vtable recognition to a C decompiler.
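One naive way the vtable recognition might start, sketched under loud assumptions: function entry addresses are already known from call analysis, pointers are 32 bits, and a vtable shows up in the data section as a run of consecutive words that all point at function entries. The name `find_vtable_candidates` and its parameters are hypothetical, and real layouts vary by compiler and ABI.

```cpp
#include <cstddef>
#include <cstdint>
#include <set>
#include <utility>
#include <vector>

// Returns (start index, length) for each run of consecutive data words that
// all point at known function entry points. Such runs are vtable candidates.
std::vector<std::pair<std::size_t, std::size_t>> find_vtable_candidates(
        const std::vector<std::uint32_t>& data_words,
        const std::set<std::uint32_t>& function_entries,
        std::size_t min_run) {
    std::vector<std::pair<std::size_t, std::size_t>> runs;
    std::size_t i = 0;
    while (i < data_words.size()) {
        std::size_t start = i;
        // Extend the run while each word is a known function address.
        while (i < data_words.size() && function_entries.count(data_words[i]) > 0) ++i;
        if (i > start && i - start >= min_run) runs.emplace_back(start, i - start);
        if (i == start) ++i;  // current word is not a function pointer; skip it
    }
    return runs;
}
```

A run found this way is only a candidate: a table of callbacks looks the same, so a later pass would still have to check that constructors store the table's address into objects.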

    A more difficult problem is recognition of idioms. Things like "for" statements tend to decompile as lower level constructs. That's OK as a first step. You need some internal representation. Initial decompilation might represent all transfers of control with "goto"; higher level recognition then deals with that.

    The key to doing a good job is "optimization", finding more concise source code that will generate the object code. The key to this problem is defining an internal representation that can represent any valid machine-language program, and which can be modified as higher level information about the program is discovered. The first step is usually to start at the starting address and build a code tree by following calls, like a good debugger does. Then you start to improve on the code tree, doing things like this:

    • Recognition of function calls. Each function call should be decompiled, and all calls to the same function checked to ensure they have the same calling sequence. Then a prototype can be generated and placed in a header file.
    • Recognition of fixed-format structures. Figuring out how big a structure is can be tough, but at least fixed-format ones should be fully recognized. All references to the structure should be checked for type consistency, and a structure definition generated.
    • Recognition of "for", "while", and "switch".
    • Once constructors and destructors have been found, the structure of derived objects can be figured out. Now class definitions can be generated.
    • Once class member functions have been identified, the most restrictive protection ("private", "public", "protected") that will work should be attached. Similarly, "const" can be inserted for all arguments not seen to be modified.
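For the "for"/"while" recognition step, one common starting point is the goto-level control-flow graph: a jump to a block that is still on the current depth-first path is a back edge, and on the reducible graphs that structured code normally compiles to, each back edge marks a loop. This is a minimal sketch with invented names (`Graph`, `dfs_visit`, `find_back_edges`), not a full structuring algorithm.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

using Graph = std::vector<std::vector<std::size_t>>;  // block -> successor blocks
using Edge = std::pair<std::size_t, std::size_t>;     // (from block, to block)

namespace {
// state: 0 = unvisited, 1 = on the current DFS path, 2 = finished.
void dfs_visit(const Graph& cfg, std::size_t block, std::vector<int>& state,
               std::vector<Edge>& back_edges) {
    state[block] = 1;
    for (std::size_t next : cfg[block]) {
        if (state[next] == 1) {
            back_edges.emplace_back(block, next);  // jumps back up the path: a loop
        } else if (state[next] == 0) {
            dfs_visit(cfg, next, state, back_edges);
        }
    }
    state[block] = 2;
}
}  // namespace

// Collects every back edge reachable from the entry block.
std::vector<Edge> find_back_edges(const Graph& cfg, std::size_t entry) {
    std::vector<int> state(cfg.size(), 0);
    std::vector<Edge> back_edges;
    if (entry < cfg.size()) dfs_visit(cfg, entry, state, back_edges);
    return back_edges;
}
```

Given a back edge (tail, head), the head block is the loop's entry; deciding whether to print it as "for", "while", or "do/while" depends on where the exit test sits, which this sketch leaves out.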

    Decompilation won't always succeed. But you should find all the places where the code is doing something the decompiler doesn't understand, and get code back for everything else.

    It's a big job, and somebody ought to do it. Among other things, it would be a valuable tool for finding compiler bugs.

  22. Re:Why by Dylan+Zimmerman · · Score: 4, Interesting

    Nope. It (probably) wouldn't be admissible because of the part that says no reverse compiling. Reverse engineering is something totally different.

    Reverse engineering is taking a black box and figuring out what it contains by giving it test inputs and watching the outputs. There are a few other things considered reverse engineering, but that describes most of it.

    Of course, all of this ignores the fact that EULAs have never been tested in court. They could be proven invalid as contracts fairly easily since the exchange of goods occurs before you ever see the EULA and most stores don't accept returns of opened software. Therefore, if you don't agree to the EULA, you still have the right to use what you purchased.

    On an interesting side note, various free trade laws specifically protect reverse engineering.

  23. From the author by opcodevoid · · Score: 5, Interesting
    I didn't realize my article was getting any feedback because people are posting it here instead of on pscode.

    Anyway, I've seen a lot of people saying decompiling is impossible or at least not practical. Well, that is not true. Decompiling C++ is very practical because of high-level keywords (if, while, for), local variables, and parameters. All of these generate certain instructions that are similar on every platform and just about every processor.

    I'm also extending the article to 92 pages in total, which will cover OOP, the CRT, and a whole bunch of other stuff.

  24. Re:Decompilation = halting problem == boloney by jackb_guppy · · Score: 4, Informative

    Been doing it for twenty years. It is easy to do.

    Stop trying to use logic... actually do it.