Slashdot Mirror


Reverse Engineering Large Software Projects?

stalebread queries: "A team of fellow students and I have been tasked with reverse engineering a massive C/C++ (mostly C) computer game of about half a million lines. We have most of the source but no clue how to approach a task of this magnitude. Does anyone have suggestions for programs or techniques we could use to understand the structure of the game?"

11 of 104 comments

  1. Flowcharting might help by eric2hill · Score: 3, Informative

    Since you're probably proficient with C++, try a flowcharting solution to give you a high-level map of all the classes. Maybe that will help.

    --
    LOAD "SIG",8,1
    LOADING...
    READY.
    RUN
  2. oh boy by QuantumG · Score: 3, Informative

    I presume you mean reverse engineering in the program-understanding sense, in which case the way to go about it is to sit down and read the source code, taking notes as you go. You should then set yourself some maintenance tasks: modifying the source code is the best way to find out whether you understand it.

    --
    How we know is more important than what we know.
  3. Reverse Engineer or Refactor/Port? by linuxtelephony · Score: 4, Interesting

    It sounds like you want to refactor the code, or port it to another platform. If you are missing some of the code, then you'll have to reverse engineer that portion of it.

    As for how to approach it, I think it depends on the size of your team and the goals you set for the effort. Do you just want to learn? Do you want to improve performance? Or make it work on another platform? What are the goals for this project?

    Once you know those details, they might give you an idea where to begin.

    --
    62,400 repetitions make one truth -- Brave New World, Aldous Huxley
    1. Re:Reverse Engineer or Refactor/Port? by QuantumG · Score: 5, Informative

      Yeah, the "most of the source code" part is a bit scary. If they really are talking about reverse engineering from executables, they are in for a hell of a time. The state of the art is Boomerang, a project I work on now and then, and it isn't for the faint of heart. For years I've been hearing about people working on decompilation tools integrated into IDA Pro, but I've yet to see one. The day when you can feed in a binary, press a button, and get back compilable, maintainable source code is still a long, long way off. But that's good: friends of mine do commercial decompilation work.

      --
      How we know is more important than what we know.
  4. Source navigator by Mr2cents · Score: 3, Informative

    http://sourcenav.sourceforge.net/

    I like to use it when browsing through code; you can search and browse as much as you like. It will still take effort, though.

    --
    "It's too bad that stupidity isn't painful." - Anton LaVey
  5. lots of Mountain Dew... by warpSpeed · Score: 3, Funny
    Oh, yeah, and Ho Hos! Never underestimate the power of the Ho Ho.

  6. Re:Legal? by Macphisto · Score: 3, Insightful

    "Human capital"? What are you, an alien overlord of some sort?

  7. Re:WTF? by kisielk · Score: 3, Insightful

    Just because you have the code doesn't mean you know how the system is assembled and how all the components work together. "Reverse engineering" is pretty loosely defined, but if you take it literally, it's just that: reversing the engineering process. From the description, the poster is looking to take the finished product (the source for this game) and work back up to the high-level design phase. This means analyzing the module interconnections, the class hierarchy, and that sort of thing. It doesn't necessarily mean they want to "port" or "compile" it.
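    One low-tech way to start mapping module interconnections in a mostly-C code base is to pull the quoted #include lines out of each file and draw them as a graph. This is a minimal sketch, not a tool mentioned in the thread: parse_local_include and print_include_edges are hypothetical names, <system> headers are skipped on purpose, and conditional compilation is ignored, so every #ifdef'd include is counted.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* If `line` is a quoted include such as  #include "render.h",
 * copy the header name into `out` and return 1; otherwise return 0.
 * Angle-bracket system headers are ignored so the graph shows only
 * the project's own modules. */
int parse_local_include(const char *line, char *out, size_t out_size)
{
    const char *p = line;
    while (isspace((unsigned char)*p)) p++;
    if (*p != '#') return 0;
    p++;
    while (isspace((unsigned char)*p)) p++;   /* allow "#  include" */
    if (strncmp(p, "include", 7) != 0) return 0;
    p += 7;
    while (isspace((unsigned char)*p)) p++;
    if (*p != '"') return 0;
    p++;
    const char *end = strchr(p, '"');
    if (end == NULL) return 0;
    size_t len = (size_t)(end - p);
    if (len == 0 || len >= out_size) return 0;
    memcpy(out, p, len);
    out[len] = '\0';
    return 1;
}

/* Print one Graphviz DOT edge per local include found in `path`.
 * Returns the number of edges, or -1 if the file can't be opened. */
int print_include_edges(const char *path, FILE *dot)
{
    FILE *f = fopen(path, "r");
    if (f == NULL) return -1;
    char line[1024], header[256];
    int edges = 0;
    while (fgets(line, sizeof line, f) != NULL) {
        if (parse_local_include(line, header, sizeof header)) {
            fprintf(dot, "  \"%s\" -> \"%s\";\n", path, header);
            edges++;
        }
    }
    fclose(f);
    return edges;
}
```

    Wrapping the printed edges in digraph includes { ... } and rendering them with GraphViz (for example dot -Tsvg includes.dot -o includes.svg) gives a rough picture of which files lean on which; Doxygen's include graphs cover the same ground more thoroughly.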

  8. Cross-reference first: Doxygen is your friend by treerex · Score: 4, Informative

    It sounds like you are unable to build the complete system and run it, since you're missing functionality. This rules out runtime tracing tools.

    The first thing I would do is run something like Doxygen over it to generate a cross-referenced description of the structures. It won't give you a global view of things, but it will give you a decent browsable view of the code itself. Another response mentioned GNU GLOBAL, which may work better for you. Yet another possibility is LXR, though it may not work as well on C++. Either way, a nice thing about Doxygen is that, used with GraphViz, it can generate useful diagrams of class containment and file inclusion.
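    As the team works out what each function does, one option is to record it as Doxygen comments, so the notes land in the generated cross-reference next to the code. A minimal sketch; struct player and player_take_damage are hypothetical stand-ins for the game's real code. The \callgraph and \callergraph commands ask Doxygen to draw call graphs for that function, which requires HAVE_DOT = YES (with GraphViz installed) in the Doxyfile.

```c
/** Player state as we currently understand it (field meanings are our notes). */
struct player {
    int x;      /**< Horizontal position in world units (confirmed by tracing). */
    int y;      /**< Vertical position in world units. */
    int health; /**< Hit points; 0 appears to mean "dead". */
};

/**
 * \brief Apply damage to a player, clamping health at zero.
 *
 * \param p      Player to modify; must not be NULL.
 * \param amount Damage to subtract; zero or negative values are ignored.
 * \return       Health remaining after the hit.
 *
 * \callgraph
 * \callergraph
 */
int player_take_damage(struct player *p, int amount)
{
    if (amount > 0)
        p->health = (p->health > amount) ? p->health - amount : 0;
    return p->health;
}
```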

    After you have that, get out your paper and pencil, and start drawing and manually tracing things. That's how I go about coming up to speed on new code I can't execute and step through. Eventually transfer that knowledge into a text file (or, nowadays, a wiki) so that others can benefit from it.

  9. Resources For the Code Janitor by sohp · Score: 4, Informative

    I applaud your professor or thesis advisor or whoever for this real-world task. Here are a few resources I wouldn't do without:
    Code Reading: The Open Source Perspective
    Object-Oriented Reengineering Patterns
    Reading Computer Programs: Instructor's Guide and Exercises
    Tips for Reading Code

  10. Reversing Std C by TheDracle · Score: 3, Informative

    It's pretty simple, just time-consuming. I've seen a few reverse engineering books floating around, such as "Reversing" and "Exploiting Software." Since it's mostly standard C, it shouldn't be nearly as difficult to reverse engineer. Other languages can make things more complicated (multiple calling mechanisms, more dynamic memory allocation, etc.).

    Tools:

    OllyDbg - Awesome user-mode debugger, probably better suited than SoftICE for this particular task. You can add assembly wherever you want, and it will create patches for the exe that can be automagically applied. It's also FREE.
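    To make the idea of an exe patch concrete: at bottom it is a few bytes overwritten at a known file offset. The sketch below is a hypothetical apply_patch helper, not OllyDbg's own patch format. It assumes you already know the file offset (not the in-memory address) of the bytes to change, and it checks that the original bytes are still there before writing, so it refuses to modify a different build. The test turns an x86 short conditional jump (0x74, JE) into an unconditional one (0xEB, JMP), a classic one-byte patch.

```c
#include <stdio.h>
#include <string.h>

/* Overwrite `len` bytes at `offset` in `f` (opened "r+b"), but only if the
 * bytes currently there equal `expected`.
 * Returns 0 on success, 1 if the original bytes differ, -1 on I/O error. */
int apply_patch(FILE *f, long offset,
                const unsigned char *expected,
                const unsigned char *replacement, size_t len)
{
    unsigned char current[64];
    if (len > sizeof current) return -1;
    if (fseek(f, offset, SEEK_SET) != 0) return -1;
    if (fread(current, 1, len, f) != len) return -1;
    if (memcmp(current, expected, len) != 0) return 1;   /* wrong build? */
    /* A seek is required between reading and writing an update stream. */
    if (fseek(f, offset, SEEK_SET) != 0) return -1;
    if (fwrite(replacement, 1, len, f) != len) return -1;
    return fflush(f) == 0 ? 0 : -1;
}
```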

    NuMega SoftICE - Just in case you need to bring in the big guns.

    IDA Pro - Best reverse engineering tool available. Lots of extension scripts to do anything imaginable.

    TSearch - Can search memory at runtime, set breakpoints, disassemble code on the stack, and dynamically insert new assembly at runtime. Nice for understanding the flow of the software as it runs, and identifying interesting variables and structures.
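    The memory search in tools like this works roughly as follows: find every location holding a known value (say, your current health), change that value in the game, then keep only the locations that now hold the new value. A minimal sketch of that narrowing, assuming 32-bit integers; first_scan and next_scan are hypothetical names, and the sketch scans a buffer in its own address space, whereas a real tool such as TSearch reads another process's memory (on Windows, through ReadProcessMemory).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Record every byte offset in `mem` where the 32-bit `value` appears.
 * Returns the number of candidates stored (at most `max`). */
size_t first_scan(const unsigned char *mem, size_t size, int32_t value,
                  size_t *cands, size_t max)
{
    size_t n = 0;
    for (size_t off = 0; off + sizeof value <= size && n < max; off++) {
        int32_t v;
        memcpy(&v, mem + off, sizeof v);   /* memcpy avoids unaligned reads */
        if (v == value)
            cands[n++] = off;
    }
    return n;
}

/* Keep only the candidates that now hold `new_value`, compacting the
 * array in place. Returns the number of survivors. */
size_t next_scan(const unsigned char *mem, size_t size, int32_t new_value,
                 size_t *cands, size_t n)
{
    size_t kept = 0;
    for (size_t i = 0; i < n; i++) {
        int32_t v;
        if (cands[i] + sizeof v > size)
            continue;
        memcpy(&v, mem + cands[i], sizeof v);
        if (v == new_value)
            cands[kept++] = cands[i];
    }
    return kept;
}
```

    Each round of "change the value in the game, then call next_scan" usually shrinks the candidate list quickly, until one address is left to watch or breakpoint on.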

    REC Decompiler - Awesome decompiler that produces a high level representation of the code. Not a replacement for your brain, but can save a lot of time tracing over assembly code to understand the purpose of a function.

    WinPcap & Ethereal - For reversing game protocols and understanding client-server interaction. Sometimes it's nicer to just find where the host name/IP string is located in the binary, replace it with 127.0.0.1, and then write a little proxy program to sit between the client and the server.
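    A sketch of the forwarding core of such a proxy, assuming a POSIX system (Cygwin qualifies): relay() copies bytes in both directions between the game client's connection and the real server's, logging each chunk's size so the traffic can be lined up with what happens on screen. Setting up the two sockets (socket(), bind(), listen(), accept(), connect()) is left out for brevity, and forward_once and relay are hypothetical names.

```c
#include <errno.h>
#include <poll.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Copy one chunk of whatever is readable on `from` to `to`, logging its size.
 * Returns bytes copied, 0 on end-of-file, or -1 on error. */
ssize_t forward_once(int from, int to, const char *label)
{
    unsigned char buf[4096];
    ssize_t n = read(from, buf, sizeof buf);
    if (n <= 0)
        return n;
    fprintf(stderr, "%s: %zd bytes\n", label, n);
    for (ssize_t done = 0; done < n; ) {      /* handle short writes */
        ssize_t w = write(to, buf + done, (size_t)(n - done));
        if (w < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        done += w;
    }
    return n;
}

/* Shuttle bytes both ways between the client's socket and the server's
 * socket until either side closes or an error occurs. */
void relay(int client, int server)
{
    struct pollfd fds[2] = {
        { .fd = client, .events = POLLIN },
        { .fd = server, .events = POLLIN },
    };
    const int ready = POLLIN | POLLHUP | POLLERR;
    for (;;) {
        if (poll(fds, 2, -1) < 0) {
            if (errno == EINTR)
                continue;
            return;
        }
        if ((fds[0].revents & ready) &&
            forward_once(client, server, "client->server") <= 0)
            return;
        if ((fds[1].revents & ready) &&
            forward_once(server, client, "server->client") <= 0)
            return;
    }
}
```

    Logging only sizes keeps the sketch short; replacing the fprintf with a hex dump of buf is the natural next step once you start decoding message formats.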

    HVIEW - Hex editor with the ability to disassemble.

    (Use Cygwin or MinGW for the following.)

    strace - Traces system calls and signals and spits them out to the screen.

    nm - Dumps a binary's symbol table (symbol names and addresses).

    I've definitely forgotten a plethora of other useful tools (especially the binutils ones), but the above are some of my favorites.

    For a game, you'll probably be dealing mostly with OllyDbg, HVIEW, REC, and WinPcap/proxy. I'd recommend using nm to get a list of all the symbols in the program, then splitting them up and assigning each student some number of symbols to understand and rewrite in C. They can then use HVIEW or OllyDbg to navigate to those symbols and try translating them. If they have a difficult time, have them use REC to get a higher-level representation they can cheat off of.
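    The symbol-splitting step might look like this, as a sketch: it reads nm output, keeps only code symbols (types T and t), and deals them out round-robin. It assumes GNU-style nm lines ("address type name") and plain C names without spaces, and it only helps if the binary still has its symbol table, since a stripped executable lists little or nothing. parse_nm_text_symbol and deal_symbols are hypothetical names.

```c
#include <stdio.h>
#include <string.h>

/* Parse one line of nm output such as "0000000000401126 T main".
 * Returns 1 and copies the name if the symbol is defined in the text (code)
 * section: 'T' for global functions, 't' for file-local (static) ones. */
int parse_nm_text_symbol(const char *line, char *name, size_t name_size)
{
    char addr[32], sym[256];
    char type;
    if (sscanf(line, "%31s %c %255s", addr, &type, sym) != 3)
        return 0;
    /* Undefined symbols ("U printf") have no hex address column. */
    if (strspn(addr, "0123456789abcdefABCDEF") != strlen(addr))
        return 0;
    if (type != 'T' && type != 't')
        return 0;
    if (strlen(sym) >= name_size)
        return 0;
    strcpy(name, sym);
    return 1;
}

/* Read nm output from `in` and deal the code symbols out round-robin,
 * printing "student<k> <symbol>" lines. Returns the number of symbols. */
size_t deal_symbols(FILE *in, FILE *out, size_t students)
{
    char line[512], name[256];
    size_t count = 0;
    if (students == 0)
        return 0;
    while (fgets(line, sizeof line, in) != NULL) {
        if (parse_nm_text_symbol(line, name, sizeof name)) {
            fprintf(out, "student%zu %s\n", count % students, name);
            count++;
        }
    }
    return count;
}
```

    Fed from something like nm --defined-only game.exe through a small main() around deal_symbols, this prints one assignment line per function. Round-robin is simply the easiest fair split; grouping symbols by source file instead would keep related functions with the same person.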

    -Jason Thomas.