Ask Slashdot: How To Start Reading Other's Code?
BorgeStrand writes "I'm reviving an open source project and need to read up on a lot of existing code written by others. What are your tricks for quickly getting to grips with code written by others? The project is written in C++ using several APIs which are unknown to me. I know embedded C pretty well, so both the syntax, the APIs and the general functionality are things I wish to explore before I can contribute to the project."
If there's a lot of documentation, interpret it like your favorite religious text. Try to hit up some of the old developers from the VCS. Also, I'd like to help :)
Knowing the data structures gives you the ground work for understanding what the code is doing. The data structures are a more direct description of the design decisions.
What are your tricks for quickly getting to grips with code written by others?
For me, it comes down to a lot of Mountain Dew, techno music, and hours of guru meditation. As you dissect each function, sketch out its relationship to other major functions. I take a two-pass approach: first, just look at the function calls and the return values and make a rough sketch of the 'scaffolding' of the program. On the second pass, take any function whose application isn't obvious, or that appears obfuscated or complicated, dissect it into functional units, and sketch out what it does in your notes. I do this by actually physically drawing the relationships using something called a mind map.
Until you get used to it, actually write it down, even if it's just a bunch of messy arrows pointing to blobs of circled text. It will help jog your memory and help things sink in until you have the necessary 'ah ha!' moment.
YMMV.
#fuckbeta #iamslashdot #dicemustdie
Find a good editor or IDE that allows you to quickly navigate the code.
Figure out how to build it
Figure out how to test it
Read the API docs. Understand the objects and how to interact with their interfaces.
YMMV, depends on how smart you are I guess.
Look straight at the code for a few hours without moving an inch. After that its details should be printed into your brain.
If possible, I would try writing unit tests for the existing code. This tests your understanding of what you are reading and will come in handy later if you change the code. If unit tests already exist then I suggest that you read them since they will tell you the intention of each function.
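A minimal sketch of what such a test might look like, using plain assert so it doesn't depend on any particular test framework. The function trim_left is a hypothetical stand-in for something found in the inherited code, invented here for illustration; the point is that each assertion writes down what the code does today, not what you think it should do.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for a function from the inherited code base.
// Its name and behavior are assumptions made for this sketch.
std::string trim_left(const std::string& s) {
    std::size_t i = 0;
    while (i < s.size() && (s[i] == ' ' || s[i] == '\t')) ++i;
    return s.substr(i);
}

// Characterization tests: each assertion records the current behavior,
// including edge cases you were unsure about while reading.
void test_trim_left() {
    assert(trim_left("  abc") == "abc");
    assert(trim_left("\tabc ") == "abc ");  // trailing space is kept
    assert(trim_left("") == "");
    assert(trim_left("   ") == "");         // all-whitespace input becomes empty
}
```

Call test_trim_left() from a small test program. If an assertion fires, either your reading of the code was wrong or you have found a candidate bug, and both are worth knowing.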
Everything else will seem simple after that...
Any insufficiently advanced magic is indistinguishable from technology.
They will be doing horrible, horrible things from your point of view. Manage your mood, to be able to stick to the goal.
1) Start by thoroughly understanding the target language, in this case C++. 2) Get an IDE that provides full support for finding references and definitions in C++. For now only NetBeans CND is able to do that properly on Linux-based OSes. Alternatively, on Windows use Visual Studio. 3) Then you just have to start reading the code and build a mental map of the project.
Read as much code of as many different styles as you can. Eventually, you will hopefully start just getting a feel for programming much like music, where different code has different styles. You'll see manifestations of different patterns and start to gain a deeper understanding of it (it can have an almost zen-like quality to it)...why certain patterns get used, why certain developers use patterns that don't make sense to you, what kind of developers the authors are. It took me years before I started to get a feel for reading other people's code, but I have an idea as to how good a programmer is and even what kind of everyday personality they have from reading their code. It really is like music. And one more thing: don't count on comments for guiding you through this; it's rare to come across well-commented code, especially in a professional environment (ironically enough).
Even without Doxygen's specific format for comments, you can use it to graph object relationships, call-trees, etc.
You can generate docs limited to a few files or classes if you just want to focus on them.
www.doxygen.org
First, figure out how the code gets loaded and runs. Find the equivalent of the 'main' function. Then start tracing it, seeing what functions get called, how things are loaded, etc. What really helps here is an editor in which you can Ctrl+click on a function to go to its definition. When you hit a function that doesn't call any other unknown functions, you can start understanding what it does without having to step into anything else. These are the basic functionality units. Then when you know enough of those, you can start going a level up, etc. Eventually a picture forms in your mind of just how things work. You can optionally skip over functions and look at them later if it seems pretty clear what they do based on the name and how they're called, but you might find important stuff in there later. This is how I go about it, anyway. It can be very frustrating and very confusing at first, but eventually the picture starts making sense, then things click in a most satisfying manner. That being said, the above is also the reason I can dislike complicated frameworks. There's so much indirection that it can take quite a while indeed until you hit something concrete. The mark of a good framework is that either it doesn't do that, or it does but soon enough you figure out its parts and then you can treat it intuitively.
I find that going through some key functions (assuming you can find them) and reformatting them to your own liking can be helpful, commenting code along the way. Then if you want to get more aggressive, start cleaning up some code in minor ways that still stay true to the function's meaning. After you've done a bit of that, you should probably have at least a vague idea what's going on.
People who say "sheeple" have about as much sophistication as an AOL user, and in fact are probably actually AOL users.
I disagree with reading too much code.
Run the code to see what it does. Add some printfs to validate the understanding of what the code is doing.
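One way this might look in C++, as a rough sketch: a small TRACE macro that tags each message with file, line and function, so you can tell where the output came from. The macro and the clamp_percent function are both hypothetical, made up for this example.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical tracing macro (not from any real project). It writes to
// stderr so the trace does not mix with the program's normal output.
// It expects a string literal format plus at least one argument.
#define TRACE(fmt, ...) \
    std::fprintf(stderr, "[%s:%d %s] " fmt "\n", \
                 __FILE__, __LINE__, __func__, __VA_ARGS__)

// Hypothetical function under study, instrumented at entry and exit.
int clamp_percent(int value) {
    TRACE("enter value=%d", value);
    if (value < 0) value = 0;
    if (value > 100) value = 100;
    TRACE("exit value=%d", value);
    return value;
}
```

Before running it, write down what you expect the trace to show; the mismatches are where your understanding is off.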
It won't help you understand the code but you'll stop worrying about it so much.
I've found the best way to learn a code base is to start editing it. There are almost always formatting details and names that don't feel right. Start changing those and see what happens. The process will force you to understand the logic and help you really understand the code. Even if you throw all your formatting changes away, you'll definitely learn something.
Hopefully, the project has tests as well that will let you see if your formatting changes break things, in which case you've begun to understand the relationship between different code blocks.
I recommend starting by working on the bug list. It gives you something to work on constructively and it also makes you look through all the code to track the problem.
As you are doing this, start generating your own documentation. If the code doesn't use Doxygen, add that. Reformat and add comments. Write external documentation. When you are documenting, think of what you wish the previous coders had done for you, and then do that.
This is the way I write code from the beginning, and it leads to better code. If I can explain what is going on then I know I understand it. If I can't explain it, then there is a pretty good chance that there are bugs. It's good practice whether you are taking over someone else's code or starting from scratch.
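For anyone who hasn't written Doxygen comments before, a small sketch of the style. The function adc_to_millivolts and the 12-bit range are assumptions made up for this example; @brief, @param and @return are standard Doxygen commands.

```cpp
#include <cassert>

/**
 * @brief Convert a raw ADC reading to millivolts.
 *
 * Hypothetical example function; the 12-bit converter is an assumption
 * of this sketch, not something taken from a real project.
 *
 * @param raw      Raw reading, expected in the range [0, 4095].
 * @param vref_mv  Reference voltage in millivolts.
 * @return Voltage in millivolts, rounded down.
 */
int adc_to_millivolts(int raw, int vref_mv) {
    assert(raw >= 0 && raw <= 4095);  // the documented precondition, enforced
    return raw * vref_mv / 4095;
}
```

Writing the @param ranges forces you to find out what the callers actually pass, which is a good part of the learning.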
Why is Snark Required?
The trouble with university education, is that most people who teach there are computer scientists, not software engineers with years of experience in the trenches.
If this were actually the case, there would be a recognition that reading code is far harder than writing it. And far more emphasis would be on coming to grips, understanding, and working on large code bases. There'd be more stuff on things like unit testing, breaking dependencies, troubleshooting, and refactoring at least.
Find out what drugs the original coder was using when writing, and take the same.
Find a function. Refactor it until you grok it. Discard the results.
Keep in mind that it will be VERY tempting to commit your changes, but you must throw away the work and chalk it up as a learning experience if you ever want to be taken seriously by the others who work on the project. Junior developers (and even some senior developers) often think they're doing everyone a favor by doing drive-by refactors, but they're not; they're just slowing down the entire team and coming across as that a**hole who keeps f***ing up the diffs and destroying the useful output of tools like git blame.
If you found any bugs in the previous step, make a patch with the absolute minimal change to fix each individual bug. IMPORTANT: Before committing the patch, first be sure that you can reproduce it in the old code, and that the test case is fixed by your new code.
Repeat the process until you understand the entire system.
With any luck, you will finish with a solid understanding of how the code actually works, and you will most likely also fix a few dozen bugs (if you didn't find at least one bug per kLOC, then "you're doing it wrong" or the code was written by an inspired genius with OCD). At that point, you will be the team's expert on how things work, and you will be in a position where you can start proposing simple refactorings that will improve the code quality.
You mentioned you have embedded C experience and the code of interest is written in C++. You didn't mention if you had any C++ or other object-oriented programming experience. I assume the C++ code uses the OO features of C++ that distinguish it from C -- but this assumption is not necessarily true.
So, if you lack OO experience and the code is truly OO C++ code, you might want to do a little reading up on the basics of OOP in order to spend less time spinning your wheels.
I am not a crackpot.
Here is how I work on legacy code:
1) I don't look at the whole picture because there are too many details, so I prefer to attack it little by little.
2) I quickly check what I can rewrite in order to optimize the code. If I have no idea, I run a profiler, and take a look at the routines that take the most time.
3) Once I have understood or rewritten the most time-consuming parts (sometimes the code is already heavily optimized, but most of the time I can make a real improvement), I decide which functionality is most important for me to add, and I just focus on that.
4) if I really need to have robust code, I write tests for the routines before optimizing them, so that I can validate if there are regressions
5) whenever possible, I use "assert" and put some bound-checking tests, in order to validate the ranges of certain values or conditions.
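As a rough illustration of point 5, here is what range checks with assert might look like. The average routine is a hypothetical stand-in for a legacy function; the assertions spell out the assumptions the surrounding code appears to make about its arguments.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical legacy routine: average of samples in the range [first, last).
double average(const std::vector<int>& samples,
               std::size_t first, std::size_t last) {
    // Make the implicit assumptions explicit and checked.
    assert(first < last);             // a non-empty range, so no division by zero
    assert(last <= samples.size());   // the range stays inside the vector

    long long sum = 0;
    for (std::size_t i = first; i < last; ++i) sum += samples[i];
    return static_cast<double>(sum) / static_cast<double>(last - first);
}
```

Keep in mind that assert compiles to nothing when NDEBUG is defined, so these checks cost nothing in a release build but also protect nothing there.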
The important thing is to start by taking ownership of a small part of the code, then a bigger part, and so on.
Take one slice at a time, not the whole pie.
And one last point: knowing every little detail is useless; concentrate on what is important to you: performance, functionality, and so on.
At work I am a big fan of Visual SlickEdit. It builds a complete tag database of all the functions, variables, classes, etc. That allows me to find all callers of a particular function, definitions, references, etc. In Linux it will work with gdb to do step-through debugging. I believe most of the functionality is available in emacs, with its ctags. Though most developers in our company use the Microsoft IDE, I build all my sln files using SlickEdit and edit using SlickEdit. It has good integration with version management tools too.
sed -e 's/Chuck Norris/Rajnikant/g' joke > fact
I find it useful to follow the flow of control using a debugger. For C/C++, I like GDB in "TUI" mode because I can see the code easily.
If the program is multi-threaded, it's worthwhile to read up on how to handle threads in GDB. Understanding what's going on in a multi-threaded program can be especially difficult. Having the ability to switch between threads, to get backtraces (so you can see how that thread got where it is currently) and inspect its variables can be very informative.
less
2) Just because the code is awful doesn't mean it has no value -- No matter how bad it is and how difficult it is to read, if it works at all it has probably got years (maybe even decades) of bug fixes and feature requests. Until you have a handle on it, any little change could cause a catastrophic cascade of side-effects.
3) No, we don't need to rewrite it. See 2. A working program now is worth more than all the pie in the sky you can promise a year from now.
4) It takes 6 months to have a reasonably good grasp of any moderately complex in-house application. It could be a year before you get to the point where someone can describe a problem and you immediately have a good idea of where in the code the problem is occurring and what functions to check.
Maintenance programming is as much about detective work as anything else. The only clues you have about the previous programmer are his source files. Once you've read them for a while you can start to tell what he was thinking, when he was confused, when he was in a hurry. Most of the atrocious in-house applications have changed hands several times and each programmer adds their own layer of crap. You can redesign these applications a chunk at a time until nothing remains of the original code if it's really bad, but it's best to save really ambitious projects until you understand the code better. I heartily encourage the wholesale replacement of "system()" calls with better code immediately, though. In several languages I've run across these calls to remove files, when they could have simply called a language library call (Typically "unlink".) If the original programmer used system("rm...") you can pretty much assume that they were a bad programmer and you're in for a lot of "fun" maintaining their code.
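To make the system("rm ...") point concrete, a minimal sketch of the replacement in C++ using std::remove from the standard library, which deletes a file without spawning a shell. The helper names delete_file and file_exists are made up for this example.

```cpp
#include <cassert>
#include <cstdio>
#include <fstream>
#include <string>

// Instead of something like: std::system(("rm " + path).c_str());
// std::remove returns 0 on success and nonzero on failure, so the caller
// can actually check the result, and no shell gets to interpret the path.
bool delete_file(const std::string& path) {
    return std::remove(path.c_str()) == 0;
}

// Small helper for the demonstration: true if the file can be opened.
bool file_exists(const std::string& path) {
    std::ifstream in(path);
    return in.good();
}
```

With C++17 available, std::filesystem::remove is another option. Either way you avoid the quoting problems that come with pasting a file name into a shell command.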
I'm trying to teach myself to set people on fire with my mind... Is it hot in here?
Read this: http://stackoverflow.com/questions/3586073/reading-others-code
Also: http://www.codinghorror.com/blog/2012/04/learn-to-read-the-source-luke.html
Chances are that no-one else has and doing so will help you understand it as well as producing some useful output to get the project going again.
You can use static and dynamic analysis to gain knowledge on the structure of the program. Static analysis can be done with tools like Modisco (however, Modisco is not for C++ I guess). For the dynamic analysis, you need to add a monitoring feature to the code. This can be done with AspectC++. Instrument entry and exits of public methods (if it is OO code) or structural blocks detected by the static analysis.
However, if the program is rather small, then you can do the analysis also by hand.
Since this is an OSS project, can you suggest any tools similar to Understand that don't cost $995?
The only thing I could find was source navigator NG, but I have zero experience with it.
echo "i am here";
or print or console.log or printf or ...
The problem with slashdot is that most of its users were bullied and stuffed into lockers as kids!
Reading other people's code is a punishment that one must master if you hope to grow as a programmer.
So what is the reward for reading comments which are unnecessarily set in monospace type?
"You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
In fairness I don't think I've ever read *any* STL implementation that wasn't hideous. Seems like non-trivial template programming internals are sort of like VB - without exhaustive self-discipline the result defaults to hideous abominations you pray you never have to look at again. And it doesn't help debugging that compilers all seem to be optimized to deliver maximally-obfuscated error messages if they encounter a problem related to templated code.
Don't get me wrong, I love templates and even engage in metaprogramming from time to time, but there must be a better way.
--- Most topics have many sides worth arguing, allow me to take one opposite you.
You can use doxygen to create a dependency graph and visualize it using CCVisu.
http://www.stack.nl/~dimitri/doxygen/
http://ccvisu.sosy-lab.org/
http://ccvisu.sosy-lab.org/manual/main.html#sec:input-dox
Learn to use a debugger with a nice interface for exploring call stacks, data structures, threads, etc. Then get the mystery code to do something simple, and put a breakpoint at the point where it's about to complete the task. (Try grepping for a string you see in the output.) Work up the call stack, putting a breakpoint at the start of each procedure that is being called. Now repeat the task and look at each procedure as it is invoked.
This will be a slow and painful process. Make sure you don't have anything better to do with your life before embarking on it.
Good luck.
Namgge
I just start fixing problems or adding features and learn the code on the go... you don't need to read all the code from the start...
If a function is too arcane to be understood in 10 minutes, start breaking it down into smaller *documented* functions while maintaining the same test results. There's a good chance that this will also help the project directly.
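A small sketch of that idea, with every name invented for illustration: count_words_original plays the part of the arcane function, the refactored version splits it into documented helpers, and same_results checks that both agree on a handful of inputs.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical "arcane" original: counts whitespace-separated words.
int count_words_original(const std::string& s) {
    int n = 0; bool in = false;
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (!std::isspace(static_cast<unsigned char>(s[i]))) {
            if (!in) { ++n; in = true; }
        } else in = false;
    }
    return n;
}

/// True when c separates words (any whitespace character).
bool is_separator(char c) {
    return std::isspace(static_cast<unsigned char>(c)) != 0;
}

/// True when a word begins at position i of s.
bool starts_word(const std::string& s, std::size_t i) {
    return !is_separator(s[i]) && (i == 0 || is_separator(s[i - 1]));
}

/// Same behavior as the original, expressed through the helpers above.
int count_words_refactored(const std::string& s) {
    int n = 0;
    for (std::size_t i = 0; i < s.size(); ++i)
        if (starts_word(s, i)) ++n;
    return n;
}

/// Compares old and new on a few inputs; true when they all agree.
bool same_results() {
    const char* cases[] = {"", "one", "  two words ", "a\tb\nc"};
    for (const char* c : cases)
        if (count_words_original(c) != count_words_refactored(c)) return false;
    return true;
}
```

Keeping the original around until the comparison passes is what makes this safe; it can be deleted once the new version has earned some trust.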
Two steps:
1. Make an act of thanksgiving that you're not dealing with code written by someone trained in FORTRAN, where all variable names are two characters long and all function names are six characters long. "You can write FORTRAN in any language." If you do happen to be dealing with FORTRANesque code, give up now.
2. Become familiar with the idioms of every programming language, of every programming paradigm (structured, object-oriented, functional, event-driven, dataflow, etc.), and of every programmer educational background (high school/self-taught, CS degree, startup pressure cooker, etc.). If you don't have time to do that prior to starting on the current project, then let your attempt to reverse engineer the current project serve as a building block for the next time.
The first thing is that you need to know the language you are working in. So if you are coming from a C background, it's similar, but you should pick up a book on C++ (or whatever language you happen to be working in). Secondly, it takes practice. Every day I am working with other people's code to fix problems. Once I find the problem area I have to sit and digest what's going on line by line, and usually add comments where there were none to begin with. Lastly, learning how to effectively read other people's code is one of THE BEST skills you can have as a developer. Anyone can read their own code, but to work as a team you need to be able to read other people's code and not get turned off by it. Small rant: most people who can't read other people's code seem to think that no one else knows what they are doing and that their code is sacred and they are the best coders ever, which is rarely the case and usually the opposite.
Write unit tests for the code and develop a regression test suite for it. This in itself can help you understand what's going on, but then also when you later start re-factoring or changing the code, you can be sure you're not breaking other parts of the code in subtle ways (or at least if you do break it, you'll know sooner than later). This will also help anyone else who might contribute to the project. Your mileage will vary depending on the size of the code base of course.
It's just how your brain works. It's a lot easier to examine a piece of mechanical machinery when it's in motion. You notice more. Do the same with the code. Run it. Run components independently. Put plenty of log statements or if it's feasible, watch under a debugger. But don't try to look at stale code just sitting there. You'll notice more as it moves.
Any guest worker system is indistinguishable from indentured servitude.
That's assuming the original programmers followed the MVC pattern. Very often they didn't. That's when you start reading tea leaves.
This calls to mind the story of the king who was in a class for learning algebra, and after realizing it was hard, he took the teacher aside and said 'I'm the king - show me the easier route'.
There isn't one. Sit down and read it. Then re-read it. Then think. Then read it again. Think again. Repeat.
"The greatest lesson in life is to know that even fools are right sometimes" - Winston Churchill
I would start by finding a chair. Seriously. The only way to read code is by reading code. The more code you read the more you realize the futility of any method other than -- sitting in a chair and reading the code.
print statements are the greatest debugging tool ever invented.
it will work on any piece of code, any language, any type of situation. you can trace anything.
Well, if it's not documented, write the documentation. Skip the automated crap. You need to do this yourself in order to ensure that you understand it.
That's why people take notes in classes, etc.
I'd start with a class diagram, some sort of high-level flowchart, and grouping modules into layers.
You will never fully understand the code just by reading it. My approach is to ignore all of it until something needs to be changed. When you need to change something, add a feature, etc., find where in the code the functionality is and tweak it a bit. See what happens. Tweak it some more. See what breaks. You will start to get a deep understanding of a focused section of the code and not have to worry (yet) about other unrelated areas. Start with small changes first. Larger changes may require a deeper understanding of the architecture and how pieces interact. This will come in time. After a few iterations of this you will eventually become intimately familiar with all the pieces of the code.
Ascalante: Your bride is over 3,000 years old.
Kull: She told me she was 19!
Spend an afternoon or three skimming around the code pulling threads and following them. Jump around kind of randomly; if things start making sense in one module, go somewhere else for a while. Take frequent breaks. Take notes about what you think things are doing, or perhaps ideas about how to improve the code, but don't start improving things now; you just want to figure out how much you're in for.
After a while of doing that, you should have a few ideas about good accomplishable problems. Now pick one and go deep for a limited time (hour, afternoon, week, depends on the scope of the code and your commitment to it). Again, keep notes, and then throw all your work away (or check it in somewhere, but don't focus on shipping; that detracts from learning). Again, go somewhere else in the code, fix something, take notes, throw it away. Alternate back and forth between research and application, trying not to bias towards one or the other (which can be a form of procrastination).
Now throw away all your notes. They were written by someone who had no idea what was going on. By now you're pretty sure you know what's going on (you don't) and how to make things better (you have no idea), so circle around for another pass. Stop when you start finding that your notes seem to be recognizing actual immediately-actionable problems and solutions, rather than hypothesis and speculation. Or just stop because you're now so busy fixing things that you don't have time for exploration.
Using Doxygen with dot (part of GraphViz) you can generate HTML with UML like diagrams that will help you visualize the design.
Also, if you don't use a modern IDE, Doxygen can render the code as HTML with links that you can use to jump to a method call or to see a variable definition.
BTW: I'm assuming you are looking at code in a language supported by Doxygen.
Find the person who wrote the code. Make sure that educating you or your colleagues is part of their paid responsibilities, and make sure that you respect their work when reviewing it: this helps them share their work and take it well if you need to revise things. My colleagues and I often bring new features or help stabilize old projects, and a working relationship with the original author is invaluable.
And if the author says "just read the code, I don't do documentation because documentation can lie", or if they say "don't bother checking the data for correctness, just don't make mistakes", be ready to throw out _everything_ they ever wrote. It may work at the moment, but it's likely to be as broken and unsustainable as their attitudes.
A person is a genius if he knows every nook and cranny of C++. But no one is expected to. Even just classes and objects are a fantastic addition over C, so there is really no reason to shun C++.
Actually Linus wrote a follow-up for that in rc6. :)
And I didn't even need to curse all that much at people. Sure, I talked smack about some of your hamsters, and I declined a couple of pull requests, but let's face it, it was pretty halfhearted. Most of the time things were good.
I always find this greatly depends on the quality of the code, which varies from well written and documented, where it just involves some boring reading and tracing through execution paths, to absolutely appalling, where you can sometimes only understand why the fuck they did something when you change the code and see how it breaks. For the former I usually rely on a quiet room and lots of caffeine; the latter requires swearing, loads of code changes to trace what is actually happening, and cursing the original author, their relations and the goat you are sure they must have been molesting while writing the code.
Depends on whether you understand what the code does. If you understand what the code does from the user's perspective, then find the code which does the most critical and interesting bits and find out how it works. If you have little clue about what the code does, then it's trickier. I think you should just browse the code and be guided by your own curiosity. Examine any documentation that might exist, or any user interface that might be available, to find clues to what is important.
Look at what external libraries it uses. How it interfaces to the outside world, whether it be a user interface, network connections, files used, etc etc.
A person is a genius if he knows every nook and cranny of C++. But no one is expected to. Even just classes and objects are a fantastic addition over C, so there is really no reason to shun C++.
The problem is that taking C and just adding classes and objects would have been nice, but the changes in C++ go so far beyond that they can reach a perl-like level of syntax confusion.
William of Ockham had no beard. The most likely explanation is that it was chewed off by squirrels every morning.
There is really no point trying to memorize what all the code does.
Just identify what functionality you want to add and start coding away.
As you go you'll get familiar with the code base automatically.
If there's some internals documentation (like there is for GCC and binutils) read that.
Load up the UI/interface, find a very specific piece of functionality, and then find the UI hooks in the code and work backwards from there. Repeat for another function that involves something completely different, i.e. access control/data/rendering/logic.
Once you are comfortable with that process, you are ready to absorb from the top-down.
spitting hex into a buffer, then reading that buffer later = primitive form of print statement
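For those who haven't met this trick outside embedded work, a rough sketch in C++. TraceBuffer is a hypothetical class invented for this example: it records cheap numeric codes into a small ring buffer while the program runs, and only formats them as hex when you ask for a dump later.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical in-memory trace log (not from any real project).
// Recording is just a store and an increment, so it is usable in places
// where a real print statement would be too slow or not available.
class TraceBuffer {
public:
    // Store one code, overwriting the oldest entry once the buffer is full.
    void record(std::uint32_t code) {
        slots_[next_ % slots_.size()] = code;
        ++next_;
    }

    // Render the retained entries, oldest first, as space-separated hex.
    std::string dump() const {
        std::string out;
        const std::size_t count = next_ < slots_.size() ? next_ : slots_.size();
        const std::size_t start = next_ - count;
        char text[16];
        for (std::size_t i = 0; i < count; ++i) {
            const std::uint32_t code = slots_[(start + i) % slots_.size()];
            std::snprintf(text, sizeof text, "%08x", static_cast<unsigned>(code));
            if (!out.empty()) out += ' ';
            out += text;
        }
        return out;
    }

private:
    std::array<std::uint32_t, 4> slots_{};  // tiny capacity, for illustration only
    std::size_t next_ = 0;                  // total number of codes recorded
};
```

On a target without a console you would read the buffer out with a debugger or over whatever channel is available; the dump method here is only the desktop convenience version.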
If possible, I usually like to start by getting an overall understanding of the various data structures used to implement the program. Sometimes this can be very helpful, particularly if the code was written by someone who designed the code rather than hacked it together. Ever since I took my first data structures class I have maintained that if you can understand the structures, the algorithms become almost self-evident. However, it's not always tremendously helpful, particularly if the implementor just accreted functionality into the program, or views a big struct or two full of every little thing the program needs as good engineering. It also tends to break rather badly if the code has had a succession of short-to-medium term caretakers rather than one person maintaining it all along. That seems to be a fairly typical situation in commercial code.
If following the data structures isn't helpful, then I tend to follow a top-down linear approach. That is, I start with main(), get an overall sense for the flow of the program at that level, then start at the beginning again and work my way section by section or line by line through the code (or at least the parts of it that I think I care about). In other words, I first do a high-level read-through until I get the basic idea of what it's trying to do, and then fill in the details of how it does it. I repeat this at each level as I dig deeper into the code. It sort of ends up being a breadth-first summary scan followed by a depth-first extraction of details.
Others have suggested commenting or reformatting code as you go along. My opinion is that if you do so be fully prepared to throw all that work away unless you know you're about to become the head maintainer of the code in question. Original authors don't seem to be able to see how bad their code is because it often isn't bad to them -- their code reflects their mental processes and expertise. It's just not worth the struggle, usually, to even reformat the code underneath them. There's lots and lots of terribly ugly code out there in the world, and almost every time I start looking at something new I call down curses and damnation on the authors. However in the end I just learn to live with it. Unless it's so terribly abhorrent as to actually be broken because of how it's written, I play the code chameleon and modify code following the same nature of ugly as the pre-existing code.
Cyrano de Maniac
Step 1. Make sure it compiles and you compile with the same options used in the current version deployed on production
Step 2. Make sure you understand which version was deployed on the production system. Were any fixes applied to the source but not deployed?
Step 3. Understand the scope of the component. The scope of a system is its user interface and its database. Input -> (Program) -> Output. The program can only do its job based on inputs (Screen, Config Files, Database, Integration Interfaces) and its output (Screen, Database, Integration Interfaces).
Step 4. Understand the list of open issues/defects. Separate between architecture problems and functional problems. Understand potential "flaws" and where the difficult bits are.
Step 5. Do a quick 1 hour skim over every module, file, and every function. Try and understand how it is logically organized. See if there are unit tests, any tests, how it's built, how it's layered, etc. These things you will not understand from the various functional analysis tools. (They only work if he's been a stickler for functional coding in conjunction with OO coding). Step 4 helped you prioritize your "attention". Figure out where the boilerplate code is (generated web services, UI interfaces, DAO, etc), which are the functional utilities classes, and which are the database/interface objects. The rest should be the problematic business logic/processing code.
There are a lot more, but the above should get you onto the right track.
This was posted on reddit a few days ago (pdf warning yadi yada). I think that the "From Legacy Code to Clean Code" section may be relevant to you.
The Wise adapts himself to the world. The Fool adapts the world to himself. Therefore, all progress depends on the Fool.
Start FIRST with what YOU want to do and WHY it is important to do it your way. Without this motivation you're wasting your time.
If you don't know anything about the architecture of the system then sketch your own over a cup of coffee to find out what are likely to be the key components.
Now you have a goal you can see what parts of the existing system are applicable, missing etc. Your basic knowledge of the inside cogs, wheels and not forgetting irrelevant bells and whistles will be a great help in focussing on elements, themes or modules. (For example the original might be full of cruft concerning what you regard as a dead-end but the original developers considered a bonus feature.) With the knowledge gained from the original system you may be able to look upon it as a prototype and build a much simpler system that isn't full of serial adaptations.
If you have a 'porting' job then there are probably tools to at least highlight places to deal with.
Fire up your debugger and start stepping!
0x or or snor perron?!
What's the point in reading foreign code if you don't plan to work on it?
If you want to work on it, isolate the area where your code will touch the old code and work on that with a debugger.
Reading huge C/C++ code bases is rather pointless ...
Cost free eBook I read (by iBook/Kobo/Amazon/ObookO/Gutenberg etc.): "The Green Odyssey" by Philip Jose Farmer.
The problem is that taking C and just adding classes and objects would have been nice, but the changes in C++ go so far beyond that they can reach a perl-like level of syntax confusion.
You still have that option. Just take the basic OOP parts and don't use the crazy stuff at all. :)
Remember writing BASIC in 3rd and 4th grade, where you had to flowchart everything out first? Go do that.
I want to delete my account but Slashdot doesn't allow it.
The problem is that taking C and just adding classes and objects would have been nice, but the changes in C++ go so far beyond that they can reach a perl-like level of syntax confusion.
You still have that option. Just take the basic OOP parts and don't use the crazy stuff at all. :)
That's a good idea when you're the one writing it. It's not always in your control when you're maintaining.
William of Ockham had no beard. The most likely explanation is that it was chewed off by squirrels every morning.
Pseudo code it.
Read the major blocks of the application in whatever editor makes you happy, with a text editor next to it (preferably one that does arbitrary text folding) and just pseudocode the hell out of it. The more of this you do, the more you'll be able to look back at your own pseudocode for a definition of some obscure thing, and the more a pattern will appear as you notice yourself typing the same things over and over again.
Also works as sl/hack documentation when you're done.
Old truckers never die, they just get a new peterbilt
It's actually "How to Start Reading Others' Code?"
You put the apostrophe after the s, otherwise, it's only one other that the code belongs to, which is just strange.
- Zav - Imagine a Beowulf cluster of insensitive clods...
Knowing the requirements gives you the groundwork for understanding what the code is doing.
Try to get hold of the requirements document, specifications, and detailed design. Once you have the requirements you should have a pretty good idea of what should be in the code. Header comments often do not repeat the contents of the specs.
Since this is an OSS project, can you suggest any tools similar to Understand that don't cost $995?
Eclipse CDT has a very powerful index (when it works) with which to search who calls what, or who depends/inherits from who.
It is still a crapshoot when the code is atrocious (or complex/large enough that even good coding efforts are not enough.) Slowly but surely identify what look like important functions. Screenshot the call/type graphs in Eclipse and put them in a document.
Sometimes I (grep|awk|find|sed|tr)+ the crap out of source files looking for types and functions/methods, massaging them into submission until they can look like a CSV file. Then I load them into Excel or Access.
Sometimes, not always, but sometimes you can glimpse a lot of knowledge when you look at code structures (functions and types) in a tabular format (in particular when dealing with CORBA IDL elements and their C/C++ implementations for instance.) Another advantage is that sometimes (again, sometimes) you can take those elements, and massage them into a DOT file from where to generate a graph of sorts.
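As a rough sketch of that "massage it into a CSV" step, here is the kind of throwaway script I mean, in Python instead of a grep/awk pipeline. The function names (class_rows, to_csv) and the regex are made up for illustration, and the regex is a crude heuristic: it only sees class/struct declarations that fit on one line, and it will miss templates, macros and anything unusual.

```python
import csv
import io
import re

# Crude heuristic: matches "class Name : public Base, private Other {" on one line.
CLASS_RE = re.compile(
    r"^\s*(?:class|struct)\s+(\w+)\s*(?::\s*([^{;]+?))?\s*\{", re.MULTILINE
)
ACCESS_RE = re.compile(r"\b(?:public|protected|private|virtual)\b")


def class_rows(source, filename):
    """Return (file, line, class, bases) tuples for each class definition found."""
    rows = []
    for match in CLASS_RE.finditer(source):
        name, bases = match.group(1), match.group(2) or ""
        # Drop access specifiers so only the base class names remain.
        base_names = [
            ACCESS_RE.sub("", b).strip() for b in bases.split(",") if b.strip()
        ]
        line_no = source.count("\n", 0, match.start(1)) + 1
        rows.append((filename, line_no, name, ";".join(base_names)))
    return rows


def to_csv(rows):
    """Render the rows as CSV text that a spreadsheet can open."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["file", "line", "class", "bases"])
    writer.writerows(rows)
    return out.getvalue()


if __name__ == "__main__":
    sample = "class Shape {\n};\nclass Circle : public Shape, private Loggable {\n};\nclass Fwd;\n"
    print(to_csv(class_rows(sample, "shapes.h")), end="")
```

Forward declarations like "class Fwd;" are skipped on purpose, since they would only add noise to the table.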
Similarly, I've generated DOT-based graphs of file dependencies, object dependencies, etc. You can run nm or objdump to generate a list of "things" included in the obj files, and generate a sort of component dependency graph.
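For the nm route, a minimal sketch of what I mean, assuming you have already captured the plain text output of nm for each object file (without -C, so symbol names are mangled and contain no spaces). The helper names (parse_nm, dependency_edges, to_dot) are my own, and only the T, D, B and R symbol types are treated as definitions, which is a simplification.

```python
DEFINED_KINDS = {"T", "D", "B", "R"}  # text, data, bss, read-only data


def parse_nm(text):
    """Split one object's nm output into (defined, undefined) symbol sets."""
    defined, undefined = set(), set()
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue
        # Undefined symbols have no address column, so index from the end.
        kind, name = parts[-2], parts[-1]
        if kind == "U":
            undefined.add(name)
        elif kind in DEFINED_KINDS:
            defined.add(name)
    return defined, undefined


def dependency_edges(nm_outputs):
    """Map {object name: nm text} to sorted (user, provider) object pairs."""
    parsed = {obj: parse_nm(text) for obj, text in nm_outputs.items()}
    owner = {}
    for obj, (defined, _) in parsed.items():
        for sym in defined:
            owner.setdefault(sym, obj)
    edges = set()
    for obj, (_, undefined) in parsed.items():
        for sym in undefined:
            target = owner.get(sym)
            if target and target != obj:
                edges.add((obj, target))
    return sorted(edges)


def to_dot(edges):
    """Render the edges as a DOT digraph for Graphviz."""
    lines = ["digraph deps {"]
    lines += [f'    "{src}" -> "{dst}";' for src, dst in edges]
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    sample = {
        "main.o": "0000000000000000 T main\n                 U _Z5parsev\n",
        "parser.o": "0000000000000000 T _Z5parsev\n",
    }
    print(to_dot(dependency_edges(sample)))
```

Symbols that no object in the set defines (libc calls and so on) are simply dropped, which keeps the graph down to your own components.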
But the cheapest way to go about it, if you are using the GNU compiler suite, is to use gprof. If you have a set of test cases that can exercise a substantial portion of your code, you might be able to get a partial call graph. The call graph might be dependent on the test scenario, but it is something that can get you going in the right direction.
Sometimes a code coverage tool like lcov, running in tandem with gprof, can help as well. It might give you an indication of dead code (if your tests are comprehensive enough) or code that still needs inspection (if your tests are not comprehensive.)
It is all manual and thus prone to error. That is the price of not using a good code navigation tool (unfortunately, the ones worth a spit are commercial.)
But with some good elbow grease, spit and diligence, you can go a long way by cobbling something together using existing tools (gprof, lcov, nm, objdump, grep, awk, sed, tr, perl, Excel, etc.)
The only thing I could find was Source Navigator NG, but I have zero experience with it.
I find that the best way to read and understand someone else's code is to comment it.
I've always been a little ADD and impatient with having to do things systematically. Unfortunately, I found this is the only approach that works for me. If I don't do this, I end up staring at the screen for long periods of time and not getting anywhere. So I wrote up a worksheet to help myself in these situations. Here are the steps I identified as helpful for dealing with massive levels of complexity in unfamiliar code:
1. Establish a clear goal and sub-goals.
2. Use the goals to determine the scope of your reading.
3. Allocate quite a bit of time in large chunks.
4. Identify key layers of abstraction.
5. Enumerate classes (functions, namespaces) of interest.
6. Systematically, read through each class superficially.
7. Pick 8 classes to focus on.
8. Do a deep dive.
9. List/sketch inputs and outputs in terms of function names, types referenced.
10. Look at relevant tests for usage as needed.
11. Check off each class once looked through.
12. Measure the complexity of a component by how many checks are required for full understanding.
13. Iterate until goals achieved.
There is a good side to reading another programmer's code, which is in seeing other methods and approaches to solutions. For any given task, there are many ways in which it can effectively be programmed. Granted, 90% of solutions are usually inefficient, unstable, bloated, or just outright wrong, but out of that other 10% you can find some interesting things, even if it isn't your preference or end choice. The main thing is to be open-minded. A programmer that believes their way is the only good way (which is most programmers) is a programmer that expands their knowledge and capabilities very slowly.
As for being able to read it, I would strongly suggest customizing a plug-in to your favorite IDE and have it reformat the code. Every programmer codes with their own style, and while you wish everyone used correct indenting and such, the reality is far from that. If you want to save yourself a lot of headache and time, just auto-reformat each page as you go through it. This not only will make it so you can save the source code with good formatting, but will also make it much easier for you to read and review. At least that's my two bits, which in today's economy counts for next to nothing. 8-) I've been in your position too many times, and any way you look at it, you've got a bunch of scanning through files ahead of you. Good luck.
You should use a script to strip out the comments. If any are actually present, they are almost certainly misleading.
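If anyone wants to actually try this, here is a minimal sketch of such a script. The name strip_comments is just for illustration. It is a small hand-rolled scanner, not a real C++ lexer: it copes with // and /* */ comments and with ordinary string and character literals, but it does not understand raw string literals, digit separators like 1'000, or line continuations inside // comments.

```python
def strip_comments(source):
    """Remove // and /* */ comments from C or C++ source text."""
    out = []
    i, n = 0, len(source)
    while i < n:
        ch = source[i]
        nxt = source[i + 1] if i + 1 < n else ""
        if ch in "\"'":
            # Copy a string or character literal verbatim, honoring backslash escapes.
            j = i + 1
            while j < n and source[j] != ch:
                j += 2 if source[j] == "\\" else 1
            out.append(source[i:j + 1])
            i = j + 1
        elif ch == "/" and nxt == "/":
            # Line comment: skip up to (not including) the newline.
            while i < n and source[i] != "\n":
                i += 1
        elif ch == "/" and nxt == "*":
            end = source.find("*/", i + 2)
            comment = source[i:] if end == -1 else source[i:end + 2]
            # Keep the newlines so line numbers still match the original file.
            out.append("\n" * comment.count("\n") or " ")
            i += len(comment)
        else:
            out.append(ch)
            i += 1
    return "".join(out)


if __name__ == "__main__":
    print(strip_comments('int a = 1; // one\nconst char* s = "// kept";\n'), end="")
```

Line numbers are preserved, so compiler errors and debugger positions still line up with the original file.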
Unfortunately, if it's someone else's code that got dumped in your lap, you're pretty much stuck figuring out what crazy stuff the original programmer(s) managed to do with C++.
Get a good global search utility. Search for every use of the symbol names that you are working with, to see what their scope is and how they are used. You can see how the other code uses them, steal snippets of working code you need, and tell what might "break" your use of them. If you have seen every use of a symbol, in the whole code set, then you can feel more confident that what you are doing will not break things.
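If you have no such utility at hand, a rough sketch of one takes only a few lines of Python. The function name find_symbol and the list of file suffixes are my own assumptions; adjust them to the code set. It matches whole identifiers only, so searching for "count" will not report "counter", but it has no idea about scope, comments or strings, so treat the hits as a reading list, not as proof.

```python
import re
import sys
from pathlib import Path

# Assumed set of C/C++ source suffixes; extend as needed for the project.
SOURCE_SUFFIXES = {".c", ".cc", ".cpp", ".cxx", ".h", ".hh", ".hpp"}


def find_symbol(root, symbol):
    """Return (relative path, line number, line text) for every whole-word hit."""
    pattern = re.compile(r"\b" + re.escape(symbol) + r"\b")
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in SOURCE_SUFFIXES:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for line_no, line in enumerate(text.splitlines(), start=1):
            if pattern.search(line):
                hits.append((str(path.relative_to(root)), line_no, line.strip()))
    return hits


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python find_symbol.py <source root> <symbol>
    for rel_path, line_no, line in find_symbol(sys.argv[1], sys.argv[2]):
        print(f"{rel_path}:{line_no}: {line}")
```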
Don't try to learn everything at once; your brain can't hold it all right away. Study the modules and symbols that you must work on. Start with finding out how much you actually -do- have to work on. Then widen your scope as you find more "connections" to other things.
Everything is more complicated than you think, and more complicated than your boss thinks. Be cautious, go carefully, and don't go "running full speed off a cliff". This will actually end up being much faster to a working version.
Frustration is normal, when learning something new. It does not necessarily indicate that the thing you are studying is screwed up (although it might be). Work through the frustration before making changes. (Except, it might be good to make temporary "test" changes to help in the learning.) It is painful, but you will learn and grow from it.
Strongly resist the temptation to re-write stuff that is new to you, no matter how bad it looks. There will usually be good reasons that it is that way, and you will make a disaster. Make good backups of everything before each change and at least every day. There will be places where you must throw away what you have done and go back.
On the other hand, don't be afraid to make global changes to the code set if it is really needed. Just back up a working copy that you can go back to, and be careful.
Do things in phases, even if the "customer" only wants a final version. Make a reasonable set of changes, then get it working and debugged. Then do it again for the next changes. Debugging as you go is actually much faster in the long run. Learn to use a debugger; it is like turning on a light in a dark room.
Find quiet time for several hours to work on it, so that you can get into a "state of flow" mentally. This is much more effective than normal work with interruptions.
HTH.