Ask Slashdot: How To Start Reading Other's Code?
BorgeStrand writes "I'm reviving an open source project and need to read up on a lot of existing code written by others. What are your tricks for quickly getting to grips with code written by others? The project is written in C++ using several APIs which are unknown to me. I know embedded C pretty well, so both the syntax, the APIs and the general functionality are things I wish to explore before I can contribute to the project."
If there's a lot of documentation, interpret it like your favorite religious text. Try to hit up some of the old developers from the VCS. Also, I'd like to help :)
Knowing the data structures gives you the ground work for understanding what the code is doing. The data structures are a more direct description of the design decisions.
What are your tricks for quickly getting to grips with code written by others?
For me, it comes down to a lot of mountain dew, techno music, and hours of guru meditation. As you dissect each function, sketch out its relationship to other major functions. I take a two-pass approach: first, just look at the function call-outs and the return values and make a rough sketch of the 'scaffolding' of a program. On the second pass, any function that you can't see the obvious application of, or that appears obfuscated or complicated, dissect into functional units and sketch out what it does in your notes. I do this by actually physically drawing the relationships using something called a mind map.
Until you get used to it, actually writing it down, even if it's just a bunch of messy arrows to blobs of circled text, will help jog your memory and help things sink in until you have the necessary 'ah ha!' moment.
YMMV.
Find a good editor or IDE that allows you to quickly navigate the code.
Figure out how to build it
Figure out how to test it
Read the API docs. Understand the objects and how to interact with their interfaces.
YMMV, depends on how smart you are I guess.
Look straight at the code for a few hours without moving an inch. After that its details should be printed into your brain.
If possible, I would try writing unit tests for the existing code. This tests your understanding of what you are reading and will come in handy later if you change the code. If unit tests already exist then I suggest that you read them since they will tell you the intention of each function.
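A minimal sketch of what such a test can look like, assuming a hypothetical inherited function `legacy_trim` (the name and behavior are made up for illustration). The assertions do not say what the code should do, only what it does today:

```cpp
#include <cassert>
#include <string>

// Hypothetical inherited function: strips leading spaces and tabs.
// It stands in for whatever routine you are trying to understand.
std::string legacy_trim(const std::string& s) {
    std::size_t i = 0;
    while (i < s.size() && (s[i] == ' ' || s[i] == '\t')) {
        ++i;
    }
    return s.substr(i);
}

// Characterization test: pin down current behavior before changing anything.
// Each assertion is a guess about the code that running it confirms or refutes.
void test_legacy_trim() {
    assert(legacy_trim("  abc") == "abc");    // leading spaces are removed
    assert(legacy_trim("\tabc") == "abc");    // so are tabs
    assert(legacy_trim("abc  ") == "abc  ");  // trailing spaces are kept
    assert(legacy_trim("   ") == "");         // all-blank input becomes empty
    assert(legacy_trim("\nabc") == "\nabc");  // newlines are not treated as blank
}
```

A failed assertion here is useful either way: it means your reading of the code was wrong, or you found a bug.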
It depends on what you need to do. I usually start with main and trace through it slowly but surely. Usually it takes me a week or two to fully comprehend the pieces of the code to the point where I feel comfortable with making big changes.
Another approach is to find some small bugs that need fixing and fix them. Something that will force you to have to find what function the bug is in and then step through the code with a debugger.
Good luck!
Lots of practice reading and writing C++ code. Reading decent books like the Addison-Wesley C++ series edited by B. Kernighan helps, although the snippets of code presented by the authors aren't representative of a real code base.
If there's any documentation, read that first. Usually there isn't much, or not nearly enough, and the code has been written by multiple programmers at different times.
Google for tutorials and sample code on the relevant external APIs.
I gave up !!!!
The same thing can be said about my attempt to read the EXT2 source code...
Everything else will seem simple after that...
They will be doing horrible, horrible things from your point of view. Manage your mood, to be able to stick to the goal.
1) Start by understanding the target language completely, in this case C++. 2) Get an IDE that provides full support for finding references and definitions in C++. For now only Netbeans CND is able to do that properly on Linux-based OSes. Alternatively, on Windows use Visual Studio. 3) Then you just have to start reading the code and build a mental map of the project.
Read as much code of as many different styles as you can. Eventually, you will hopefully start just getting a feel for programming much like music, where different code has different styles. You'll see manifestations of different patterns and start to gain a deeper understanding of it (it can have an almost zen-like quality to it)...why certain patterns get used, why certain developers use patterns that don't make sense to you, what kind of developers the authors are. It took me years before I started to get a feel for reading other people's code, but I have an idea as to how good a programmer is and even what kind of everyday personality they have from reading their code. It really is like music. And one more thing: don't count on comments for guiding you through this; it's rare to come across well-commented code, especially in a professional environment (ironically enough).
Even without Doxygen's specific format for comments, you can use it to graph object relationships, call-trees, etc.
You can generate docs limited to a few files or classes if you just want to focus on them.
www.doxygen.org
First, figure out how the code gets loaded and runs. Find the equivalent of the 'main' function. Then start tracing it, seeing what functions get called, how things are loaded, etc. What really helps here is an editor where you can CTRL+click on a function to go to its definition. When you hit a function that doesn't call any other unknown functions, then you can start understanding what it does without having to step into it. These are the basic functionality units. Then when you know enough of those, you can start going a level up, etc. Eventually a picture forms in your mind of just how things work. You can optionally skip over functions in favor of looking at them later if it seems pretty clear what they do based on the name and how they're called, but you might find important stuff in there later. This is how I go about it, anyway. It can be very frustrating and very confusing at first, but eventually the picture starts making sense, then things click in a most satisfying manner. That being said, the above is also the reason I can dislike complicated frameworks. There's so much indirection that it can take quite a while indeed until you hit something concrete. The mark of a good framework is, either it doesn't do that, or it does but soon enough you figure out its parts and then you can treat it intuitively.
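One way to make the "leaf functions first, then a level up" order concrete is a post-order walk over a call graph you have jotted down while tracing. This is only a sketch: the graph is assumed to be recorded by hand, and `CallGraph`, `visit` and `reading_order` are names invented for the example.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Call graph recorded by hand while tracing: caller -> callees.
using CallGraph = std::map<std::string, std::vector<std::string>>;

// Depth-first post-order walk: a function is emitted only after
// everything it calls has been emitted.
void visit(const CallGraph& graph, const std::string& fn,
           std::set<std::string>& seen, std::vector<std::string>& order) {
    if (!seen.insert(fn).second) {
        return;  // already visited (this also stops recursion cycles)
    }
    auto it = graph.find(fn);
    if (it != graph.end()) {
        for (const std::string& callee : it->second) {
            visit(graph, callee, seen, order);
        }
    }
    order.push_back(fn);
}

// Returns a bottom-up reading order starting from the entry point.
std::vector<std::string> reading_order(const CallGraph& graph,
                                       const std::string& entry) {
    assert(!entry.empty());  // an entry point name is required
    std::set<std::string> seen;
    std::vector<std::string> order;
    visit(graph, entry, seen, order);
    return order;
}
```

For a graph where `main` calls `load_config` and `run`, and `run` calls `parse` and `render`, this yields `load_config`, `parse`, `render`, `run`, `main`: the functions with no unknown callees come first.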
I find that going through some key functions (assuming you can find them) and reformatting them to your own liking can be helpful, commenting code along the way. Then if you want to get more aggressive, start cleaning up some code in minor ways that still stay true to the function's meaning. After you've done a bit of that, you should probably have at least a vague idea what's going on.
I disagree with reading too much code.
Run the code to see what it does. Add some printfs to validate the understanding of what the code is doing.
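A rough sketch of a printf-style checkpoint, assuming nothing about the real project: `TRACE` is a made-up macro and `scale` is a stand-in for the function you are studying.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical helper: prints file, line, function, and a formatted message
// to stderr so the trace does not mix with the program's normal output.
#define TRACE(...)                                                        \
    do {                                                                  \
        std::fprintf(stderr, "%s:%d %s: ", __FILE__, __LINE__, __func__); \
        std::fprintf(stderr, __VA_ARGS__);                                \
        std::fprintf(stderr, "\n");                                       \
    } while (0)

// Stand-in for a function under study; the name and logic are made up.
int scale(int value, int factor) {
    TRACE("enter value=%d factor=%d", value, factor);
    int result = value * factor;
    TRACE("result=%d", result);
    return result;
}
```

Write down what you expect each line of output to be before you run it; the surprises are where your understanding is off.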
It won't help you understand the code but you'll stop worrying about it so much.
I've found the best way to learn a code base is to start editing it. There are almost always formatting details and names that don't feel right. Start changing those and see what happens. The process will force you to understand the logic and help you really understand the code. Even if you throw all your formatting changes away, you'll definitely learn something.
Hopefully, the project has tests as well that will let you see if your formatting changes break things, in which case you've begun to understand the relationship between different code blocks.
C++ is a very, very different language from C. It's far more powerful and therefore more complex: it will take time (a lot of it) to learn C++ first. Your C skills are about as useful as the first 50 pages of the C++ book: you still have 850 more to go from there.
First I'd start by learning the library/apis it is using. Write a few test programs in them to solidify the knowledge. Work your way up, layer by layer... which is the way you'd write it in the first place.
I recommend starting by working on the bug list. It gives you something to work on constructively and it also makes you look through all the code to track the problem.
As you are doing this, start generating your own documentation. If the code doesn't use Doxygen, add that. Reformat and add comments. Write external documentation. When you are documenting, think of what you wish the previous coders had done for you, and then do that.
This is the way I write code from the beginning, and it leads to better code. If I can explain what is going on then I know I understand it. If I can't explain it, then there is a pretty good chance that there are bugs. It's good practice whether you are taking over someone else's code or starting from scratch.
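For anyone who hasn't seen Doxygen-style comments, here is a small sketch. The function `mean` is hypothetical and exists only to show the layout; `@brief`, `@param`, `@return` and `@p` are standard Doxygen commands.

```cpp
#include <cassert>
#include <vector>

/**
 * @brief Computes the arithmetic mean of a set of samples.
 *
 * Hypothetical example function; it exists only to show the comment layout.
 *
 * @param samples Values to average. Must not be empty.
 * @return The mean of all values in @p samples.
 */
double mean(const std::vector<double>& samples) {
    assert(!samples.empty());  // documented precondition, checked in debug builds
    double sum = 0.0;
    for (double s : samples) {
        sum += s;
    }
    return sum / static_cast<double>(samples.size());
}
```

If you can't fill in the `@param` and `@return` lines for a function, that is a good sign you don't understand it yet.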
Why is Snark Required?
The trouble with university education, is that most people who teach there are computer scientists, not software engineers with years of experience in the trenches.
If this were actually the case, there would be a recognition that reading code is far harder than writing it. And far more emphasis would be on coming to grips, understanding, and working on large code bases. There'd be more stuff on things like unit testing, breaking dependencies, troubleshooting, and refactoring at least.
Find out what drugs the original coder was using when writing, and take the same.
Find a function. Refactor it until you grok it. Discard the results.
Keep in mind that it will be VERY tempting to commit your changes, but you must throw away the work and chalk it up as a learning experience if you ever want to be taken seriously by the others who work on the project. Junior developers (and even some senior developers) often think they're doing everyone a favor by doing drive-by refactors, but they're not; they're just slowing down the entire team and coming across as that a**hole who keeps f***ing up the diffs and destroying the useful output of tools like git blame.
If you found any bugs in the previous step, make a patch with the absolute minimal change to fix each individual bug. IMPORTANT: Before committing the patch, first be sure that you can reproduce it in the old code, and that the test case is fixed by your new code.
Repeat the process until you understand the entire system.
With any luck, you will finish with a solid understanding of how the code actually works, and you will most likely also fix a few dozen bugs (if you didn't find at least one bug per kLOC, then "you're doing it wrong" or the code was written by an inspired genius with OCD). At that point, you will be the team's expert on how things work, and you will be in a position where you can start proposing simple refactorings that will improve the code quality.
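A sketch of the "reproduce in the old code, then show the patch fixes it" step, using a made-up leap-year bug. Both function names are hypothetical; in a real project the old version would be whatever is in the repository before your patch.

```cpp
#include <cassert>

// Hypothetical inherited code: treats every fourth year as a leap year.
int days_in_february_old(int year) {
    return (year % 4 == 0) ? 29 : 28;
}

// Minimal patch: only the leap-year condition changes, nothing else is touched.
int days_in_february_fixed(int year) {
    bool leap = (year % 4 == 0 && year % 100 != 0) || (year % 400 == 0);
    return leap ? 29 : 28;
}

// Step 1: show the test case really fails against the old code.
bool bug_reproduces_in_old_code() {
    return days_in_february_old(1900) != 28;  // 1900 was not a leap year
}

// Step 2: show the same case passes with the patch, and that cases the
// old code already handled still behave the same.
bool patch_fixes_bug() {
    return days_in_february_fixed(1900) == 28 &&
           days_in_february_fixed(2000) == 29 &&
           days_in_february_fixed(2024) == 29 &&
           days_in_february_fixed(2023) == 28;
}
```

Note what the patch does not do: no renaming, no reformatting, no cleanup, so the diff shows exactly one behavioral change.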
You mentioned you have embedded C experience and the code of interest is written in C++. You didn't mention if you had any C++ or other object-oriented programming experience. I assume the C++ code uses the OO features of C++ that distinguish it from C -- but this assumption is not necessarily true.
So, if you lack OO experience and the code is truly OO C++ code, you might want to do a little reading up on the basics of OOP in order to spend less time spinning your wheels.
Here is how I work on legacy code:
1) I don't look at the whole picture because there are too many details, so I prefer to attack it little by little.
2) I quickly check what I can rewrite in order to optimize the code. If I have no idea, I run a profiler, and take a look at the routines that take the most time.
3) once I understood or rewrote the most consuming parts (sometimes it's heavily optimized, but most of the time, I can make a real improvement), I decide what most important functionality I would like to add, and I just focus on that.
4) if I really need to have robust code, I write tests for the routines before optimizing them, so that I can validate if there are regressions
5) whenever possible, I use "assert" and put some bound-checking tests, in order to validate the ranges of certain values or conditions.
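Point 5 might look something like the sketch below. `SampleBuffer`, `sample_at` and `add_sample` are invented for the example; the idea is only that each `assert` writes down a range you believe holds.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical fixed-size sample buffer, standing in for a legacy structure.
constexpr std::size_t kMaxSamples = 8;

struct SampleBuffer {
    int data[kMaxSamples] = {};
    std::size_t count = 0;
};

// The assertions record what the reader believes the valid ranges are.
// If a belief is wrong, a debug build stops at the exact line that broke it.
int sample_at(const SampleBuffer& buf, std::size_t index) {
    assert(buf.count <= kMaxSamples);  // invariant: count never exceeds capacity
    assert(index < buf.count);         // bound check: index refers to a stored sample
    return buf.data[index];
}

void add_sample(SampleBuffer& buf, int value) {
    assert(buf.count < kMaxSamples);   // precondition: room for one more
    buf.data[buf.count++] = value;
}
```

Keep in mind that `assert` is compiled out when `NDEBUG` is defined, so these checks cost nothing in a release build but also protect nothing there.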
The important thing is to start by taking ownership of a small part of the code, then a bigger part, and so on.
Take one slice at a time, not the whole pie.
And one last point: knowing every little detail is useless. Concentrate on what is important for you: performance, functionality, and so on.
At work I am a big fan of Visual SlickEdit. It builds complete tags of all the functions, variables, classes etc into tags. Allows me to find all callers of a particular function, definitions, references etc. In Linux it will work with gdb to do step through debugging. I believe most of the functionality is available in emacs, with its ctags. Though most developers in our company use Microsoft IDE, I build all my sln files using slickedit and edit using slickedit. It has good integration with version management tools too.
I find it useful to follow the flow of control using a debugger. For C/C++, I like GDB in "TUI" mode because I can see the code easily.
If the program is multi-threaded, it's worthwhile to read up on how to handle threads in GDB. Understanding what's going on in a multi-threaded program can be especially difficult. Having the ability to switch between threads, to get backtraces (so you can see how that thread got where it is currently) and inspect its variables can be very informative.
Especially at the so-called engineers' deceased pet hamsters.
(going anon for this)
less
2) Just because the code is awful doesn't mean it has no value -- No matter how bad it is and how difficult it is to read, if it works at all it has probably got years (maybe even decades) of bug fixes and feature requests. Until you have a handle on it, any little change could cause a catastrophic cascade of side-effects.
3) No, we don't need to rewrite it. See 2. A working program now is worth more than all the pie in the sky you can promise a year from now.
4) It takes 6 months to have a reasonably good grasp of any moderately complex in-house application. It could be a year before you get to the point where someone can describe a problem and you immediately have a good idea of where in the code the problem is occurring and what functions to check.
Maintenance programming is as much about detective work as anything else. The only clues you have about the previous programmer are his source files. Once you've read them for a while you can start to tell what he was thinking, when he was confused, when he was in a hurry. Most of the atrocious in-house applications have changed hands several times and each programmer adds their own layer of crap. You can redesign these applications a chunk at a time until nothing remains of the original code if it's really bad, but it's best to save really ambitious projects until you understand the code better. I heartily encourage the wholesale replacement of "system()" calls with better code immediately, though. In several languages I've run across these calls to remove files, when they could have simply called a language library call (Typically "unlink".) If the original programmer used system("rm...") you can pretty much assume that they were a bad programmer and you're in for a lot of "fun" maintaining their code.
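For the file-removal case, a minimal sketch of the replacement. On POSIX the direct call is `unlink`; `std::remove` from `<cstdio>` is the portable standard-library equivalent. The wrapper name `remove_file` is made up.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Instead of: system(("rm " + path).c_str());
// std::remove is the standard library call for deleting a file.
// It starts no shell, so the path is not subject to shell quoting or
// expansion, and the result can be checked directly.
bool remove_file(const std::string& path) {
    return std::remove(path.c_str()) == 0;  // 0 means the file was deleted
}
```

The `system("rm ...")` version breaks on a filename with a space in it and gives you very little to go on when it fails; the library call has neither problem.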
Read this: http://stackoverflow.com/questions/3586073/reading-others-code
Also: http://www.codinghorror.com/blog/2012/04/learn-to-read-the-source-luke.html
Chances are that no-one else has and doing so will help you understand it as well as producing some useful output to get the project going again.
Reading other people's code is a punishment that one must master if you hope to grow as a programmer. My favorite approach these days is to use callgrind and kcachegrind, walking up and down the call stack for each stimulus I can muster. I often build a custom client to send malformed requests for these tests, it usually becomes part of my unit/regression tests. Then I make note of the most prominent function names and data structures. I construct an outline as if I were writing a book. GDB is also a fantastic tool for understanding software. You can learn a lot about code merely by setting a break point in malloc or read. Don't be afraid to use 'set var' to explore an interesting test case. strace, particularly with -c or -ff, can give you a quick idea of what a program does. Eventually I start to change it. First I use indent or some IDE to fix the whitespace, then I start to refactor. Eventually the code begins to look more like something I wrote than what I inherited. That's how I know when it's time to hand it off to someone else ;)
You can use static and dynamic analysis to gain knowledge on the structure of the program. Static analysis can be done with tools like Modisco (however, Modisco is not for C++ I guess). For the dynamic analysis, you need to add a monitoring feature to the code. This can be done with AspectC++. Instrument entry and exits of public methods (if it is OO code) or structural blocks detected by the static analysis.
However, if the program is rather small, then you can do the analysis also by hand.
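If you go the by-hand route, entry/exit monitoring can be approximated without AspectC++ by a small RAII guard. To be clear, this is a hand-rolled substitute and not how AspectC++ works; `ScopeTrace`, `parse_header` and `handle_packet` are hypothetical names, and the sketch assumes a single-threaded program.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical hand-rolled monitor: logs on construction (entry) and
// destruction (exit), so every return path is covered automatically.
class ScopeTrace {
public:
    explicit ScopeTrace(const char* name) : name_(name) {
        std::fprintf(stderr, "%*s-> %s\n", depth_ * 2, "", name_);
        ++depth_;
    }
    ~ScopeTrace() {
        --depth_;
        std::fprintf(stderr, "%*s<- %s\n", depth_ * 2, "", name_);
    }
    ScopeTrace(const ScopeTrace&) = delete;
    ScopeTrace& operator=(const ScopeTrace&) = delete;

    static int depth() { return depth_; }

private:
    const char* name_;
    static int depth_;  // current nesting level (single-threaded sketch)
};

int ScopeTrace::depth_ = 0;

// Made-up functions under observation.
int parse_header(int raw) {
    ScopeTrace trace(__func__);
    return raw & 0xFF;
}

int handle_packet(int raw) {
    ScopeTrace trace(__func__);
    return parse_header(raw) + 1;
}
```

Calling `handle_packet` prints an indented entry/exit trace to stderr, which is effectively a call tree of what actually ran.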
Start a text file. what you want to do is this: basically putting code into English (or your primary language)
function A
    calls function B
        function B
            cleans and persists the data
            returns the clean version
    puts the clean data from b() into htm
Reading AND writing code puts it into your head in two different places, which is strangely useful.
Since this is an OSS project, can you suggest any tools similar to Understand that don't cost $995?
The only thing I could find was source navigator NG, but I have zero experience with it.
echo "i am here";
or print or console.log or printf or ...
The problem with slashdot is that most of its users were bullied and stuffed into lockers as kids!
Abandoned pile of C++ code ... well, if you've got a self-defeating personality disorder, you may try to make it work. Or even do whatever you want to be done.
Normal response would be to forget it and start from a clean slate.
Forgetting about C++ as well would do much towards keeping you sane.
You can use doxygen to create a dependency graph and visualize it using CCVisu.
http://www.stack.nl/~dimitri/doxygen/
http://ccvisu.sosy-lab.org/
http://ccvisu.sosy-lab.org/manual/main.html#sec:input-dox
Learn to use a debugger with a nice interface for exploring call stacks, data structures, threads, etc. Then get the mystery code to do something simple, and put a breakpoint at the point where it's about to complete the task. (Try grepping for a string you see in the output.) Work up the call stack, putting a breakpoint at the start of each procedure that is being called. Now repeat the task and look at each procedure as it is invoked.
This will be a slow and painful process. Make sure you don't have anything better to do with your life before embarking on it.
Good luck.
Namgge
I just start fixing problems or adding features and learn the code on the go... you don't need to read all the code from the start.
If a function is too arcane to be understood in 10 minutes, start breaking it down into smaller *documented* functions while maintaining the same test results. There's a good chance that this will also help the project directly.
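A toy illustration, with every name invented: `legacy_count` plays the arcane original, the three small documented functions are the breakdown, and `same_results` is the check that nothing changed.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical "arcane" original: does everything in one loop.
int legacy_count(const std::string& s) {
    int n = 0; bool in = false;
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (std::isspace(static_cast<unsigned char>(s[i]))) { in = false; }
        else if (!in) { in = true; ++n; }
    }
    return n;
}

/// True if @p c separates words.
bool is_separator(char c) {
    return std::isspace(static_cast<unsigned char>(c)) != 0;
}

/// True if position @p i is the first character of a word in @p s.
bool starts_word(const std::string& s, std::size_t i) {
    return !is_separator(s[i]) && (i == 0 || is_separator(s[i - 1]));
}

/// Counts whitespace-separated words; intended to match legacy_count exactly.
int count_words(const std::string& s) {
    int n = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (starts_word(s, i)) { ++n; }
    }
    return n;
}

/// The safety net: both versions must agree on every input tried.
bool same_results(const std::string& s) {
    return legacy_count(s) == count_words(s);
}
```

Keeping the original around until the comparison has run on plenty of inputs is what makes the breakdown safe.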
Add comments to the parts that aren't clear as you're reading through the code.
Two steps:
1. Make an act of thanksgiving that you're not dealing with code written by someone trained in FORTRAN, where all variable names are two characters long and all function names are six characters long. "You can write FORTRAN in any language." If you do happen to be dealing with FORTRANesque code, give up now.
2. Become familiar with the idioms of every programming language, of every programming paradigm (structured, object-oriented, functional, event-driven, dataflow, etc.), and of every programmer educational background (high school/self-taught, CS degree, startup pressure cooker, etc.). If you don't have time to do that prior to starting on the current project, then let your attempt to reverse engineer the current project serve as a building block for the next time.
The first thing is that you need to know the language you are working in. So if you are coming from a C background, it's similar but you should pick up a book on C++ (or whatever language you happen to be working in). Secondly, it takes practice. Every day I am working with other people's code to fix problems. Once I find the problem area I have to sit and digest what's going on line by line, and usually add comments where there were none to begin with. Lastly, learning how to effectively read other people's code is one of THE BEST skills you can have as a developer. Anyone can read their own code, but to work as a team you need to be able to read other people's code and not get turned off by it. Small rant: most people who can't read other people's code seem to think that no one else knows what they are doing and that their code is sacred and they are the best coders ever, which is rarely the case and usually the opposite.
Write unit tests for the code and develop a regression test suite for it. This in itself can help you understand what's going on, but then also when you later start re-factoring or changing the code, you can be sure you're not breaking other parts of the code in subtle ways (or at least if you do break it, you'll know sooner than later). This will also help anyone else who might contribute to the project. Your mileage will vary depending on the size of the code base of course.
It's just how your brain works. It's a lot easier to examine a piece of mechanical machinery when it's in motion. You notice more. Do the same with the code. Run it. Run components independently. Put plenty of log statements or if it's feasible, watch under a debugger. But don't try to look at stale code just sitting there. You'll notice more as it moves.
just sayin..
If it's under version control I would go with the commit history to see what they were thinking as the project progressed.
That's assuming the original programmers followed the MVC pattern. Very often they didn't. That's when you start reading tea leaves.
This calls to mind the story of the king who was in a class for learning algebra, and after realizing it was hard, he took the teacher aside and said 'I'm the king - show me the easier route'.
There isn't one. Sit down and read it. Then re-read it. Then think. Then read it again. Think again. Repeat.
"The greatest lesson in life is to know that even fools are right sometimes" - Winston Churchill
I would start by finding a chair. Seriously. The only way to read code is by reading code. The more code you read the more you realize the futility of any method other than -- sitting in a chair and reading the code.
This free book by Prof. Serge Demeyer gives a good introduction to software re-engineering.
Serge Demeyer, Stéphane Ducasse and Oscar Nierstrasz, "Object-Oriented Reengineering Patterns," Morgan Kaufmann, 2003.
The book is available as a free .pdf download from the author's page: http://win.ua.ac.be/~sdemey/
http://www.iam.unibe.ch/~scg/OORP/OORP.pdf
print statements are the greatest debugging tool ever invented.
it will work on any piece of code, any language, any type of situation. you can trace anything.
Well, if it's not documented, write the documentation. Skip the automated crap. You need to do this yourself in order to ensure that you understand it.
That's why people take notes in classes, etc.
I'd start with a class diagram, some sort of high-level flowchart, and grouping modules into layers.
If you have only done C but no C++ or Java, there is one important thing you will have to learn: interfaces between code, codified as object hierarchies and in headers. If you know C you will think: so my interfaces are just what the code does and what information it needs, so from reading the interfaces I know the code. You could not err more. Due to the overhead of C++ and Java classes, the interfaces are an entity of their own. They cannot be simply changed and usually will be there earlier than the code. So you will see interfaces that lack essential information that code will get elsewhere, and interfaces passing information that is not needed. You will see "glue code": code written only to make code behind one interface talk with code behind another interface. You will find modules that interface with nothing else but glue code. You will find so many classes doing nothing (except connecting other classes that could not be rewritten to talk to each other) that you will end up needing some source code browser just to navigate the class hierarchies. And no, this is no sign of bad C++ code; that is inherent to the language, and if you dive into some bigger C++ or Java project, it will inevitably look like this.
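For someone coming from C, a tiny sketch of what such glue code tends to look like. All three classes are hypothetical: `Logger` is the interface the application was written against, `SerialPort` is an existing class with a different interface, and `SerialLogger` does nothing except connect the two.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Interface the application code was written against (hypothetical).
class Logger {
public:
    virtual ~Logger() = default;
    virtual void log(const std::string& message) = 0;
};

// Existing class with a different, incompatible interface (hypothetical).
class SerialPort {
public:
    void write_bytes(const char* data, std::size_t length) {
        sent_.append(data, length);
    }
    const std::string& sent() const { return sent_; }
private:
    std::string sent_;  // records output so the sketch can be checked
};

// Glue code: contains no logic of its own, it only translates a call on
// one interface into a call on the other.
class SerialLogger : public Logger {
public:
    explicit SerialLogger(SerialPort& port) : port_(port) {}
    void log(const std::string& message) override {
        port_.write_bytes(message.data(), message.size());
        port_.write_bytes("\n", 1);
    }
private:
    SerialPort& port_;
};
```

When reading a code base, recognizing a class as pure glue like this tells you that you can skip it and look at what sits on either side.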
You will never fully understand the code just by reading it. My approach is to ignore all of it until something needs to be changed. When you need to change something, add a feature, etc... find where in the code the functionality is and tweak it a bit. See what happens. Tweak it some more. See what breaks. You will start to get a deep understanding of a focused section of the code and not have to worry (yet) about other unrelated areas. Start with small changes first. Larger changes may require a deeper understanding of the architecture and how pieces interact. This will come in time. After a few iterations of this and you will eventually become intimately familiar with all the pieces of the code.
Spend an afternoon or three skimming around the code pulling threads and following them. Jump around kind of randomly, if things start making sense in one module, go somewhere else for awhile. Take frequent breaks. Take notes about what you think things are doing, or perhaps ideas about how to improve the code - but don't start improving things now, you just want to figure out how much you're in for.
After awhile doing that, you should have a few ideas about good accomplishable problems, now pick one and go deep for a limited time (hour, afternoon, week, depends on the scope of the code and your commitment to it). Again, keep notes, and then throw all your work away (or check it in somewhere - but don't focus on shipping, that detracts from learning). Again, go somewhere else in the code, fix something, take notes, throw it away. Alternate back and forth between research and application, trying not to bias towards one or the other (which can be a form of procrastination).
Now throw away all your notes. They were written by someone who had no idea what was going on. By now you're pretty sure you know what's going on (you don't) and how to make things better (you have no idea), so circle around for another pass. Stop when you start finding that your notes seem to be recognizing actual immediately-actionable problems and solutions, rather than hypothesis and speculation. Or just stop because you're now so busy fixing things that you don't have time for exploration.
Using Doxygen with dot (part of GraphViz) you can generate HTML with UML like diagrams that will help you visualize the design.
Also, if you don't use a modern IDE, Doxygen can render the code as HTML with links that you can use to jump to the method call or to see a variable definition.
BTW: I'm assuming you are looking at code in a language supported by Doxygen.
Find the person who wrote the code. Make sure that educating you or your colleagues is part of their paid responsibilities, and make sure that you respect their work when reviewing it: this helps them share their work and take it well if you need to revise things. My colleagues and I often bring new features or help stabilize old projects, and a working relationship with the original author is invaluable.
And if the author says "just read the code, I don't do documentation because documentation can lie", or if they say "don't bother checking the data for correctness, just don't make mistakes", be ready to throw out _everything_ they ever wrote. It may work at the moment, but it's likely to be as broken and unsustainable as their attitudes.
I always find this greatly depends on the quality of the code, which varies greatly, from the well written and documented, which just involves some boring reading and tracing through of execution paths, to the absolutely appalling, where you can sometimes only understand why the fuck they did something when you change the code and see how it breaks. For the former I usually rely on a quiet room and lots of caffeine; the latter requires swearing, loads of code changes to trace what is actually happening, and cursing the original author, their relations and the goat you are sure they must have been molesting while writing the code.
Depends on whether you understand what the code does. If you understand what the code does from the user perspective, then find the code which does the most critical and interesting bits and find out how it works. If you have little clue about what the code does then it's trickier. I think you should just browse the code and be guided by your own curiosity. Examine any documentation that might exist, or any user interface that might be available, to find clues to what is important.
Look at what external libraries it uses. How it interfaces to the outside world, whether it be a user interface, network connections, files used, etc etc.
well you start reading... and then you continue reading... and then you end... did you understand what you read? Good? you didn't? retry! imbecile
I start by making call graphs by hand. Forcing myself to slow down and write it down with a pencil on a piece of paper challenges me to answer "wait, what did I really read!?" questions. I usually don't have to continue this for very long before I start to identify the structure and then I can stop scribbling. But I need to see a mass of detail to begin up-levelling to structure because the structure I build up in my head has to be grounded in real code paths. If I find myself glazing over and just scanning the code, later, I force myself to make some more pencil and paper call graphs until I'm in touch with reality again.
Put a breakpoint in your main method and trace through. Find event handlers and breakpoint those as well.
There is really no point trying to memorize what all the code does.
Just identify what functionality you want to add and start coding away.
As you go you'll get familiar with the code base automatically.
If there's some internals documentation (like there is for GCC and binutils) read that.
fire up gdb or equivalent and walk through major flows while examining states. We had great success with it in embedded systems when absolutely no documentation exists.
Load up the UI/interface, find a very specific piece of functionality, and then find the UI hooks in the code and work backwards from there. Repeat for another function that involves something completely different, i.e. access control/data/rendering/logic.
Once you are comfortable with that process, you are ready to absorb from the top-down.
spitting hex into a buffer, then reading that buffer later = primitive form of print statement
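Since the original poster comes from embedded C, here is a rough sketch of that idea in C++. `TraceBuffer` and `kTraceSize` are made-up names, and the sketch assumes a single writer (no interrupt or thread safety).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// Number of entries kept; older ones are overwritten (hypothetical size).
constexpr std::size_t kTraceSize = 4;

// Hypothetical trace buffer: a small ring of 16-bit event codes.
class TraceBuffer {
public:
    // Cheap enough for timing-sensitive code: one store, one increment.
    void record(std::uint16_t code) {
        codes_[next_ % kTraceSize] = code;
        ++next_;
    }

    // Read back later, e.g. from a diagnostics command: render the
    // surviving entries, oldest first, as space-separated hex.
    std::string dump() const {
        std::size_t stored = kTraceSize;
        if (next_ < kTraceSize) {
            stored = next_;
        }
        std::string out;
        for (std::size_t i = next_ - stored; i < next_; ++i) {
            char text[8];
            std::snprintf(text, sizeof text, "%04X",
                          static_cast<unsigned>(codes_[i % kTraceSize]));
            if (!out.empty()) {
                out += ' ';
            }
            out += text;
        }
        return out;
    }

private:
    std::uint16_t codes_[kTraceSize] = {};
    std::size_t next_ = 0;  // total number of codes ever recorded
};
```

The point of the ring is that recording is almost free, so you can leave the calls in places where a real print statement would disturb the timing.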
After reading the RFC you can better understand the struct variable names and the algorithm. You cannot correctly edit a proxy's source code if you haven't read up on HTTP, SPDY, etc.
Reading other people's code comes with experience. Tools may make it easier initially but I tend to find that jumping in is a great place to start. Get the code compiled and start making it your own. Another poster mentioned refactoring. I like this approach myself.
Good luck.
If possible, I usually like to start by getting an overall understanding of the various data structures used to implement the program. Sometimes this can be very helpful, particularly if the code was written by someone who designed the code rather than hacked it together. Ever since I took my first data structures class I have maintained that if you can understand the structures, the algorithms become almost self-evident. However, it's not always tremendously helpful, particularly if the implementor just accreted functionality into the program, or views a big struct or two full of every little thing the program needs as good engineering. It also tends to break rather badly if the code has had a succession of short-to-medium term caretakers rather than one person maintaining it all along. That seems to be a fairly typical situation in commercial code.
If following the data structures isn't helpful, then I tend to follow a top-down linear approach. That is, I start with main(), get an overall sense for the flow of the program at that level, then start at the beginning again and work my way section by section or line by line through the code (or at least the parts of it that I think I care about). In other words, I first do a high-level read-through until I get the basic idea of what it's trying to do, and then fill in the details of how it does it. I repeat this at each level as I dig deeper into the code. It sort of ends up being a breadth-first summary scan followed by a depth-first extraction of details.
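The "breadth-first summary scan followed by a depth-first extraction of details" can be pictured as two traversals of a call graph. This is a minimal sketch, assuming a graph jotted down by hand. Every function name in `CALLS` is hypothetical and not taken from any real project.

```python
from collections import deque

# Hypothetical call graph: caller -> callees, as you might note it on paper.
CALLS = {
    "main": ["parse_args", "load_config", "run_loop"],
    "parse_args": [],
    "load_config": ["read_file"],
    "run_loop": ["handle_event", "render"],
    "read_file": [],
    "handle_event": ["render"],
    "render": [],
}

def summary_scan(graph, root):
    """Breadth-first pass: list functions level by level from the entry point."""
    levels, seen, queue = [], {root}, deque([(root, 0)])
    while queue:
        name, depth = queue.popleft()
        if depth == len(levels):
            levels.append([])
        levels[depth].append(name)
        for callee in graph.get(name, []):
            if callee not in seen:
                seen.add(callee)
                queue.append((callee, depth + 1))
    return levels

def detail_order(graph, root, seen=None):
    """Depth-first pass: the order in which to read each function in detail."""
    seen = set() if seen is None else seen
    seen.add(root)
    order = [root]
    for callee in graph.get(root, []):
        if callee not in seen:
            order.extend(detail_order(graph, callee, seen))
    return order
```

`summary_scan(CALLS, "main")` gives the layers to skim first, and `detail_order(CALLS, "main")` gives one possible reading order for the deep dive.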
Others have suggested commenting or reformatting code as you go along. My opinion is that if you do so be fully prepared to throw all that work away unless you know you're about to become the head maintainer of the code in question. Original authors don't seem to be able to see how bad their code is because it often isn't bad to them -- their code reflects their mental processes and expertise. It's just not worth the struggle, usually, to even reformat the code underneath them. There's lots and lots of terribly ugly code out there in the world, and almost every time I start looking at something new I call down curses and damnation on the authors. However in the end I just learn to live with it. Unless it's so terribly abhorrent as to actually be broken because of how it's written, I play the code chameleon and modify code following the same nature of ugly as the pre-existing code.
Cyrano de Maniac
There is no point in "reviving" anything. It was abandoned for a reason.
You can learn from it, but never modify it, it's not your property.. even if it's open sourced.. someone's ego will be bruised and you will be at fault.
Forking or branching is an advanced skill.
Just remember you can't change the past, it's over and done with.
Run the code through a debugger and study the flow of data. Learn by analyzing component relationships.
Spend two or three hours reading code, a couple of times a week, for a few years.
Step 1. Make sure it compiles and you compile with the same options used in the current version deployed on production
Step 2. Make sure you understand which version was deployed on the production system. Were any fixes applied to the source but not deployed?
Step 3. Understand the scope of the component. The scope of a system is its user interface and its database. Input -> (Program) -> Output. The program can only do its job based on inputs (Screen, Config Files, Database, Integration Interfaces) and its output (Screen, Database, Integration Interfaces).
Step 4. Understand the list of open issues/defects. Separate between architecture problems and functional problems. Understand potential "flaws" and where the difficult bits are.
Step 5. Do a quick 1 hour skim over every module, every file, and every function. Try to understand how it is logically organized. See if there are unit tests or any other tests, how it's built, how it's layered, etc. These are things you will not learn from the various functional analysis tools. (They only work if the original author was a stickler for functional coding in conjunction with OO coding.) Step 4 helped you prioritize your "attention". Figure out where the boilerplate code is (generated web services, UI interfaces, DAO, etc), which are the functional utility classes, and which are the database/interface objects. The rest should be the problematic business logic/processing code.
There are a lot more, but the above should get you onto the right track.
This was posted on reddit a few days ago (pdf warning yadi yada). I think that the "From Legacy Code to Clean Code" section may be relevant to you.
The Wise adapts himself to the world. The Fool adapts the world to himself. Therefore, all progress depends on the Fool.
First change nothing.
Work top-down.
Write down each requirement that the code implements.
Group them.
Understanding what the code is supposed to do will help you when you get into the details later.
Do unit tests.
One that tests each requirement.
Follow the chain of the code for each action.
Then the structure will be revealed.
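The "change nothing, write one test per requirement" steps above can be sketched with Python's standard `unittest` module. `legacy_parse_version` is a hypothetical stand-in for existing code, and both requirements are invented for illustration. In a real project they would come from your own notes on what the code currently does.

```python
import unittest

def legacy_parse_version(text):
    """Hypothetical stand-in for an existing function you must not change yet."""
    parts = [int(p) for p in text.strip().split(".")]
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts[:3])

class VersionRequirements(unittest.TestCase):
    """One test per requirement written down while reading the code."""

    def test_req1_parses_dotted_version(self):
        self.assertEqual(legacy_parse_version("1.2.3"), (1, 2, 3))

    def test_req2_missing_parts_default_to_zero(self):
        self.assertEqual(legacy_parse_version("4.1"), (4, 1, 0))

def run_requirements():
    """Run the suite and return the unittest result object."""
    loader = unittest.defaultTestLoader
    suite = loader.loadTestsFromTestCase(VersionRequirements)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

The tests pin down today's behaviour, so a later change that breaks a requirement shows up as a failing test named after that requirement.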
Start FIRST with what YOU want to do and WHY it is important to do it your way. Without this motivation you're wasting your time.
If you don't know anything about the architecture of the system then sketch your own over a cup of coffee to find out what are likely to be the key components.
Now you have a goal you can see what parts of the existing system are applicable, missing etc. Your basic knowledge of the inside cogs, wheels and not forgetting irrelevant bells and whistles will be a great help in focussing on elements, themes or modules. (For example the original might be full of cruft concerning what you regard as a dead-end but the original developers considered a bonus feature.) With the knowledge gained from the original system you may be able to look upon it as a prototype and build a much simpler system that isn't full of serial adaptations.
If you have a 'porting' job then there are probably tools to at least highlight places to deal with.
Fire up your debugger and start stepping!
0x or or snor perron?!
What's the point in reading foreign code if you don't plan to work on it?
If you want to work on it, isolate the area where your code will touch the old code and work on that with a debugger.
Reading huge C/C++ code bases is rather pointless ...
Read code from top to bottom, left to right. Understanding it then becomes a trivial exercise that is left to the reader.
When I am trying to figure out what code is doing I create an outline of it. Most of the time I just do it with pen and paper, other times with a simple text editor. Start with the main function and map out what the code is doing at a high overview level. If you come across something that you're not sure about, you can mark it with a question mark and come back to it later. The point is to translate the code into a simpler form that you can understand and reference. I like to use a numbered outline because it is easy to group logic and create references to other sections. It's kind of like reverse engineering a use case. It is better to focus on what the code is doing or trying to do than on how it is doing it.
Remember writing BASIC in 3rd and 4th grade, where you had to flowchart everything out first? Go do that.
I want to delete my account but Slashdot doesn't allow it.
Pseudo code it.
Read the major blocks of the application in whatever editor makes you happy, with a text editor next to it (preferably one that does arbitrary text folding) and just pseudocode the hell out of it. The more of this you do, the more you'll be able to look back at your own pseudocode for a definition of some obscure thing, and the more a pattern will appear as you notice yourself typing the same things over and over again.
Also works as sl/hack documentation when you're done.
Old truckers never die, they just get a new peterbilt
Refactor into self-documenting code. Developers who think they are being clever by being cryptic are morons and assholes. "Oh, you don't know what 'incept' means?"
It's actually "How to Start Reading Others' Code?"
You put the apostrophe after the s, otherwise, it's only one other that the code belongs to, which is just strange.
- Zav - Imagine a Beowulf cluster of insensitive clods...
Knowing the requirements gives you the ground work for understanding what the code is doing.
Try to get hold of the requirements document, specifications, and detailed design. Once you have the requirements you should have a pretty good idea of what should be in the code. Header comments often do not repeat the contents of the specs.
Since this is an OSS project, can you suggest any tools similar to Understand that don't cost $995?
Eclipse CDT has a very powerful index (when it works) with which to search who calls what, or who depends/inherits from who.
It is still a crapshoot when the code is atrocious (or complex/large enough that even good coding efforts are not enough.) Slowly but surely identify what look like important functions. Screenshot the call/type graphs in Eclipse and put them in a document.
Sometimes I (grep|awk|find|sed|tr)+ the crap out of source files looking for types and functions/methods, massaging them into submission until they can look like a CSV file. Then I load them into Excel or Access.
Sometimes, not always, but sometimes you can glimpse a lot of knowledge when you look at code structures (functions and types) in a tabular format (in particular when dealing with CORBA IDL elements and their C/C++ implementations for instance.) Another advantage is that sometimes (again, sometimes) you can take those elements, and massage them into a DOT file from where to generate a graph of sorts.
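The grep-and-massage-into-CSV step can be approximated in a few lines of Python. This is a crude sketch and not a parser. The regular expression only catches simple function definitions that start in column one, and it will miss templates, macros and multi-line signatures. `SAMPLE` and the file name `demo.c` are made up for illustration.

```python
import csv
import io
import re

# Heuristic: "<return type> <name>(<params>)" at the start of a line,
# followed by an opening brace (possibly on the next line).
FUNC_RE = re.compile(
    r"^([A-Za-z_][\w \t*]*?)[ \t]*\b([A-Za-z_]\w*)[ \t]*\(([^;{}()]*)\)\s*\{",
    re.MULTILINE,
)

SAMPLE = """int add(int a, int b) {
    return a + b;
}

static void log_msg(const char *msg)
{
    puts(msg);
}
"""

def functions_to_csv(source, filename):
    """Return CSV text with one row per function definition found in source."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["file", "line", "return_type", "name", "params"])
    for m in FUNC_RE.finditer(source):
        line = source.count("\n", 0, m.start()) + 1
        params = " ".join(m.group(3).split())
        writer.writerow([filename, line, m.group(1).strip(), m.group(2), params])
    return out.getvalue()
```

The resulting text can be saved as a `.csv` file and loaded into a spreadsheet for sorting and filtering.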
Similarly, I've generated DOT-based graphs of file dependencies, object dependencies, etc. You can run nm or objdump to generate a list of "things" included in the obj files, and generate a sort of component dependency graph.
But the cheapest way to go about it, if you are using the GNU compiler suite, is to use gprof. If you have a set of test cases that can exercise a substantial portion of your code, you might be able to get a partial call graph. The call graph might be dependent on the test scenario, but it is something that can get you going in the right direction.
Sometimes a code coverage tool like lcov, running in tandem with gprof, can help as well. It might give you an indication of dead code (if your tests are comprehensive enough) or code that still needs inspection (if your tests are not comprehensive.)
It is all manual and thus prone to error. That is the price of not using a good code navigation tool (unfortunately, the ones worth a spit are commercial.)
But with some good elbow grease, spit and diligence, you can go a long way by cobbling something together using existing tools (gprof, lcov, nm, objdump, grep, awk, sed, tr, perl, excel, etc.)
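As a rough sketch of the nm-to-DOT idea above: the code takes the text that `nm` prints for each object file, treats `U` lines as undefined symbols and `T`/`D`/`B`/`R` lines as global definitions, and draws an edge from each object to whichever object defines a symbol it needs. It assumes nm's default output format. The object names and symbols in the usage note are made up.

```python
def parse_nm(text):
    """Split nm output for one object file into (defined, undefined) sets."""
    defined, undefined = set(), set()
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 2 and fields[0] == "U":
            undefined.add(fields[1])
        elif len(fields) == 3 and fields[1] in {"T", "D", "B", "R"}:
            defined.add(fields[2])
    return defined, undefined

def dependency_dot(nm_by_object):
    """Build a DOT digraph of object-to-object dependencies from nm output."""
    symbols = {obj: parse_nm(text) for obj, text in nm_by_object.items()}
    owner = {}
    for obj, (defined, _) in symbols.items():
        for sym in defined:
            owner.setdefault(sym, obj)
    edges = set()
    for obj, (_, undefined) in symbols.items():
        for sym in undefined:
            target = owner.get(sym)
            if target is not None and target != obj:
                edges.add((obj, target))
    lines = ["digraph deps {"]
    lines += [f'    "{a}" -> "{b}";' for a, b in sorted(edges)]
    lines.append("}")
    return "\n".join(lines)
```

Feed it a dict such as `{"main.o": "<output of nm main.o>", ...}`, write the result to a file, and render it with Graphviz, for example `dot -Tpng deps.dot -o deps.png`. Symbols that no listed object defines (libc calls and the like) are simply ignored.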
The only thing I could find was source navigator NG, but I have zero experience with it.
I find that the best way to read and understand someone else's code is to comment it.
I've always been a little ADD and impatient with having to do things systematically, unfortunately I found this is the only approach that works for me. If I don't do this, I end up staring at the screen for long periods of time and not getting anywhere. So, I wrote up a worksheet to help myself in these situations, here are the steps I identified as helpful to deal with massive levels of complexity in unfamiliar code:
1. Establish a clear goal and sub-goals.
2. Use the goals to determine the scope of your reading.
3. Allocate quite a bit of time in large chunks.
4. Identify key layers of abstraction.
5. Enumerate classes (functions, namespaces) of interest.
6. Systematically, read through each class superficially.
7. Pick 8 classes to focus on.
8. Do a deep dive.
9. List/sketch inputs and outputs in terms of function names, types referenced.
10. Look at relevant tests for usage as needed.
11. Check off each class once looked through.
12. Measure the complexity of a component by how many checks are required for full understanding.
13. Iterate until goals achieved.
There is a good side to reading another programmer's code, which is seeing other methods and approaches to solutions. For any given task, there are many ways in which it can effectively be programmed. Granted, 90% of solutions are usually inefficient, unstable, bloated, or just outright wrong, but in the other 10% you can find some interesting things, even if they aren't your preference or end choice. The main thing is to be open-minded. A programmer who believes their way is the only good way (which is most programmers) is a programmer who expands their knowledge and capabilities very slowly.
As for being able to read it, I would strongly suggest customizing a plug-in to your favorite IDE and have it reformat the code. Every programmer codes with their own style, and while you wish everyone used correct indenting and such, the reality is far from that. If you want to save yourself a lot of headache and time, just auto-reformat each page as you go through it. This not only will make it so you can save the source code with good formatting, but will also make it much easier for you to read and review. At least that's my two bits, which in today's economy counts for next to nothing. 8-) I've been in your position too many times, and any way you look at it, you've got a bunch of scanning through files ahead of you. Good luck.
There's no trick. Basically you slog through it. You figure out the structure by spending time studying it. QED.
Basically it's up to YOU to decide whether the effort is worth it.
I generally try to get some class diagrams and such using doxygen or some such tool. This at least gives you a nice sorted and cross-referenced index into the code with class hierarchy diagrams.
um..I forgot how to read Visual Basic. I even forgot how to type code in VB 6. and basic is supposed to be easy to learn. lol
i remember C though. weird
I need to find my old book on C too
#include <stdio.h>

/* main function */
int main(void)
{
    printf( "Hello World!\n" );
    getchar();   /* wait for a key press before exiting */
    return 0;
}
Which others? There aren't any. Otherwise he wouldn't be reviving it.
I'm doing this right now, going through JavaScript/Dojo code that was written by various people that are no longer with the company. It's a lot to absorb for me, because although I've worked with several JavaScript frameworks, Dojo is a little different than all of them. My strategy is to use lots of console.logs and get an understanding of the flow. I've done this pretty much on every job I've ever had - reverse engineered code. It's like climbing a mountain - it's daunting at first, but the higher you climb, the more you can see around you. You can look at this code all day, and at some point you may be like, "WTF is going on?", but your unconscious mind will work on it while you sleep, and the next day you'll wake up knowing a little bit more than you did the day before.
Just keep at it. Yes, marijuana helps in the beginning to focus on the gibberish, but I don't seem to need it anymore.
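The console.log strategy carries over to other languages. As a minimal Python sketch, a decorator can log every call and return value of the functions you are trying to follow. `apply_discount` is a hypothetical function under study, and the logger name "trace" is arbitrary.

```python
import functools
import logging

log = logging.getLogger("trace")

def traced(func):
    """Log every call to func with its arguments and return value."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        log.debug("-> %s args=%r kwargs=%r", func.__name__, args, kwargs)
        result = func(*args, **kwargs)
        log.debug("<- %s returned %r", func.__name__, result)
        return result
    return wrapper

@traced
def apply_discount(price, percent):  # hypothetical function under study
    return price * (100 - percent) / 100
```

Call `logging.basicConfig(level=logging.DEBUG)` once at startup and the flow through the decorated functions shows up in the log, without editing each function body.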
You should use a script to strip out the comments. If any are actually present, they are almost certainly misleading.
Try using a reverse engineering tool like Understand for C++ -- this tool gives you many different views of the code like call graphs, data dictionaries, class diagrams, cross references, ... Another approach would be to generate class diagrams with one of the OO design tools like IBM/Rational Rose or Rhapsody, or even Microsoft Visio -- these tools will reverse engineer the design (class diagrams) directly from source code.
... by hand.
Everyone has their own coding style and assuming the original code is not in a style that you "like", recoding it all by hand is a great way to learn the code!
Documentation is often out-dated and simply incorrect! :(
By reformatting the code, you "touch" and "read" almost every line of code ... it takes time, obviously, but you'll learn a lot as you go!
Get a good global search utility. Search for every use of the symbol names that you are working with, to see what their scope is and how they are used. You can see how the other code uses them, steal snippets of working code you need, and tell what might "break" your use of them. If you have seen every use of a symbol, in the whole code set, then you can feel more confident that what you are doing will not break things.
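If no search utility is at hand, a whole-word symbol search is easy to sketch in Python. The function name `find_symbol` and the default list of file suffixes are assumptions made for illustration. The pattern uses word boundaries, so a search for `buf_len` does not also hit `my_buf_len`.

```python
import re
from pathlib import Path

def find_symbol(root, symbol, suffixes=(".c", ".cc", ".cpp", ".h", ".hpp")):
    """Yield (path, line_number, line) for whole-word uses of symbol under root."""
    pattern = re.compile(r"\b" + re.escape(symbol) + r"\b")
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            text = path.read_text(encoding="utf-8", errors="replace")
            for number, line in enumerate(text.splitlines(), start=1):
                if pattern.search(line):
                    yield path, number, line.strip()
```

Looping over `find_symbol("src", "buf_len")` and printing each hit gives a quick list of every place the symbol is used. It is a plain text match, so hits inside comments and strings are included.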
Don't try to learn everything at once, your brain can't hold it all right away. Study about the modules and symbols that you must work on. Start with finding out how much you actually -do- have to work on. Then widen your scope as you find more "connections" to other things.
Everything is more complicated than you think, and more complicated than your boss thinks. Be cautious, go carefully, and don't go "running full speed off a cliff". This will actually end up being much faster to a working version.
Frustration is normal, when learning something new. It does not necessarily indicate that the thing you are studying is screwed up (although it might be). Work through the frustration before making changes. (Except, it might be good to make temporary "test" changes to help in the learning.) It is painful, but you will learn and grow from it.
Strongly resist the temptation to re-write stuff that is new to you, no matter how bad it looks. There will usually be good reasons that it is that way, and you will make a disaster. Make good backups of everything before each change and at least every day. There will be places where you must throw away what you have done and go back.
On the other hand, don't be afraid to make global changes to the code set if it is really needed. Just back up a working copy that you can go back to, and be careful.
Do things in phases, even if the "customer" only wants a final version. Make a reasonable set of changes, then get it working and debugged. Then do it again for the next changes. Debugging as you go is actually much faster in the long run. Learn to use a debugger; it is like turning on a light in a dark room.
Find quiet time for several hours to work on it, so that you can get into a "state of flow" mentally. This is much more effective than normal work with interruptions.
HTH.