Slashdot Mirror


Learning and Maintaining a Large Inherited Codebase?

An anonymous reader writes "A couple of times in my career, I've inherited a fairly large (30-40 thousand lines) collection of code. The original authors knew it because they wrote it; I didn't, and I don't. I spend a huge amount of time finding the right place to make a change, far more than I do changing anything. How would you learn such a big hunk of code? And how discouraged should I be that I can't seem to 'get' this code as well as the original developers?"

21 of 532 comments (clear)

  1. Time by wmbetts · · Score: 5, Interesting

    If you don't have access to the original developers and they didn't document it you're going to just have to spend a lot of time reading the code. =\

    --
    "Ubuntu" -- an African word, meaning "Slackware is too hard for me". - stolen from Dan C alt.os.linux.slackware
    1. Re:Time by Anonymous Coward · · Score: 5, Insightful

      Everyone, including me, always wants to go for the clean rewrite. But in my experience it almost never turns out for the best. There's a reason for all that messy code. Much of it was bug fixes that real-world users needed. Other complexities were needed in the first place to make the user experience simple (natural, giving it that "hey, it's just works like I expected" feeling).

      The reason you don't understand the code is that you weren't part of the original design discussions, in which weeks or months were spent learning, debating, arguing, etc., about many different design decisions at many different levels of abstraction. You don't know why the trade-offs were made. You just see the finished product.

      Rewriting the code won't give you insight into any of this. Learning the code the hard way, fixing bugs, rewriting *small* pieces and seeing what breaks the regression tests, etc. will eventually help you to understand it.

      There is no point in rewriting it before you fully understand it. Attempting that can kill a product. Conversely, by the time you fully understand it, there won't be any need to rewrite it, because you'll own the code.

  2. don't feel bad at all by iggymanz · · Score: 5, Insightful

    So you have been handed the steamin' pile o' code, it is great that you are very cautious and deliberate when modifying it. Make a set of regression tests, that is, make a set of test data and procedures and expected results to ensure original functionality that is still desirable is still working and no other errors introduced. It is hard, much more tedious than just creating new code with few constraints.

    1. Re:don't feel bad at all by kaiser423 · · Score: 5, Insightful

      Definitely what parent said. Also:

      I have inherited huge code bases. I actually kind of like it. Lots of people whom I thought were idiots, and cursed their code, I later found out that they were quite smart. Others, I found that they just thought about problems vastly different than I, and learning how they tackled problems gave me many more tools in my personal arsenal.

      That said, find a big wall or something. Use a debugger or code analysis tool to find the main execution paths (what calls what and when, etc). Diagram that up on the wall really large. Then use the tools to determine when and why certain auxiliary functions get called. Diagram that up, and you'll start getting a spider on your wall. Go from there using your new understanding to re-arrange the program flow not in terms that make sense to you, but rather seem to be how they are programmed (functional, objective, some pattern). Rinse and repeat until you know pretty much what the code is trying to accomplish in 90+% of the situations, and it's general plan for attack.

      With that diagram, dive in! There's tons of little details in every function that look useless but are usually bug fixes. Use a scalpel, not a hatchet.

      I was deployed remotely with no way for the main programmer to get at me. We had prepared 9 months to collect 4 minutes of data, and the test wouldn't wait for us. I found an odd bug hidden somewhere in ~22k lines of code. I did this over a weekend, and found about 4-5 nasty bugs that were combining to produce what I was seeing, and fixed them. I did this with zero input or help, over a weekend in code I had never seen spread around about 60 files. I spent the first half day just diving in and trying things, and nearly shot myself. That's when I went high-level and dug in from there.

      When that was done, I the took over code maintenance and updates on that project. The other guy had wrote it 100% himself, but because after that exercise I knew the code better him. Sometimes being new is good; you don't have all that cruft of implementations that didn't work, etc, but still linger in the original programmer's head.

  3. Use Doxygen by gbrandt · · Score: 5, Insightful

    Doxygen is your friend. run it over the source code and keep the HTML handy for searches and cross references.

  4. It depends on the language by $RANDOMLUSER · · Score: 5, Funny

    If it's Perl or VB, you might want to consider self-immolation as a first step.

    --
    No folly is more costly than the folly of intolerant idealism. - Winston Churchill
    1. Re:It depends on the language by martin-boundary · · Score: 5, Informative

      No, he meant that as an actual offering to the Perl God, Quetzal$@[&shift]L. It's a bloodthirsty god, who never sends the Divine Debugger without at least two pints of the red stuff. I would have immolated a coworker, but the parent poster seems to have been alone in the room :-/

    2. Re:It depends on the language by chill · · Score: 5, Funny

      No, he meant that as an actual offering to the Perl God, Quetzal$@[&shift]L. It's a bloodthirsty god, who never sends the Divine Debugger without at least two pints of the red stuff. I would have immolated a coworker, but the parent poster seems to have been alone in the room :-/

      The fact the above comment is +5 Informative and not +5 Funny makes me very glad I stopped programming in Perl when I did.

      --
      Learning HOW to think is more important than learning WHAT to think.
  5. Not lots of code by www.sorehands.com · · Score: 5, Insightful

    First of all, 30-40,000 lines of code is not lots of code. Try, 250,000 of code.

    To start, use a good programming editor/environment (Xcode, Vslick, Visual Studio, etc.) that gives you the ability to easily go to definition or references to variables, functions, structs and such. Run some sort of profiler or flowchart type program on it to get a high level view of the code and how it fits together. If you can get the person(s) who worked on it before you to give you an idea of it fits together.

  6. Hunt down the original developer by Anonymous Coward · · Score: 5, Funny

    (And then shoot him.)

    1. Re:Hunt down the original developer by ottothecow · · Score: 5, Funny

      shouldn't that be more like shoot(huntdown(first(developers)))?

      --
      Bottles.
  7. Not at all. by hemorex · · Score: 5, Insightful

    I find that if the other programmer wrote it in such a way where it's too complex for me to follow, I'm not the one who's a moron.

    1. Re:Not at all. by tsm_sf · · Score: 5, Insightful

      Man, always when I run out of mod points.

      Nothing like being handed a steaming plate of spaghetti and hearing about how much of a "genius" its creator was.

      --
      Literalism isn't a form of humor, it's you being irritating.
  8. Visualisation by gilleain · · Score: 5, Informative

    Anything ranging from just sketching out some informal package diagrams on some paper (I quite like using an A3 sketchpad) to something more like Code City which can work with code in smalltalk, java, and c++. There are UML diagram makers, of course, but automated diagrams like that probably need to be edited.

    In fact, it is not the finished diagram that helps so much as the drawing of it, which is why paper and pencil is so good. Or a vector graphics package.

  9. Re:30 to 40 thousand lines isn't large by any meas by istartedi · · Score: 5, Funny

    Very well, sir. Here's your 40,000 lines of Perl from the late 90s. It's mostly regex to parse revisions 30 through 451 of our in-house provisioning system. Oh, and BTW don't screw up like the last guy who had this job. He provisioned 32767 customers with tier-1 service, and it was the director's job to explain why we either had to let them have it for the remainder of the year, or else deal with the CR issues.

    --
    For all intensive purposes, "whom" is no longer a word. That begs the question, "who cares"?
  10. Try to learn the structure by phantomfive · · Score: 5, Insightful

    I had an English professor who always said, "Structure is the key to understanding." He was talking about literature, but I think the same is true for programs as well.

    Try to understand the structure of the program. What is the basic flow? It should have an initialization routine, a main loop, and a shutdown routine. Find out roughly where they are, then focus on the main loop. Usually there will be one piece of code that is central, and it will occasionally pass control into other large pieces of the program. Sometimes there will be more than one main loop, and control switches back and forth between the various main loops. If the program is event drive, this will make a difference in the structure.

    If you are just trying to make a small change, try to find the sequence of events that will lead up to where that change needs to be made. Follow the sequence of execution until you get to the line you need to change. If you are changing a single variable, sometimes it's helpful to do a search and find all the places that variable is used, to make sure your change won't have any side effects. This may seem time consuming, but it can save 10 times more in debugging.

    Learn to follow code execution with your eyes, without running a debugger. One thing that separates good coders from not so good coders is the ability to follow code that isn't being executed.

    --
    Qxe4
  11. Re:Large? by snowgirl · · Score: 5, Interesting

    Are you Microsofties really so stupid and ignorant that you're not aware of the ports of GNU utilities to Windows or Cygwin or even your own company's Interix and Services for UNIX products?

    No, but to explain this, I need to give you some background.

    When I joined Microsoft, I hadn't used any version of Windows at all for any reason other than playing games. After joining Microsoft, I never used Windows at home for any purpose other than logging into the VPN to work from home... and since I did not even have an x86 machine, this required using Virtual PC on my Mac OSX box.

    Now, I know of all of these tools, and I even could install GVim on the machine as well. However, I was working in a Build Group. This required me to occasionally log into 100 different machines at once in order to start the build process for WinXP/Server 2003. Most of these machines require no more input than logging in and starting up a single app... thus no reason to install special software on them.

    Then, something would break, and I would have to read logs, and/or code on the actual box that had the exact problem. Spending an hour installing apps to do my job would be an unacceptable use of my time, and delay the build unnecessarily.

    I learned to use the tools that were available with the environment that I was in. Thus, I did almost all of my programming at Microsoft in notepad.exe, and I'm not kidding you.

    Were I in a different group? The results could have been different... but having 100 different machines, most of which I didn't have admin rights to, meant that even just installing Notepad++ or something like that would have been a waste of time.

    --
    WARNING! This girl exceeds the MAXIMUM SAFE standards established by the FDA for BRATTINESS
  12. Re:Large? by snowgirl · · Score: 5, Interesting

    What the hell? Are you serious?

    So Microsoft themselves hired you to work on Windows, although you were a Mac user and had absolutely no real experience with Windows?

    Not only that, but you had to manually log in to hundreds of systems just to run a script? They didn't push for this to be automated, and you tossed back on the street where you belong? What the hell?

    Don't get me wrong, I don't doubt that your story is true. It's the sort of shit that we should expect from any large company, especially Microsoft. Please tell me you're an H1B, though. At least then it'd make some sense why they'd hire you. H1Bs typically aren't worth more than a batch file.

    Yeah, it took me about a month before I understood that my entire group would be replaced by a few scripts in the Open Source world.

    The primary problem was that because the source code was not a "product", the build code was so full of holes and edge-cases and hacks, that it broke almost constantly, and required someone to babysit it for the whole 14-some hours that it takes to compile.

    Actually, in my orientation class, we went over patents, copyright, and trademark, and I knew it all, and the teacher asked me how I knew so much, and I told her that I owned a registered copyright on some GPL code, and she was like, "and your managers hired you knowing that?" And I was like, know about it? It's the only reason I got hired by Microsoft... be damn sure I didn't submit a resumé.

    --
    WARNING! This girl exceeds the MAXIMUM SAFE standards established by the FDA for BRATTINESS
  13. Re:30 to 40 thousand lines isn't large by any meas by benjamindees · · Score: 5, Funny

    Perl is like the matrix. At a certain point, after you've stared at it long enough, it all just makes sense.

    --
    "I assumed blithely that there were no elves out there in the darkness"
  14. As a maintenance programmer by npsimons · · Score: 5, Informative

    As someone who has done probably 90% of his work in maintenance programming, let me give you my tips:

    • Snapshot what you get - don't change it, don't even look at it. As soon as you get it, check it in, binaries and all, to a change tracking system (eg, CVS, SVN, etc).
    • Now that you know what they gave you, and you can get back to it at any time, your options are seemingly limitless, but for the quickest way to get up to speed, I would recommend writing unit tests for the software. This will be long and tedious, but by writing unit tests you will a) learn what to expect out of the software, b) be able to tell when you break something and c) truly learn the software.
    • Automate, automate, automate! It's a close call as to whether you should start right away on your first unit test, or get the build system automated, but let me just say that it will save you a ton of time to have a "one button push" way to build, run and test the software. From there, you should be having your machine build and run the unit tests automatically, preferably nightly, from a clean checkout of the repository, just in case you forget to run a test after you change something or you forget to check something in.
    • Run the software (including unit tests) through the gauntlet - valgrind's memcheck, electric fence, fuzz, bfbtester, rats, gcc's -fstack-protector-all flag, libc's MALLOC_CHECK_=3, gcc's _FORTIFY_SOURCE=2 define, gcc's -fmudflap flag, gcc's -Wall -Wextra and -pedantic flags; any way you can think to flush out bugs, do it, and start fixing them; you will learn much, not just about the code, but about the thought process of the original coder(s) this way. Change tools as appropriate for your programming language and environment (including compiler/interpreter, libs, OS, etc). As you can tell, I do a lot of C and C++ programming.

    BTW, the fact that you have a hard time understanding this code may be more a reflection on the original authors' coding skills than on your abilities; any idiot can write code that "just works"; it takes a lot of thought, time and effort to write code that is maintainable, and more often than not, the original coders were short on at least one of those (if not all three). Here's hoping you have the time to follow my above tips; they take a lot of time, but can be worth it if you really need to maintain the code. It's funny to note that apart from the first one, most of those tips apply equally well to developing software from scratch. If the code already has a change tracking system, unit tests, a build/run/test system, *and* automated testing, consider yourself lucky and just start picking apart the unit tests.

  15. My Dick is Bigger than Your 250,000 lines of code by BlueBoxSW.com · · Score: 5, Interesting

    Really. A guy asks a question for help and all of these people keep telling him 30-40,000 lines of code isn't much.

    That's a lot of code to get your arms around if you didn't write it. It's not the end of the world, but it is a sizeable task, and is the type of topic that few professional journals or books will ever be written about.

    Having been in similar situations, I my advice would be:

    1) Try to get an understanding of the history of the code. Who wrote it? Why? How many developers? How long has it been around? Do people love it or hate it? Is there a version control system in place you can use for information?

    2) Look at it from a technical viewpoint. Is is complete? Does it compile and run? How many languages are used? Are there interfaces with other systems you need to know about? What dependancies are there? How easy is it to setup a test server? What parts are well coded? What parts stink up the joint?

    3) Dig for functional documentation. What does it do? For whom does it do it? What business needs does it support? How mission critical is it?

    4) Meet with the business owners. Seriously. This helps you do two things: #1-- Define the real business need (which may be different than what was understood by the previous developers), and #2-- Set appropriate expectations about maintenance. You'll work hard to maintain and keep it working, but you are working from a disadvantaged position. It is important they know this and support you in your efforts, rather than complain loudly when something doesn't work.

    5) Plan to remove the dead weight. There's always a lot of dead weight in these near-abandonded projects. Get an idea how to simplify things and plan your work in phases.

    6) Setup real test and development servers. Yeah, you know that wasn't already done.

    7) Use version control. But you know this. It's 2010, and no developer worth his/her salt would code a paying project without version control. Right?

    8) All fixes will take much longer than if you wrote the code, so be careful with estimating time.