Slashdot Mirror


Learning and Maintaining a Large Inherited Codebase?

An anonymous reader writes "A couple of times in my career, I've inherited a fairly large (30-40 thousand lines) collection of code. The original authors knew it because they wrote it; I didn't, and I don't. I spend a huge amount of time finding the right place to make a change, far more than I do changing anything. How would you learn such a big hunk of code? And how discouraged should I be that I can't seem to 'get' this code as well as the original developers?"

9 of 532 comments (clear)

  1. Visualisation by gilleain · · Score: 5, Informative

    Anything ranging from just sketching out some informal package diagrams on some paper (I quite like using an A3 sketchpad) to something more like Code City which can work with code in smalltalk, java, and c++. There are UML diagram makers, of course, but automated diagrams like that probably need to be edited.

    In fact, it is not the finished diagram that helps so much as the drawing of it, which is why paper and pencil is so good. Or a vector graphics package.

  2. Re:It depends on the language by martin-boundary · · Score: 5, Informative

    No, he meant that as an actual offering to the Perl God, Quetzal$@[&shift]L. It's a bloodthirsty god, who never sends the Divine Debugger without at least two pints of the red stuff. I would have immolated a coworker, but the parent poster seems to have been alone in the room :-/

  3. Re:30 to 40 thousand lines isn't large by any meas by Garridan · · Score: 4, Informative

    Oh yeah, well I just inherited a code base of 2.8 trillion lines of assembly code, and I have to read it over a 12.734 baud VAX connection! Why, back in my day...

    Anyway... I've taken on a few large-scale software projects before, and my approach has always been "read twice, hack once". I agree with the the parent, and I'll add a note: for the love of everything sacred and unholy, use revision control, and don't trust it -- that is, back up incessantly. Document the hell out of your process. Once you've really learned the system, you might want to back out some of the newbie mistakes that you're making right now.

    And yes. Learning a big system takes a lot of time -- you should be reading much more than writing until you've learned it. I find it helpful to diagram dependencies / draw up finite state machines.

  4. Divide and Conquer by Whomp-Ass · · Score: 4, Informative

    Identify each major portion of functionality. If you are working with a sales/billing system you would probably end up with : Orders, Invoices, Payments, Admin.

    Go through each of those portions and identify the major portions. Orders: Order headers, Order details, business logic, ui logic, reports, datalayer, etc. Repeat until reduced into easily consumable units.

    Pick and stick to an SDLC. Use whatever fits the situation and the resources. For a small project (under 100k lines of code) you should be good by yourself. Anything more and you'll have to involve at least 1 other person for testing. For medium (100k-500k lines) you'll need an additional dev...For large projects (500K-5M lines) you'll need a project manager, lead dev, 2 devs, 1 test, and a UAT team...For larger projects you'll have something unique and frightening to the specifics of the software project and corporation/agency making it...anyway, I digress...

    Go through each subdivision line-by-line and re-write it yourself (even if you aren't going to put your re-written version into production); the only way you're going to truly understand what is going on is if you do it yourself. Use whatever language you are most comfortable with or is most appropriate to the task (or languages), it does not need to be the same as the original.

    Verify that for a given input, your version produces an exact output.

    Take a deep breath. It's not a race. It's a one-to-one functional mapping of your software (your mindspace) and the original software (the other developer(s) mindspace(s)). The code probably will not be straight forward. It has also been battle-scarred and will be warty. Changes of initial requirements through time and feature enhancements (feature creep) will have taken it's toll on what may have originally been something simple or even elegant. It's something of a niche mindset and if it is not for you, there exist many other exciting things to be programming.

    Ultimately, if you do as outlined above, you'll solve many problems, be able to make whatever changes you like, and in so doing have a way to present your design as a replacement if you want...Or not, if you don't; for 30-40k lines parallel development makes sense, in a way, for one person.

  5. Re:Use it by mosb1000 · · Score: 3, Informative

    Not without variables, but without unnecessary ones. For example, someone might write:

    int a;
    int b;
    int c;
    int d;
    int e;
    int f;
    int g;
    a = dropBox1.Value;
    b = dropBox2.Value;
    c = dropBox3.Value;
    d = dropBox4.Value;
    e = a + b;
    f = c + d;
    g = e * f;
    result.Value = g;

    While I would write:

    result.Value = ( dropBox1.Value + dropBox2.Value ) * ( dropBox3.Value + dropBox4.Value );

  6. As a maintenance programmer by npsimons · · Score: 5, Informative

    As someone who has done probably 90% of his work in maintenance programming, let me give you my tips:

    • Snapshot what you get - don't change it, don't even look at it. As soon as you get it, check it in, binaries and all, to a change tracking system (eg, CVS, SVN, etc).
    • Now that you know what they gave you, and you can get back to it at any time, your options are seemingly limitless, but for the quickest way to get up to speed, I would recommend writing unit tests for the software. This will be long and tedious, but by writing unit tests you will a) learn what to expect out of the software, b) be able to tell when you break something and c) truly learn the software.
    • Automate, automate, automate! It's a close call as to whether you should start right away on your first unit test, or get the build system automated, but let me just say that it will save you a ton of time to have a "one button push" way to build, run and test the software. From there, you should be having your machine build and run the unit tests automatically, preferably nightly, from a clean checkout of the repository, just in case you forget to run a test after you change something or you forget to check something in.
    • Run the software (including unit tests) through the gauntlet - valgrind's memcheck, electric fence, fuzz, bfbtester, rats, gcc's -fstack-protector-all flag, libc's MALLOC_CHECK_=3, gcc's _FORTIFY_SOURCE=2 define, gcc's -fmudflap flag, gcc's -Wall -Wextra and -pedantic flags; any way you can think to flush out bugs, do it, and start fixing them; you will learn much, not just about the code, but about the thought process of the original coder(s) this way. Change tools as appropriate for your programming language and environment (including compiler/interpreter, libs, OS, etc). As you can tell, I do a lot of C and C++ programming.

    BTW, the fact that you have a hard time understanding this code may be more a reflection on the original authors' coding skills than on your abilities; any idiot can write code that "just works"; it takes a lot of thought, time and effort to write code that is maintainable, and more often than not, the original coders were short on at least one of those (if not all three). Here's hoping you have the time to follow my above tips; they take a lot of time, but can be worth it if you really need to maintain the code. It's funny to note that apart from the first one, most of those tips apply equally well to developing software from scratch. If the code already has a change tracking system, unit tests, a build/run/test system, *and* automated testing, consider yourself lucky and just start picking apart the unit tests.

  7. Re:Large? by snowgirl · · Score: 3, Informative

    Most of these machines require no more input than logging in and starting up a single app... thus no reason to install special software on them.

    Then, something would break, and I would have to read logs, and/or code on the actual box that had the exact problem. Spending an hour installing apps to do my job would be an unacceptable use of my time, and delay the build unnecessarily.

    "Then something would break" contradicts the earlier statement "no more input than logging in"

    The fact that something is likely to break, and you will need to troubleshoot it, should be reason enough in itself to install some (small) convenient, unobtrusive troubleshooting tools, as standard practice, and as part of the standard initial installs for those servers, to make troubleshooting faster and not require software installations or elaborate practices when things do break.

    You missed a part before the quote that you pulled out. "Most of the machines required no more input".

    My statements remains consistent and not contradictory when only 2 machines typically need direct interfacing.

    And small convenient, unobtrusive troubleshooting tools WERE installed as standard practice on the machines... I already said that there was dir /s /b, and findstr... do I have to have "find" and "grep" when I had tools with the same functionality?

    When I started off, there was a big learning curve because of the new tools, but by the time I left, it was as second nature to me as was find and grep when I joined.

    --
    WARNING! This girl exceeds the MAXIMUM SAFE standards established by the FDA for BRATTINESS
  8. Re:Large? by benjamindees · · Score: 3, Informative

    At my job at Microsoft, we were in the support end of the core os group.

    Windows doesn't really have find and grep, but it does have "dir /s /b [pattern]" and "findstr /sipc:"[pattern]""

    When I joined Microsoft, I hadn't used any version of Windows at all for any reason other than playing games.

    I did almost all of my programming at Microsoft in notepad.exe

    it took me about a month before I understood that my entire group would be replaced by a few scripts in the Open Source world.

    Dear lord, this is the most hilarious thing ever posted to /.

    --
    "I assumed blithely that there were no elves out there in the darkness"
  9. Re:Use it by ciggieposeur · · Score: 4, Informative

    What do you think about intermediate variables that are not strictly necessary?

    My general rules of thumb:

    1) I don't care how many variables are declared, so long as each makes sense on its own. Like another poster's example, 'fullName' is perfectly acceptable (especially for i18n/l10n aware code that may have different rules for generating a name).

    2) I ABSOLUTELY HATE clever arithmetic / pointer arithmetic / expressions all crunched into one line that can be split out. Example: in C-like languages that support pre- and post-increment, I expect the code to use only one or the other consistently, and never mix it with another expression. So this is fine:

    i++;
    j = i + 4; ...but this I can't stand:

    j = ++i + 4;

    #2 I picked up from a very experienced developer who pointed out that making the code harder to read is never worth it, the compiler produces the same code as the easy-to-read version. And that making code that looks 'too easy to be clever' is quite a bit harder than making code that looks 'too clever to always work'.