Slashdot Mirror


When Making a Comprehensive Retrofit of your Code...

chizor asks: "My programming team is considering making some sweeping changes to our code base (150+ perl CGIs, over a meg of code) in the interest of consistency and reducing redundancy. We're going to have to make some hard decisions about code style. What suggestions might readers have about tackling a large-scale retrofit?" Once the decision has been made for a sweeping rewrite of a project, what can you do to make sure things go smoothly and you don't run into any development snags...especially as things progress in the development cycle?

4 of 385 comments (clear)

  1. Re:object orientation by apsmith · · Score: 5, Informative

    Alright, maybe I posted a little too soon, but shouldn't "flamebait" be attracting flaming responses? I don't see any...

    Anyway, if I'd spent a little more time thinking about the advice side of it, taking a look at appropriate programming methodologies (like Extreme Programming advocated in another thread here) would be one piece I'd advocate. Given the size of the code (1 MB = about 20-30,000 lines?) there's no need for major heavy-weight processes here. More important I'd say is sitting down and figuring out in the appropriate level of detail what exactly your system is doing right now - you can do this using UML diagrams which seems to be becoming a standard, though the main use we've found is to try to get an overall view of things which we then throw out when we get into the details again.

    The other thing to do along these lines is look for your use of standard patterns within your code - the Design Patterns book is extremely helpful if you're moving to an object-oriented framework at all; following well-known patterns and indicating clearly what you are doing can make your code much easier for others to follow.

    --

    Energy: time to change the picture.

  2. Compartmentalize & Destroy by DotComVictim · · Score: 5, Informative

    1) Identify common functionality.

    2) Encapsulate in libraries

    3) Be sure to extract enough generality that you don't have special case functions

    4) Don't extract so much generality that functional interfaces become unwieldy.

    5) Write everything in the same language.

    6) Find any complex pieces or algorithms. If they can be simplified or re-written, do it. If not, save it so you don't need to debug it again.

    7) Throw everything else away.

  3. modularity/incremental rollout/unit tests/iterate by tim_maroney · · Score: 5, Informative

    I second all that has been said about making sure that you really need to do this and that it is worth the time and risk. One sign that you may need to do so is an excessive reopened bug rate, where fixing one bug often creates another bug due to side effects and component interactions. If you decide that it is, then the three keys to success will be modularity, incremental rollout, and unit tests.

    Modularity is probably what you're already thinking about. Go over the old code base, in a code review, and find where the same thing is done over and over either with copy-and-paste code -- the bane of crap engineers -- or with different code that serves the same ends. Look for repeated sequences in particular. Create a new library that encapsulates those pieces of code.

    Incremental rollout is vital. Only replace small parts of your system at a time, doing complete retests frequently. Don't write a new encapsulated routine and then roll it out to each of the three dozen places in which it appears in the whole code base. Write the whole function library, with unit tests, and then start applying it to separable modules one by one, retesting as you go. Otherwise I guarantee the whole thing will fall apart and you won't be able to tell why. Ideally, you might set a threshold on the rate of replacement of old modules and work primarily on creating new modules with the abstracted logic.

    Unit tests are crucial because, as noted, the messiness of your old code probably conceals a lot of necessary logic. We had this great phenomenon on Apple's Copland where people who had never used the old OS managers were rewriting them in C or C++ from the assembly source. When they saw something in the assembly they didn't understand, they just ignored it. Guess what -- the new managers didn't have any backwards compatibility. The only answer to this is to have a thorough unit test for any module that you replace, against which you can test the new version. This also confers other quality benefits, but during a rewrite it's critical.

    Finally, once you have replaced a significant number of your modules, you will find that new levels of abstraction appear. The average size of each function or method will have shrunk considerably, and now it becomes possible to see new repeated code sequences that were not visible due to the old cruft. Move these into your new library modules and start using them in continuing replacement work. In addition, start going back -- slowly and incrementally -- through the already converted modules and replacing the repeated sequences with calls to the new abstractions.

    Finally, figure out how you got into this mess in the first place. The worst programmer habit I know of is copy-and-paste coding instead of using subroutines. You can tell people not to do it, but some always will. Those people should be bid farewell -- you can't afford their overhead. Other common problems include lack of planning and review, a code first and think later mentality. Start moving your organization up the levels of the CMM and you may find that you wind up with fewer modules that need replacement.

    Hope this helps.

    Tim

  4. My suggestions by Pinball+Wizard · · Score: 5, Informative
    My programming team is considering making some sweeping changes to our code base (150+ perl CGIs, over a meg of code


    First of all, I think its important to realize that you have a medium-sized website and not a big software project. Therefore, some of the above comments recommending refactoring, UML, and eXtreme programming may be a bit overkill.


    Web programming != software development! Its usually done at a much faster pace. Even if an object-oriented approach is taken, you are still probably talking about simple function libraries rather than complex C++ or Java classes. Again, overkill.


    150 files is still a small enough project to be managed by one or two decent coders. Actually, I just looked at the amount of stuff I've written over the years for my online bookstore and its more like 500 files and over 4 megs of code. I don't feel like its too much of a job to manage this codebase by myself.


    So, here are my recommendations.


    You probably have gotten better at programming since the time you started your project. Take a few of the most recent CGIs you have written and compare them to the first ones you wrote. You just might notice a glaring difference in the quality. Also, the first pages you wrote are likely to be among the most important in your project, yet they are also likely the worst quality-wise.


    Regardless of what language you program in, I think its important that you can tell whats going on in the program by reading the comments. If a manager can understand what a program does by reading the English bit, there's a good chance other programmers will be able to jump in and help as well. One specific rule I also follow: if you do regexes, say IN ENGLISH what those regexes do. I say this because regexes are one of the hardest things to read.


    Look for any code that can be "factored out" of your scripts and put those into function libraries. Then include those in your program. The only problem with this occurs when you have huge function libraries that slow down your scripts when you include them. In that case you would logically separate your functions into different files. I have included very common functions in different include files, so I can make the actual code compiled or interpreted as small as possible.


    Consider using a flowcharting tool as an aid to programming and/or documenting your code.


    Standardize how you name variables and functions, write comments, identation, and spacing.


    Be sure and include the date you write your scripts in the comments, in case the filesystem wipes this out.


    I'm sure theres other things I've left out, but following the above guidelines have helped me do exactly what you are trying to do: manage a growing codebase. But don't forget, this is web programming, not rocket science, and some of the above suggestions may be more trouble than they are worth. Keep it simple.

    --

    No, Thursday's out. How about never - is never good for you?