Slashdot Mirror


When Making a Comprehensive Retrofit of your Code...

chizor asks: "My programming team is considering making some sweeping changes to our code base (150+ perl CGIs, over a meg of code) in the interest of consistency and reducing redundancy. We're going to have to make some hard decisions about code style. What suggestions might readers have about tackling a large-scale retrofit?" Once the decision has been made for a sweeping rewrite of a project, what can you do to make sure things go smoothly and you don't run into any development snags...especially as things progress in the development cycle?

21 of 385 comments

  1. object orientation by apsmith · · Score: 3, Informative

    We're doing something similar, and switching to Java (JSPs + Tomcat, Struts) to replace a lot of old Perl CGIs. The Java code is much, much cleaner. But object-oriented Perl code can help if you don't want to take the plunge too far into a new language. And at least find a way to go mod_perl rather than CGI, for the things where performance matters at all.

    --

    Energy: time to change the picture.

    1. Re:object orientation by apsmith · · Score: 5, Informative

      Alright, maybe I posted a little too soon, but shouldn't "flamebait" be attracting flaming responses? I don't see any...

      Anyway, if I'd spent a little more time thinking about the advice side of it, taking a look at appropriate programming methodologies (like Extreme Programming, advocated in another thread here) would be one piece I'd advocate. Given the size of the code (1 MB = about 20-30,000 lines?) there's no need for major heavy-weight processes here. More important, I'd say, is sitting down and figuring out, at the appropriate level of detail, what exactly your system is doing right now. You can do this using UML diagrams, which seem to be becoming a standard, though the main use we've found for them is to get an overall view of things, which we then throw out when we get into the details again.

      The other thing to do along these lines is look for your use of standard patterns within your code - the Design Patterns book is extremely helpful if you're moving to an object-oriented framework at all; following well-known patterns and indicating clearly what you are doing can make your code much easier for others to follow.

      --

      Energy: time to change the picture.

    2. Re:object orientation by Anonymous Coward · · Score: 1, Informative

      Good point about object orientation.
      It's often a good thing to take a courageous step like this and rewrite & refactor code (unfortunately management tends to see it as unproductive time, but it can pay off in the long run).

      Now my main point: I can't imagine having to deal with megabytes of Perl code. It's hard enough to figure out the intent of a simple Perl script I wrote a few months ago; I can't imagine having to slog through MEGABYTEs of the stuff. Perhaps while you're at it you should take another courageous step and rewrite in another language.
      May I suggest something like Ruby: it feels pretty familiar to Perl folks and it's quick to learn. Ruby's object-orientedness certainly would beat trying to create an OO system in Perl (it makes my head hurt just to think of OO Perl!)
      Check out: http://www.ruby-lang.org
      or http://www.rubycentral.com
      for more info!

  2. Don't do it by mrpotato · · Score: 3, Informative
    See the recent Joel Spolsky interview here, that was discussed on /. here.

    Basically, Joel's take on a similar problem is: don't do it.

    Unless you have a _really_ good reason to make a huge change to a big codebase, don't bother, and do something more productive instead.

    --

    cheers
  3. Re:Sleeping dogs by kz45 · · Score: 3, Informative

    Your tangled mass of spaghetti code paths are probably full of almost incomprehensible little design decisions and seemingly out of place declarations and functions, but most of those were probably added as specific fixes for bugs encountered under real-world use.

    This is a lesson to be learned. Engineer your code from the beginning. Use easy-to-understand comments and structured code. Although it takes some discipline, you will almost never have to consider "re-writing from scratch".

  4. Horror Story by Brontosaurus+Jim · · Score: 3, Informative

    My firm went through this sort of thing just two years ago. The PHB at the time decided, for some reason, that our 300,000 lines of semi-poorly written C code, and 50,000 associated lines of Java (don't ask), needed the same kind of sweeping overhaul.

    Anyway, it took 7 of us over 2 months to get even halfway done. The pressure the boss was putting on us was awful, and she didn't really even understand what we were doing, even though she was the one demanding it. I think she read it in a trade mag somewhere. God I'd do a lot more work if she didn't read that shit.

    Anyway, about halfway through the "Great Leap Forward" (as we [appropriately] named it) the boss quit, and the next boss, who so far has been fairly clueful, took over. He didn't think the whole deal was needed, but he was pressured by the former boss's husband (the CTO) to get it done. Seriously.

    Hope yours goes better than ours. From what we did, here are some tips I can give you.

    1. Be consistent through the whole thing.
    2. Make sure everything is planned before you start. This was the one part we got right.
    3. The team you have should have worked together before, because this sort of task requires previous knowledge of each other.

    Other than that, my condolences. Or maybe it will work better for you.

    Good luck!

  5. Rule #1 by ackthpt · · Score: 2, Informative
    We're still in the middle of a sweeping change and lemme tell ya, make d@mn sure there's someone accountable for managing the whole project from beginning to end, preferably with this as their main focus.

    Transitioning in new managers or having the current manager only look in on the project once in a while is as sure a path to madness and doom as no management at all.

    Our due date was mid-August, we'll be lucky to get it through testing and into production by January 31st. All the while with the logjam we're having to put pieces of it into production and cross our fingers that the new changes don't break anything.

    Love to talk more about it, but need another gallon of coffee.

    --

    A feeling of having made the same mistake before: Deja Foobar
  6. see joel on software by kubalaa · · Score: 4, Informative

    He says the exact same thing as you, only better.

    --

    "If you look 'round the table and can't tell who the sucker is, it's you." -- Quiz Show

  7. Compartmentalize & Destroy by DotComVictim · · Score: 5, Informative

    1) Identify common functionality.

    2) Encapsulate in libraries

    3) Be sure to extract enough generality that you don't have special case functions

    4) Don't extract so much generality that functional interfaces become unwieldy.

    5) Write everything in the same language.

    6) Find any complex pieces or algorithms. If they can be simplified or re-written, do it. If not, save it so you don't need to debug it again.

    7) Throw everything else away.
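
    A minimal sketch of steps 1 to 4, in Python for brevity rather than Perl. The function `render_page`, its parameters, and the page contents are hypothetical and only illustrate the idea: one library function replaces markup that was copy-pasted into every script, with a single optional parameter covering the known variation instead of a special-case copy.

```python
# Hypothetical shared library: one function replaces page-shell markup
# that was previously copy-pasted into every script.
from html import escape


def render_page(title, body_html, stylesheet="/style.css"):
    """Wrap an already-rendered HTML body in the site's standard page shell.

    The optional stylesheet parameter covers the one known variation, so no
    script needs its own special-case copy of this function.
    """
    return (
        "<html><head>"
        f"<title>{escape(title)}</title>"
        f'<link rel="stylesheet" href="{escape(stylesheet)}">'
        "</head><body>"
        f"{body_html}"
        "</body></html>"
    )


# Two former "scripts" now call the library instead of duplicating the markup.
orders_page = render_page("Orders & Returns", "<p>No open orders.</p>")
admin_page = render_page("Admin", "<p>OK</p>", stylesheet="/admin.css")
```

    If a third or fourth optional parameter starts to appear, that is usually the point where step 4 applies and the interface is getting too general.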

  8. modularity/incremental rollout/unit tests/iterate by tim_maroney · · Score: 5, Informative

    I second all that has been said about making sure that you really need to do this and that it is worth the time and risk. One sign that you may need to do so is an excessive reopened bug rate, where fixing one bug often creates another bug due to side effects and component interactions. If you decide that it is, then the three keys to success will be modularity, incremental rollout, and unit tests.

    Modularity is probably what you're already thinking about. Go over the old code base, in a code review, and find where the same thing is done over and over either with copy-and-paste code -- the bane of crap engineers -- or with different code that serves the same ends. Look for repeated sequences in particular. Create a new library that encapsulates those pieces of code.
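
    Looking for repeated sequences can be partly automated. The following is a rough sketch, not a real clone detector: it assumes that duplicates differ only in indentation and blank lines, and the function name `find_repeated_sequences` and the sample file contents are made up for illustration.

```python
# Sketch of a copy-and-paste detector: report every run of `window`
# consecutive non-blank lines (ignoring leading/trailing whitespace) that
# occurs in more than one place. All names here are hypothetical.
from collections import defaultdict


def find_repeated_sequences(sources, window=3):
    """sources maps a file name to its text.

    Returns {tuple_of_lines: [(file_name, first_line_number), ...]} for
    every window of lines that was seen at least twice.
    """
    seen = defaultdict(list)
    for name, text in sources.items():
        # Keep the original 1-based line numbers while dropping blank lines.
        lines = [
            (number, line.strip())
            for number, line in enumerate(text.splitlines(), start=1)
            if line.strip()
        ]
        for i in range(len(lines) - window + 1):
            chunk = tuple(line for _, line in lines[i:i + window])
            seen[chunk].append((name, lines[i][0]))
    return {chunk: places for chunk, places in seen.items() if len(places) > 1}


sources = {
    "orders.cgi": "open db\nread row\nclose db\nprint orders\n",
    "users.cgi": "print header\n\nopen db\n  read row\nclose db\n",
}
repeats = find_repeated_sequences(sources)
```

    Each reported chunk is a candidate for the new library; a human still has to decide whether the copies really mean the same thing.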

    Incremental rollout is vital. Only replace small parts of your system at a time, doing complete retests frequently. Don't write a new encapsulated routine and then roll it out to each of the three dozen places in which it appears in the whole code base. Write the whole function library, with unit tests, and then start applying it to separable modules one by one, retesting as you go. Otherwise I guarantee the whole thing will fall apart and you won't be able to tell why. Ideally, you might set a threshold on the rate of replacement of old modules and work primarily on creating new modules with the abstracted logic.

    Unit tests are crucial because, as noted, the messiness of your old code probably conceals a lot of necessary logic. We had this great phenomenon on Apple's Copland where people who had never used the old OS managers were rewriting them in C or C++ from the assembly source. When they saw something in the assembly they didn't understand, they just ignored it. Guess what -- the new managers didn't have any backwards compatibility. The only answer to this is to have a thorough unit test for any module that you replace, against which you can test the new version. This also confers other quality benefits, but during a rewrite it's critical.
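
    One way to read "a thorough unit test ... against which you can test the new version" is a test that pins the new module to the old module's observable behavior. A minimal sketch, in Python, with both functions invented for illustration; the parenthesized negative amounts stand in for the kind of unexplained detail a rewrite tends to drop.

```python
import unittest


def legacy_format_price(cents):
    # Stand-in for the old code. The odd-looking branch is deliberate:
    # negative amounts are shown in parentheses.
    if cents < 0:
        return "(%d.%02d)" % divmod(-cents, 100)
    return "%d.%02d" % divmod(cents, 100)


def new_format_price(cents):
    # Stand-in for the replacement under test.
    dollars, remainder = divmod(abs(cents), 100)
    text = f"{dollars}.{remainder:02d}"
    return f"({text})" if cents < 0 else text


class FormatPriceCompatibility(unittest.TestCase):
    def test_new_matches_legacy(self):
        # Include the awkward inputs, not just the happy path.
        for cents in (0, 5, 99, 100, 12345, -1, -250):
            with self.subTest(cents=cents):
                self.assertEqual(
                    new_format_price(cents), legacy_format_price(cents)
                )
```

    Saved as a module, this can be run with `python -m unittest`. Once the old code is retired, the recorded outputs can be frozen into the test as literal expected values.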

    Next, once you have replaced a significant number of your modules, you will find that new levels of abstraction appear. The average size of each function or method will have shrunk considerably, and now it becomes possible to see new repeated code sequences that were not visible due to the old cruft. Move these into your new library modules and start using them in continuing replacement work. In addition, start going back -- slowly and incrementally -- through the already converted modules and replacing the repeated sequences with calls to the new abstractions.

    Finally, figure out how you got into this mess in the first place. The worst programmer habit I know of is copy-and-paste coding instead of using subroutines. You can tell people not to do it, but some always will. Those people should be bid farewell -- you can't afford their overhead. Other common problems include lack of planning and review, a code first and think later mentality. Start moving your organization up the levels of the CMM and you may find that you wind up with fewer modules that need replacement.

    Hope this helps.

    Tim

  9. Re:Don't Listen by Beryllium+Sphere(tm) · · Score: 2, Informative

    >- If you use your existing coding language you will literally fly through the retrofit. Do it piece by piece. Make all those changes first, then test app, then make next set of changes then test. The simple fact is, most wasted time is spent on bugs not working on performance, and you've already knocked down a lot of bugs, don't let them pop back up by blowing everything up. There are books on this.

    This is good advice. To be more specific:
    1. START with your regression test suite
    2. Then add self-documentation features like standard naming conventions. Seems dull and bureaucratic and pointless but really truly saves maintenance time.
    3. Have a standard comment header for each function. The standard should answer questions like "Can that argument be NULL?" and "What do the error returns mean?"
    4. If you're going through every line already, do a security audit.
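
    As an illustration of item 3, here is one possible shape for a standard function header, sketched in Python. The function `find_customer`, the in-memory `_CUSTOMERS` data, and the section names are assumptions for the example; the point is only that the header answers "can that argument be null?" and "what do the error returns mean?" up front.

```python
# Hypothetical in-memory data standing in for a real lookup.
_CUSTOMERS = {7: {"name": "Example Co."}}


def find_customer(customer_id, default=None):
    """Look up a customer record by ID.

    Arguments:
        customer_id: int, required. Must not be None.
        default: value returned when no record matches. May be None.

    Returns:
        The customer dict, or `default` if the ID is unknown.

    Errors:
        Raises TypeError if customer_id is None or otherwise not an int.
    """
    if not isinstance(customer_id, int):
        raise TypeError("customer_id must be an int")
    return _CUSTOMERS.get(customer_id, default)
```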

    There's good advice in the refactoring books, for example http://www1.fatbrain.com/asp/bookinfo/bookinfo.asp?theisbn=0201485672&vm=

  10. Tablize It! by Tablizer · · Score: 1, Informative


    Put as much as possible into tables. That makes it easier to find, view, and add new pages, menus, product categories, etc.

    Of course there will be exceptions that need code instead of tables. Have special "override" points that allow you to override stuff as needed. Make the "overrides" their own function. (Example: sub foo(row); if row.clientID=7, then color3 = blue;.....). The override points are a lot like "on_x" IDE-GUI or database triggers/events.

    If you store SQL in tables for a report generator for instance, there may be situations that tablized SQL cannot handle very well (or not worth adding new columns if only one or two instances differ from the majority). So make a routine to override each SQL section (SELECT, WHERE, ORDER BY, etc.), and another to override the whole SQL for the really complex stuff like correlated subqueries.
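
    A rough sketch of the table-plus-override idea, in Python. The report rows, column names, and the two override functions are all hypothetical; in a real system the rows would live in a database table rather than a list of dicts.

```python
# Each row of this "table" describes one report.
REPORTS = [
    {"id": "sales", "select": "region, total", "from": "orders",
     "where": "", "order_by": "region"},
    {"id": "late", "select": "id, due_date", "from": "orders",
     "where": "shipped = 0", "order_by": "due_date"},
]


def override_where(row, where):
    # Override point for a single SQL section: the one report that needs
    # something the table columns cannot express gets it here.
    if row["id"] == "late":
        return where + " AND due_date < CURRENT_DATE"
    return where


def override_whole_sql(row, sql):
    # Override point for the entire statement, for the really complex
    # cases. No report needs it yet, so it passes the SQL through.
    return sql


def build_sql(row):
    # Assemble the statement from the table row, calling each override
    # point along the way.
    where = override_where(row, row["where"])
    sql = f"SELECT {row['select']} FROM {row['from']}"
    if where:
        sql += f" WHERE {where}"
    if row["order_by"]:
        sql += f" ORDER BY {row['order_by']}"
    return override_whole_sql(row, sql)


statements = {row["id"]: build_sql(row) for row in REPORTS}
```

    Adding a new ordinary report is then a new row, not new code; only the exceptions touch the override functions.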

    I have successfully tablized HTML forms also. It is a lot of up-front work, but is better than coding each from scratch if you have hundreds.

  11. My suggestions by Pinball+Wizard · · Score: 5, Informative
    My programming team is considering making some sweeping changes to our code base (150+ perl CGIs, over a meg of code)


    First of all, I think it's important to realize that you have a medium-sized website and not a big software project. Therefore, some of the above comments recommending refactoring, UML, and eXtreme programming may be a bit overkill.


    Web programming != software development! It's usually done at a much faster pace. Even if an object-oriented approach is taken, you are still probably talking about simple function libraries rather than complex C++ or Java classes. Again, overkill.


    150 files is still a small enough project to be managed by one or two decent coders. Actually, I just looked at the amount of stuff I've written over the years for my online bookstore and it's more like 500 files and over 4 megs of code. I don't feel like it's too much of a job to manage this codebase by myself.


    So, here are my recommendations.


    You probably have gotten better at programming since the time you started your project. Take a few of the most recent CGIs you have written and compare them to the first ones you wrote. You just might notice a glaring difference in the quality. Also, the first pages you wrote are likely to be among the most important in your project, yet they are also likely the worst quality-wise.


    Regardless of what language you program in, I think it's important that you can tell what's going on in the program by reading the comments. If a manager can understand what a program does by reading the English bit, there's a good chance other programmers will be able to jump in and help as well. One specific rule I also follow: if you do regexes, say IN ENGLISH what those regexes do. I say this because regexes are one of the hardest things to read.
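
    A small sketch of the "say it in English" rule for regexes, in Python. The ZIP-code pattern is just an illustrative example, not something from the original code base. Python's `re.VERBOSE` flag allows whitespace and comments inside the pattern; Perl offers the comparable `/x` modifier.

```python
import re

# Matches a US-style ZIP code: five digits, optionally followed by a hyphen
# and four more digits (for example "12345" or "12345-6789").
ZIP_CODE = re.compile(
    r"""
    ^\d{5}        # exactly five digits
    (?:-\d{4})?   # optional: a hyphen and four more digits
    $             # and nothing else
    """,
    re.VERBOSE,
)
```

    The comment above the pattern says what it is for; the comments inside say what each piece does.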


    Look for any code that can be "factored out" of your scripts and put those into function libraries. Then include those in your program. The only problem with this occurs when you have huge function libraries that slow down your scripts when you include them. In that case you would logically separate your functions into different files. I have included very common functions in different include files, so I can make the actual code compiled or interpreted as small as possible.


    Consider using a flowcharting tool as an aid to programming and/or documenting your code.


    Standardize how you name variables and functions, write comments, and handle indentation and spacing.


    Be sure to include the date you wrote each script in its comments, in case the filesystem wipes this out.


    I'm sure there are other things I've left out, but following the above guidelines has helped me do exactly what you are trying to do: manage a growing codebase. But don't forget, this is web programming, not rocket science, and some of the above suggestions may be more trouble than they are worth. Keep it simple.

    --

    No, Thursday's out. How about never - is never good for you?

  12. Perlmonks.org by consumer · · Score: 2, Informative
    You might want to reconsider perl as the language of choice for a large scale application. I realize I'm posting this comment to a Perl system, but Perl hangs together like an immense kludge of a language.

    What a monstrous Christmas troll you are. What qualifies you to make this judgement? Perl, like any other mature language, has people who write kludges with it and people who write clean, elegant code with it. Your lousy Perl code is not indicative of a language problem.

    That said, you're probably stuck with it, and AFAIK, you may be forging new paths in programming for reusability by applying the above concepts to Perl.

    And this shows how much you know, since the Perl community is full of activity around design patterns, refactoring tools, unit-testing, and other practices which are in favor among experienced people trying to write solid, maintainable code.

    My suggestion for those who are looking for actual useful advice rather than this kind of "throw away all your work and learn Java" crap, would be to head straight for http://perlmonks.org/ and read up. There's tons of advice there for serious Perl coders. You would also do well to start reading the mod_perl mailing list, which often has informative discussions about these issues.

  13. Having done complete redesign of dynamic sites by f00zbll · · Score: 2, Informative
    I've rebuilt dynamic sites from scratch twice. First and foremost, if you're rewriting because of serious scalability or design weaknesses, then it is unavoidable. If it's just to get rid of annoying things, then I would say don't even try it. I consulted at a fairly big E-Commerce site that was crawling and couldn't handle the traffic. The original site was built by a programmer who scaled up examples provided by MS. After it was done the whole site was a dog and would crash constantly. They finally brought in a programmer who was able to rewrite parts of it and make it work. After 7 months of intensive work, the 2 people stabilized the site. They decided to completely rewrite the site and I was contracted to help.

    In this particular case, it was necessary because the site was right at the max. If the traffic increased, it would kill the site. Since it was an E-Commerce site, rewriting it was fairly straightforward. The old code kept running until we were able to finish the new system and make sure it was stable and ready.

    As a consultant, one of the most important aspects is detailed documentation that explains both the high and low level details. Often I will include very specific details about why a design was chosen and what limitations it has. When applicable, I will also describe how to extend, or modify the code to support additional features. This means you spend a lot of time doing documentation, but it forces you to think about a design more thoroughly and will expose weaknesses. Always keep an open mind and never fall in love with your design. There is no right way to build something, only right for the situation you are given.

  14. php project mngt app by gol64738 · · Score: 2, Informative

    a good project management application is important for any development team. usually, these are hard to come by unless you plunk down $10,000 or more, although these come with a gazillion features that you probably won't end up using.

    i discovered a new tool on sourceforge which is an open project written in php.

    i'm impressed with it. the code is also well documented.
    the homepage can be found here.
    i recommend checking out the screenshots as well.

  15. 99% Planning, 1% Coding by freality · · Score: 2, Informative

    When you first design and implement some module, a
    lot of time is involved in cycling between "ok, I
    know what to do" and "huh, maybe not". I've found
    this crucial, esp. in team work, in order to gain
    a good conception of the scope of the task. Also,
    many external issues, e.g. how the module interacts
    with the system, efficiency, etc. that aren't pure
    functional issues, are first grappled with here.

    Refactoring is different from this, in that you're
    probably very comfortable with the "state of mind"
    of the code. Instead of creating, you'll be
    clarifying. So, most of the refactoring is in
    your head (99%). All the external issues have been
    addressed before (or else this probably isn't really
    refactoring), so just work at a white board with
    your team until writing the code will basically
    be transcription (1%).

    I've found this to yield the best code.

  16. Re:Minimise Untested Documentation! by Arandir · · Score: 3, Informative

    I'll have to disagree. Document as much as possible, BUT NO MORE! But this documentation must be meaningful and relevant. Otherwise it is worse than useless.

    Document every function, listing the purpose of every parameter and the meaning of the return value. Document why you are doing something if there is more than one way to do it. If a section of code fixes a bug, document what it does and do not just document a bug number. Use self-documenting code whenever possible (i.e. name your variables and functions meaningfully). Use a document generation tool if possible (javadoc, doxygen, etc). Write the user docs at the same time you're writing the code. Incorporate the user docs into the code if at all possible.

    Here is a bad comment: // store x2 in fu
    Here is a good comment: // save the index because we will use it later
    Here is a bad comment: // this is not meant for you to understand
    Here is a good comment: // please see Smith's "The Black Magic of Filesystems" for details on this algorithm

    The most important part of commenting is realizing who is going to read it. It may be you. But in all likelihood it will be someone you never met long after you have left the project or even the company. It may be code or design reviewers who don't know programming but do know how to block projects they can't understand. If it's Open Source, it may be some brilliant programmer wanting to fix a bug but without the time to puzzle over your constructs.

    In every code review I have ever been in, someone has made some silly assumption about the code, with the final recommendation that that section of code be commented better to avoid future silly assumptions.

    --
    A Government Is a Body of People, Usually Notably Ungoverned
  17. Re:Unnecessarily long variable names by Dwonis · · Score: 2, Informative
    The best code is properly indented

    I think this point needs clarifying: If a project is using one indentation style (this includes tabs vs spaces), for $DIETY's sake, use it!

    Also, if you are selecting the coding style yourself, use K&R style. I don't care if you think it's not 100% logical -- it's readable, and every programmer worth his/her stuff is familiar with it. The only deviation that is really tolerable is one particular hack on function declarations (i.e. some projects use the hack so the regexp /^funcname/ will work, and this is usually OK with most people).

    Yes, I'm an asshole, but if you follow these rules, you'll find that your successors will curse your name much less.

  18. Re:Perl vs. PHP ? by tmoertel · · Score: 3, Informative
    Could I possibly ask for a fairly realistic little example of something that Perl does well that PHP can't?
    Middleware.

    For example, most enterprise applications do something important -- trade shares, manage accounts, track patient records, etc. How it's done is governed by business logic -- the company rules, policies, regulations, procedures, and so on. Now, you can spread this logic across your web site (as one might do using PHP, which is tied to the web site). Or you can bundle it up into an independent application, keeping all of the business logic in sensible, cohesive compartments that run on an application server (e.g., using one of the existing Perl app servers or one you've rolled yourself via, say, POE). This not only makes the business logic easier to understand and manage but also makes the logic independent of and accessible from any number of front ends that you might need. Simultaneous web, client-server, and even command-line interfaces become possible, and for enterprise projects, multiple simultaneous interfaces are often a requirement for backward compatibility with older interfaces.
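
    The separation described above can be sketched in a few lines. This is Python rather than Perl, and everything in it is invented for illustration (the withdrawal rule, the limit, and the function names); the point is only that the business rule lives in one place and each front end is a thin adapter around it.

```python
# Business rule, independent of any front end. The rule itself (a daily
# withdrawal limit) is made up for illustration.
DAILY_LIMIT_CENTS = 50_000


def approve_withdrawal(balance_cents, withdrawn_today_cents, amount_cents):
    """Return (approved, reason) for a withdrawal request."""
    if amount_cents <= 0:
        return False, "amount must be positive"
    if amount_cents > balance_cents:
        return False, "insufficient funds"
    if withdrawn_today_cents + amount_cents > DAILY_LIMIT_CENTS:
        return False, "daily limit exceeded"
    return True, "ok"


def web_front_end(form):
    # Thin adapter: parse web form strings, call the rule, render HTML.
    approved, reason = approve_withdrawal(
        int(form["balance"]), int(form["withdrawn"]), int(form["amount"])
    )
    return f"<p>{'Approved' if approved else 'Denied'}: {reason}</p>"


def command_line_front_end(argv):
    # Thin adapter: parse arguments, call the same rule, render plain text.
    approved, reason = approve_withdrawal(*(int(arg) for arg in argv))
    return f"{'APPROVED' if approved else 'DENIED'} {reason}"
```

    Changing the limit or adding a new rule touches only `approve_withdrawal`; neither front end needs to know.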

    In summary, for shallow web-only stuff, PHP is a reasonable tool. For stuff beyond web work, PHP is out of its design envelope. However, Perl works here just as well as it does for web work.

  19. Re:Minimize Untested Documentation! by wik · · Score: 2, Informative
    If you are going to use descriptive names (which I think is a fine idea), particularly like the one you mentioned, make sure the steps of the algorithm are clearly denoted. When one programmer sees an algorithm (or a maintainer looks it up in a book), they may be looking at the same algorithm, broken into different steps. There is nothing more frustrating than seeing the right thing (if you look in book A), labeled in a completely different way (because you learned it from book B).

    I think one of the biggest problems is that people believe that because they named something clearly, it must automatically be clear and logical to others. More generally than just naming steps of an algorithm, this is a problem with naming commands, functions, variables, etc. I think this paper makes a strong case http://citeseer.nj.nec.com/furnas87vocabulary.html for the naming problem (yes, this is even a problem with experts in a particular field). Names are great, but they don't always stand on their own. I'd highly suggest that people read the short section on "armchair" naming. I haven't seen a programmer who wasn't tempted to use this at one point or another.

    --
    / \
    \ / ASCII ribbon campaign for peace
    x
    / \