When Making a Comprehensive Retrofit of your Code...
chizor asks: "My
programming team is considering making some sweeping changes to our
code base (150+ perl CGIs,
over a meg of code) in the interest of consistency and reducing
redundancy. We're going to have to make some hard decisions about
code style. What suggestions might readers have about tackling a
large-scale retrofit?" Once the
decision has been made for a sweeping rewrite of a project,
what can you do to make sure things go smoothly and you don't run
into any development snags...especially as things progress in the
development cycle?
we're doing something similar - and switching to java (JSP's + Tomcat, struts) to replace a lot of old perl cgi's. The java code is much, much cleaner. But object-oriented perl code can help if you don't want to take the plunge too far into a new language. And at least find a way to go mod_perl rather than CGI, for the things where performance matters at all.
Energy: time to change the picture.
Basically, Joel's take on a similar problem is: don't do it.
Unless you have a _really_ good reason to do huge change to a big codebase, don't bother, and make something more productive instead.
cheers
Your tangled mass of spaghetti code paths are probably full of almost incomprehensible little design decisions and seemingly out of place declarations and functions, but most of those were probably added as specific fixes for bugs encountered under real-world use.
This is a lesson to be learned. Engineer your code from the beginning. Use easy to understand commenting, and strucutured code. Although it takes some discipline, you will almost never have to reconsider "re-writing from scratch".
My firm went through this sort of thing just two years ago. The PHB at the time decided, for some reason, that our 300,000 lines of semi-poorly written C code, and 50,000 associated lines of Java (Dont' ask).
Anyway, it took 7 of us over 2 months to get even halfway done. The pressure the boss was putting on us was awful, and he didn't really even understand what we were doing, even though he was the one demanding it. I think she read it in a trade mag somewhere. God I'd do a lot more work if she didn't read that shit.
Anyway, about halfway through the "Great Leap Forward" (as we [appropriatly] named it) the boss quit, and the next boss, who so far has been fairly clueful. He didn't think the whole deal was needed, but he was pressured by the former bosses husband (the CTO) to get it done. Seriously.
Hope yours goes better than ours. From what we did, heres some tips I can give you.
1. Be consistant through the whole thing.
2. Make sure everything is planned before you start. This was the one part we got right.
3. The team you have should have worked together before, because this sort of task requires previous knowledge of eachother.
Other than that, my condolences. Or maybe it will work better for you.
Good luck!
Transitioning in new managers or having the current manager only look in on the project once in a while is as sure a path to madness and doom as no management at all.
Our due date was mid-August, we'll be lucky to get it through testing and into production by January 31st. All the while with the logjam we're having to put pieces of it into production and cross our fingers that the new changes don't break anything.
Love to talk more about it, but need another gallon of coffee.
A feeling of having made the same mistake before: Deja Foobar
He says the exact same thing as you, only better.
"If you look 'round the table and can't tell who the sucker is, it's you." -- Quiz Show
1) Identify common functionality.
2) Encapsulate in libraries
3) Be sure to extract enough generality that you don't have special case functions
4) Don't extract so much generality that functional interfaces become unwieldy.
5) Write everything in the same language.
6) Find any complex pieces or algorithms. If they can be simplified or re-written, do it. If not, save it so you don't need to debug it again.
7) Throw everything else away.
I second all that has been said about making sure that you really need to do this and that it is worth the time and risk. One sign that you may need to do so is an excessive reopened bug rate, where fixing one bug often creates another bug due to side effects and component interactions. If you decide that it is, then the three keys to success will be modularity, incremental rollout, and unit tests.
Modularity is probably what you're already thinking about. Go over the old code base, in a code review, and find where the same thing is done over and over either with copy-and-paste code -- the bane of crap engineers -- or with different code that serves the same ends. Look for repeated sequences in particular. Create a new library that encapsulates those pieces of code.
Incremental rollout is vital. Only replace small parts of your system at a time, doing complete retests frequently. Don't write a new encapsulated routine and then roll it out to each of the three dozen places in which it appears in the whole code base. Write the whole function library, with unit tests, and then start applying it to separable modules one by one, retesting as you go. Otherwise I guarantee the whole thing will fall apart and you won't be able to tell why. Ideally, you might set a threshold on the rate of replacement of old modules and work primarily on creating new modules with the abstracted logic.
Unit tests are crucial because, as noted, the messiness of your old code probably conceals a lot of necessary logic. We had this great phenomenon on Apple's Copland where people who had never used the old OS managers were rewriting them in C or C++ from the assembly source. When they saw something in the assembly they didn't understand, they just ignored it. Guess what -- the new managers didn't have any backwards compatibility. The only answer to this is to have a thorough unit test for any module that you replace, against which you can test the new version. This also confers other quality benefits, but during a rewrite it's critical.
Finally, once you have replaced a significant number of your modules, you will find that new levels of abstraction appear. The average size of each function or method will have shrunk considerably, and now it becomes possible to see new repeated code sequences that were not visible due to the old cruft. Move these into your new library modules and start using them in continuing replacement work. In addition, start going back -- slowly and incrementally -- through the already converted modules and replacing the repeated sequences with calls to the new abstractions.
Finally, figure out how you got into this mess in the first place. The worst programmer habit I know of is copy-and-paste coding instead of using subroutines. You can tell people not to do it, but some always will. Those people should be bid farewell -- you can't afford their overhead. Other common problems include lack of planning and review, a code first and think later mentality. Start moving your organization up the levels of the CMM and you may find that you wind up with fewer modules that need replacement.
Hope this helps.
Tim
>- If you use your existing coding language you will literally fly through the retrofit. Do it piece by piece. Make all those changes first, then test app, then make next set of changes then test. The simple fact is, most wasted time is spent on bugs not working on performance, and you've already knocked down a lot of bugs, don't let them pop back up by blowing everything up. There are books on this.
p ?theisbn=0201485672&vm=
This is good advice. To be more specific:
1. START with your regression test suite
2. Then add self-documentation features like standard naming conventions. Seems dull and bureaucratic and pointless but really truly saves maintenance time.
3. Have a standard comment header for each function. The standard should answer questions like "Can that argument be NULL?" and "What do the error returns mean?"
4. If you're going through every line already, do a security audit.
There's good advice in the refactoring books, for example http://www1.fatbrain.com/asp/bookinfo/bookinfo.as
Put as much in possible into tables. That makes it easier to find, view, add new pages, menus, product categories, etc.
Of course there will be exceptionss that need code instead of tables. Have special "override" points that allow you to override stuff as needed. Make the "overrides" their own function. (Example: sub foo(row); if row.clientID=7, then color3 = blue;.....). The override points are a lot like "on_x" IED-GUI or database triggers/events.
If you store SQL in tables for a report generator for instance, there may be situations that tablized SQL cannot handle very well (or not worth adding new columns if only one or two instances differ from the majority). So make a routine to override each SQL section (SELECT, WHERE, ORDER BY, etc.), and another to override the whole SQL for the really complex stuff like correlated subqueries.
I have successfully tablized HTML forms also. It is a lot of up-front work, but is better than coding each from scratch if you have hundreds.
Table-ized A.I.
First of all, I think its important to realize that you have a medium-sized website and not a big software project. Therefore, some of the above comments recommending refactoring, UML, and eXtreme programming may be a bit overkill.
Web programming != software development! Its usually done at a much faster pace. Even if an object-oriented approach is taken, you are still probably talking about simple function libraries rather than complex C++ or Java classes. Again, overkill.
150 files is still a small enough project to be managed by one or two decent coders. Actually, I just looked at the amount of stuff I've written over the years for my online bookstore and its more like 500 files and over 4 megs of code. I don't feel like its too much of a job to manage this codebase by myself.
So, here are my recommendations.
You probably have gotten better at programming since the time you started your project. Take a few of the most recent CGIs you have written and compare them to the first ones you wrote. You just might notice a glaring difference in the quality. Also, the first pages you wrote are likely to be among the most important in your project, yet they are also likely the worst quality-wise.
Regardless of what language you program in, I think its important that you can tell whats going on in the program by reading the comments. If a manager can understand what a program does by reading the English bit, there's a good chance other programmers will be able to jump in and help as well. One specific rule I also follow: if you do regexes, say IN ENGLISH what those regexes do. I say this because regexes are one of the hardest things to read.
Look for any code that can be "factored out" of your scripts and put those into function libraries. Then include those in your program. The only problem with this occurs when you have huge function libraries that slow down your scripts when you include them. In that case you would logically separate your functions into different files. I have included very common functions in different include files, so I can make the actual code compiled or interpreted as small as possible.
Consider using a flowcharting tool as an aid to programming and/or documenting your code.
Standardize how you name variables and functions, write comments, identation, and spacing.
Be sure and include the date you write your scripts in the comments, in case the filesystem wipes this out.
I'm sure theres other things I've left out, but following the above guidelines have helped me do exactly what you are trying to do: manage a growing codebase. But don't forget, this is web programming, not rocket science, and some of the above suggestions may be more trouble than they are worth. Keep it simple.
No, Thursday's out. How about never - is never good for you?
What a monstrous Christmas troll you are. What qualifies you to make this judgement? Perl, like any other mature language, has people who write kludges with it and people who write clean, elegant code with it. Your lousy Perl code is not indicative of a language problem.
That said, you're probably stuck with it, and AFAIK, you may be forging new paths in programming for reusability by applying the above concepts to Perl.
And this shows how much you know, since the Perl community is full of activity around design patterns, refactoring tools, unit-testing, and other practices which are in favor among experienced people trying to write solid, maintainable code.
My suggestion for those who are looking for actual useful advice rather than this kind of "throw away all your work and learn Java" crap, would be to head straight for http://perlmonks.org/ and read up. There's tons of advice there for serious Perl coders. You would also do well to start reading the mod_perl mailing list, which often has informative discussions about these issues.
In this particular case, it was necessary because the site was right at the max. If the traffic increased, it would kill the site. Since it was an E-Commerce site, rewriting it was fairly straight forward. The old code kept running, until we were able to finish the new system and make sure it was stable and ready.
As a consultant, one of the most important aspects is detailed documentation that explains both the high and low level details. Often I will include very specific details about why a design was chosen and what limitations it has. When applicable, I will also describe how to extend, or modify the code to support additional features. This means you spend a lot of time doing documentation, but it forces you to think about a design more thoroughly and will expose weaknesses. Always keep an open mind and never fall in love with your design. There is no right way to build something, only right for the situation you are given.
a good project management application is important for any development team. usually, these are hard to come by unless you plunk down $10,000 or more, although these come with a gazillion features that you probably won't end up using.
i discovered a new tool on sourceforge which is an open project written in php.
i'm impressed with it. the code is also well documented.
the homepage can be found here.
i recommend checking out the screenshots as well.
When you first design and implement some module, a
lot of time is involved in cycling between "ok, I
know what to do" and "huh, maybe not". I've found
this crucial, esp. in team work, in order to gain
a good conception of the scope of the task. Also,
many external issues, e.g. how the module interacts
with the system, efficiency, etc. that aren't pure
functional issues, are first grappled with here.
Refactoring is different from this, in that you're
probably very comfortable with the "state of mind"
of the code. Instead of creating, you'll be
clarifying. So, most of the refactoring is in
your head (99%). All the external issues have been
addressed before (or else this probably isn't really
refactoring), so just work at a white board with
your team until writing the code will basically
be transcription (1%).
I've found this to yield the best code.
I'll have to disagree. Document as much as possible, BUT NO MORE! But this documentation must be meaningful and relevant. Otherwise it is worse than useless.
// store x2 in fu
// save the index because we will use it later
// this is not meant for you to understand
// please see Smith's "The Black Magic of Filesystems" for details on this algorithm
Document every function, listing the purpose of every parameter and the meaning of the return value. Document why you are doing something if there is more than one way to do it. If a section of code fixes a bug, document what it does and do not just document a bug number. Use self-documenting code whenever possible (ei. name your variables and functions meaningfully). Use a document generation tool if possible (javadoc, doxygen, etc). Write the user docs at the same time you're writing the code. Incorporate the user docs into the code if at all possible.
Here is a bad comment:
Here is a good comment:
Here is a bad comment:
Here is a good comment:
The most important part of commenting is realizing who is going to read it. It may be you. But in all likelihood it will be someone you never met long after you have left the project or even the company. It may be code or design reviewers who don't know programming but do know how to block projects they can't understand. If it's Open Source, it may be some brilliant programmer wanting to fix a bug but without the time to puzzle over your constructs.
In every code review I have ever been in, someone has made some silly assumption about the code, with the final recommendation that that section of code be commented better to avoid future silly assumptions.
A Government Is a Body of People, Usually Notably Ungoverned
I think this point needs clarifying: If a project is using one indentation style (this includes tabs vs spaces), for $DIETY's sake, use it!
Also, if you are selecting the coding style style yourself, use K&R style. I don't care if you think it's not 100% logical -- it's readable, and every programmer worth his/her stuff is familiar with it. The only deviation that is really tolerable is one particular hack on function declarations (i.e. some projects use the hack so the regexp /^funcname/ will work, and this is usually OK with most people).
Yes, I'm an asshole, but if you follow these rules, you'll find that your successors will curse your name much less.
For example, most enterprise applications do something important -- trade shares, manage accounts, track patient records, etc. How it's done is governed by business logic -- the company rules, policies, regulations, procedures, and so on. Now, you can spread this logic across your web site (as one might do using PHP, which is tied to the web site). Or you can bundle it up into an independent application, keeping all of the business logic in sensible, cohesive compartments that run on an application server (e.g., using one of the existing Perl app servers or one you've rolled yourself via, say, POE). This not only makes the business logic easier to understand and manage but also makes the logic independent of and accesssible from any number of front ends that you might need. Simultaneous web, client-server, and even command-line interfaces become possible, and for enterprise projects, multiple simultaneous interfaces is often a requirement for backward compatibility with older interfaces.
In summary, for shallow web-only stuff, PHP is a reasonable tool. For stuff beyond web work, PHP is out of its design envelope. However, Perl works here just as well as it does for web work.
Easy, automatic testing for Perl.
I think one of the biggest problems is that people believe that because they named something clearly, it must automatically be clear and logical to others. More generally than just naming steps of an algorithm, this is a problem with naming commands, functions, variables, etc. I think this paper makes a strong case http://citeseer.nj.nec.com/furnas87vocabulary.html for the naming problem (yes, this is even a problem with experts in a particular field). Names are great, they don't always stand on their own. I'd highly suggest that people read the short section on "armchair" naming. I haven't seen a programmer who wasn't tempted to use this at one point or another.
/ \
\ / ASCII ribbon campaign for peace
x
/ \