When Making a Comprehensive Retrofit of your Code...
chizor asks: "My
programming team is considering making some sweeping changes to our
code base (150+ perl CGIs,
over a meg of code) in the interest of consistency and reducing
redundancy. We're going to have to make some hard decisions about
code style. What suggestions might readers have about tackling a
large-scale retrofit?" Once the
decision has been made for a sweeping rewrite of a project,
what can you do to make sure things go smoothly and you don't run
into any development snags...especially as things progress in the
development cycle?
I'm sure a bunch of code nazis will disagree with me (please note the clever way I attempt to pre-emptively undermine their arguments by labeling them as 'nazis') but sometimes massive engineering re-writes are not necessary.
Your tangled mass of spaghetti code paths are probably full of almost incomprehensible little design decisions and seemingly out of place declarations and functions, but most of those were probably added as specific fixes for bugs encountered under real-world use.
Most companies that decide to massively re-engineer their code (do a big rewrite) usually end up regretting it because it forces them to re-fix the problems that caused the original strange looking code in the first place.
Does your CGI nest work? If so, maybe you should leave it alone. If you are fixing specific problems, then go ahead, but if this is a generalized attempt to fix the 'not invented here' syndrome that plagues engineers (who will almost universally agree that it is easier to write code then it is to read it), perhaps you should reconsider.
Don't stop maintaining the old code code until the new code is on solid ground. No matter how sure you are that you can do it, the new code might never come through.
The Gardener
--
1) Know what you are doing. 2) Know what the code does.
Both are expedited by good documentation. This is so important, it deserves to be written thusly:
DOCUMENTATION!
Write everything down. If the code is not commented, figure out what it does and write it down. When you add a line or a module, write down what it is supposed to do. Declarations? Write those down too. Document everything so that you can figure everything out, both now and down the road when you decide to fix something else.
This is the voice of experience. I have had to reverse-engineer my own code 6 months after I wrote it because I failed to document anything. Learn from my mistake.
I have a strong belief in the Second Amendment.
A ton of people will tell you a ton of things, never having retrofited anything.
- Do not undervalue the investment you have learning your existing coding language. New challenges await you if you jump on a new language like java. Make the jump if you are excited about learning about the new language.
- If you use your existing coding language you will literally fly through the retrofit. Do it piece by piece. Make all those changes first, then test app, then make next set of changes then test. The simple fact is, most wasted time is spent on bugs not working on performance, and you've already knocked down a lot of bugs, don't let them pop back up by blowing everything up. There are books on this.
- Sometimes blowing everything up is worth it. Do it right this time. Realize it won't be as perfect as you might think it will be.
- Remember there are countless open source and shareware products that tried to create TNG with a total rewrite, got nowhere, and ended up improving their existing product. Remember the lesson, bite off what you can chew.
- Spend a week poking around researching possibilities. I do this all the time, bookmark things I think are important. Then for the next project you've got all the little things you might forget at your fingertips. Optimzations/Tools/Paradigms. Think you know it all? You'd be suprised at what is out there and what you missed. And what you spent a month in house re-inventing. This one's important.
- Use open source software. Nothing beats free. Nothing is more fun. Java's ugly standardization history makes me puke... the BS Sun has pulled with Java is staggering. That the Java Lobby swallows it and loves it even more so. This is irrelevant to your question, and not fair to the Lobby, but I like to give them a hard time.
- Colorary to Java. You need less abstract design then you think. Endless object hierarchies will weigh you and your app down... Their are books on this too.
- You need more documentation then you think. Ever found code someone ELSE wrote too EASY to follow. I don't think so. Especially if you are using perl and someone is enjoying the line noise capabalities perl allows. Perl has 20 ways to do EVERYTHING, you may not know the latest or twistiest. Document as you WRITE the code. Do not leave at the end of the day without catching up the docs. A week of documenting is the worst form of hell, avoided with a minutes worth of clarification each time you write a function/class.
- Hardware is cheap.
Anyways, have fun... and good luck. Be interested to read what others have to say.
Your tangled mass of spaghetti code paths are probably full of almost incomprehensible little design decisions and seemingly out of place declarations and functions, but most of those were probably added as specific fixes for bugs encountered under real-world use.
Yes, and if they're cryptic and uncommented, they are worthless. Eventually one of these incomprehensible, magical fixes will stop working. Perhaps the bug it works around is fixed. Perhaps how the function is being used changes to previously unexpected behavior. Some poor engineer will look at the little big of magic, scratch his head, and be forced make a blind decision about how to fix it. Perhaps he can change the code while leaving the bit of magic in working, but he can't be certain, since he doens't understand it. If the collection of cryptic tweaks becomes dense enough, any attempt to fix a bug or add a feature becomes highly risky.
On a related note, don't let this happen to you. If you add one of these strange little fixes, for the sake of the programmer that follows you, document them. Just a little "Need to toggle the Foo because Qux 1.4 is correctly fails to do so" will bring tears of joy to the eyes of future programmers.
Large overhauls are usually mistakes. Details in the previous code are lost. If the overhaul takes non-trivial time, people become frustrated that two weeks ago they had a working (if problematic) system and today most of the system doesn't work.
Instead, make small incremental changes. Pick something lots of code is replicating and attempt to unify it into a shared code base. Spend some time documenting key parts of the code. Pick a particularlly hairy class or function and untangle some of the worst bits. These sorts of changes can reveal minor bugs, build up to significant improvements, and leave you satisfied at the end of the day that you improved things.
If a signficant overhaul is necessary, try to overhaul portions while maintaining the existing bits.
The first thing you definitly have to do is Setting up a test suit for regressoin testing.
For those not familair with the term 'regression test':
Program a set of so called "test drivers", programms calling your code: routines/scripts.
Define test data, either in a DB or in flat files, used by those driver programs.
The test programs and test data needs to work with the old code, of course.
As the new code should behave similar you only need to adjust PATH or script names to let the test programs work with the new code.
Plan your project by defining which test cases(test program plus test data) should work at a planned milestone with the changed code.
After making changes rerun all tests.
Well, there is a lot more you could do, but that above is minimum (basic software engineering, sorry no art involved here).
Regards,
angel'o'sphere
Cost free eBook I read (by iBook/Kobo/Amazon/ObookO/Gutenberg etc.): "The Green Odyssey" by Philip Jose Farmer.
Most of the computer industry now calls this Refactoring.
I would HIGHLY recommend the book "Refactoring" by Martin Fowler.
There is a number of things you should do here.
1. Document your plan and come up with an official PROPOSAL document. Allow others to comment on this document and incorporate fix all relevant issues.
I started using this under the Apache Jetspeed project and now a lot of other Apache projects are accepting this practice.
It really allows the community to become involved in your changes and encourages constructive feedback and involvement.
2. Break this into phases. You should NOT attempt to do this all at once. Each phase should be isolated and should consist of one unit of work.
Each phase should be branched off of CVS, worked on, stabilized, brought back into HEAD and tagged. You should then RUN this code in a semi-deployment role for a period of time to correct all issues which WILL arise with the updated code.
After this you can then start your next phase.
3. UNIT TESTS! If management (assuming you have management) has approved the time for this type of refactor then you need to take the time and write Unit Tests for each major component.
It is important that Unit Testing can sometimes be just as hard, if not harder, than the actual development itself.
In some situations you can avoid Unit Testing, some here are going to call me crazy for saying this but it is true. In a lot of high level applications, which are NOT used as libraries by other applications, you can bypass Unit Testing in order to increase development time. This is a dangerous practice but it is often outweighed by the extra functionality you will end up with in your product.
Anyway. Good luck!
Kevin
Refactoring is not re-writing. Refactoring is a methodical approach to moving, redesigning, and making the code cleaner. It's not throwing away and starting from scratch. The two words are wholly semantically different.
In his book, Refactoring, Martin Fowler talks about how code "smells". He identifies a whole slew of reasons why code can "smell" bad, such as having a method that's too long, having a method on the wrong object, incorrect use of polymorphism, etc. He then outlines "Refactoring Patterns" which can be applied to certain "smells" to make the code more managable.
The last project I managed was a 3 month re-write of over 75,000 lines of code for an auction component that had gotten way out of control. The original component integrated with a very very poorly written (yet still very expensive) auction engine, and our task was to take it out.
Because there was no time allotted to proper requirements gathering, the powers that be decided that we could use the existing system as a requirements base, and just make the new system "act" like the old one.
So I was thrown two things: A staff that was inexperienced in design, and a lack of requirements. We were re-writing from scratch -- throwing all of the code away and starting again. I'm one of the only ones on my team that actually knows what refactoring means, let alone how to apply it.
So we built the new system based on a solid design. We even did the design in UML. But that only goes so far. Given a properly designed system, there is still another lower step -- how to actually implement the code. Invariably, the design doesn't take into account helper methods that are necessary, other objects, etc. Because we were on a tight schedule, I left it up to each developer to design these new tidbits.
If they had known what they were doing, everything would have been fine, but there are several places where the code is absolutely abysmal. It's not that it will perform poorly or is totally unmanagable, but in some places it's close.
Granted, the business rules for the system are some of the most complex I've ever seen, but if the developers had known more about refactoring, they might have made better choices.
There's a whole notion of Quality here that should be discussed as well. Most good has a good enough quality, and though it looks ugly, I'd throw it away. There's an interesting interview by Joel (a former microsoft programmer) which tackles these problems of Quality head on. I also reccommend reading Zen and the Art of Motorcycle Maintenence: An Inqiry into Values for a pretty good discussion of the whole notion of quality.
I'd say just keep the codebase unless you are totally switching languages (or in our case taking out a 3rd party integration). Otherwise if you re-write you'll wish you hadn't. The new codebase will end up being in the same state as the old codebase, it'll just take a little bit more time to get there.
I'd also reccommend just doing a performance test of the site, and find out which modules really need to be re-written. The 80/20 rule will invariably apply -- 80% of the execution time will be spent in 20% of the code -- so at a minimum just re-write the 20% of the code that's executed on the most frequent paths. Leave the rest alone. Re-writing it will just be a waste of time, and you have to fix all the old bugs that were re-introduced during the re-write.