How To Deal With 200k Lines of Spaghetti Code
An anonymous reader writes "An article at Ars recaps a discussion from Stack Exchange about a software engineer who had the misfortune to inherit 200k lines of 'spaghetti code' cobbled together over the course of 10-20 years. A lengthy and detailed response walks through how best to proceed at development triage in the face of limited time and developer-power. From the article: 'Rigidity is (often) good. This is a controversial opinion, as rigidity is often seen as a force working against you. It's true for some phases of some projects. But once you see it as a structural support, a framework that takes away the guesswork, it greatly reduces the amount of wasted time and effort. Make it work for you, not against you. Rigidity = Process / Procedure. Software development needs good processes and procedures for exactly the same reasons that chemical plants or factories have manuals, procedures, drills, and emergency guidelines: preventing bad outcomes, increasing predictability, maximizing productivity... Rigidity comes in moderation, though!'"
no comment...
I advise rigidly farming it out.
(-1: Post disagrees with my already-settled worldview) is not a valid mod option.
Wouldn't that be Linux? It seems to work fine for me.
If something has become spaghetti over 10-20 years, then no one cared that it became spaghetti over 10-20 years. And it will still be spaghetti over the next 10-20 years. Fixing something like this requires a commitment from management, which means money. If the management of the project aren't convinced that cleaning up the development process is worth the initial investment for the long term, then they choose to deal with the constantly higher costs forever.
Something like this makes me think that this is one of those problems that get pushed off for someone else to deal with later. And the next person perpetuates this, by doing the same.
Schroedinger's Brexit: The UK is both in and out of the EU at the same time!
Rewrite from scratch.
Outsource it to India!
... it is usually the bad and inexperienced programmers who, upon taking over a project, condemn the existing code as a mess and want to do a rewrite.
Is it broken? Why does he need to fix it?
I'd rather be homeless than be in charge of a project my boss doesn't really care about. Talk about the fast track to nowhere. Even if by some miracle you do pull it together, nobody will know or care.
Rewrite it from scratch using the spaghetti code version to run correctness tests to verify you haven't changed the behaviour.
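Here is a minimal sketch of that side-by-side check. It assumes the legacy routine can be called directly from a test harness. `legacy_discount`, `new_discount` and `differential_test` are hypothetical stand-ins and do not come from the thread. The harness feeds identical random inputs to both versions and collects every disagreement.

```python
import random

def legacy_discount(total, is_member):
    # Hypothetical stand-in for a tangled legacy routine.
    d = 0
    if is_member:
        if total > 100:
            d = total * 0.1
        else:
            d = 5 if total > 50 else 0
    else:
        if total > 100:
            d = 5
    return round(total - d, 2)

def new_discount(total, is_member):
    # Hypothetical clean rewrite that must reproduce the legacy behaviour.
    if is_member and total > 100:
        discount = total * 0.1
    elif is_member and total > 50:
        discount = 5
    elif not is_member and total > 100:
        discount = 5
    else:
        discount = 0
    return round(total - discount, 2)

def differential_test(old, new, trials=10_000, seed=0):
    """Run both versions on the same random inputs; return every mismatch."""
    rng = random.Random(seed)
    mismatches = []
    for _ in range(trials):
        args = (round(rng.uniform(0, 200), 2), rng.choice([True, False]))
        expected, actual = old(*args), new(*args)
        if expected != actual:
            mismatches.append((args, expected, actual))
    return mismatches
```

An empty result only shows that the rewrite agrees with the old code on the inputs tried, bugs included. It cannot tell you what "correct" means, which is exactly the objection raised further down the thread.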
200k lines is about how large the Doom codebase was, and it wasn't uncommon for John Carmack to rewrite most of his game engine in a couple of weeks, a week, or even a weekend when he felt it wasn't going on a good path.
I knew I read this before:
http://programmers.stackexchange.com/questions/155488/ive-inherited-200k-lines-of-spaghetti-code-what-now
That article is linked in the first sentence of the summary.
Well, step 1 would be to lose the attitude.
It's code. It may be in an obsolete language, and it may not be up to the best industry standards, but it's code, and it's got enough knowledge in it that nobody wants to throw it away, and they hired you to maintain it.
Step 2, I don't know why you would define a process before you understand the code you are to apply the process to.
Seriously, wtf is all the process stuff about? You're the sole programmer; any rules you set are rods to break your own back the first time you hit a piece of code where you have to break the rules.
Step 3, you serve them. If you want to port it to a more modern, maintainable language, choose one that's easy for THEM to transition to. They've got the knowledge that drives the company, not you; you are the cleaner here. If the phone rings, you turn off your vacuum and let them do their job, Mr Cleaner. Nobody gives a fork if the cleaner has the best industry procedure for cleaning an office.
Step 4, break it down. Tiny bit by tiny bit, port to a new CLEAN structure. They wrote it; they can identify the core stuff.
Step 5, once you've ported it, along comes an engineer with code written in the old language and the old methods. Again, that's fine. Put away the process manual; these are the experts. If that's the language he can communicate to you in, it's fine: you can understand it, you can port it, you can help him speak the modern lingo. Don't quote your processes to him, you're just the cleaner.
As for this:
"Software development needs good processes and procedures for exactly the same reasons that chemical plants or factories have manuals"
That's someone who *implements* things, typically a bolt-together module manager. He is not someone who creates *new* things, because new things don't come with manuals. You don't know the rules of how they work until the problems needed to make them work are solved. One assembles Microsoft IIS blocks, the other works for Google on image processing. Which are you?
How To Deal With 200k Lines of Spaghetti Code
Well some good spaghetti sauce and a nice wine.
Look at what the software is supposed to do and what it does not do at the moment. Fix this first. After that, document the main functions and start replacing them one by one in an orderly fashion, and document them this time. It will take time, but in the end you'll have eaten the spaghetti and your project is saved. The biggest problem in software usually is that there is no time to do it right, but there is always time to do it over again.
Use basic AI procedures. Remember that simulations of the code will be necessary for the re-write. Also, the result might be too complex for humans to understand. That's no worry, just have a sandbox ready where you can simulate stuff.
To a ravioli coder procedural code will often look like spaghetti.
All the advice to rewrite it is misguided. Maybe rewrite small parts that you need to to keep it working on new hardware, or whatever, but if it works, I would think that wholesale rewriting is asking for trouble. The Ars article is full of great advice about what you should do to manage a large codebase going forward, but actually it doesn't really address the question of what to do about a large legacy codebase that wasn't written with best practice. The best software is written by incremental improvement of what went before (no matter how badly written, as long as it meets its specification) - big projects written from scratch usually fail.
Try projects of "enterprise class" that span into the MILLIONS of lines of code.
* I'm serious here...
Yes, there's a BIG difference between walking into a system with 200k lines (this is still REWRITE territory imo, & I say that as a professional with 17++ yrs. & around 30 projects of code in the MILLIONS OF LINES range for information systems)
vs.
One with millions of lines in it (not so simple to just rewrite).
APK
P.S.=> There's a time to rewrite, and a time not to, and when you're sub 500,000 lines, a rewrite IS possible and if it is as "bad" as this article *tries* to make it out as, then you rewrite when it's THAT small (& yes, 200k lines is TINY)...
I've done my assessments here based on doing smaller stuff from:
---
1.) The shareware/freeware level (most of mine here maxed out at around 5,000 - 10,000 lines of code tops)
2.) Commercially sold code in utilities I am part of in THAT market (usually midrange size 10,000 lines - 100,000 lines)
3.) Finally, and yes, "Enterprise Class/Industrial Strength/Mission Critical" information systems (these had millions in front end code alone, toss on SQL Stored Procs & more? You get the picture!)
---
The latter's where I've made the largest part of my livelihood professionally since 1994 (these definitely can often be quite large, into that MILLIONS of lines of code range, with MANY "moving parts" in libs, main code, SQL stored procs, and tables/devices on DB's galore + more)...apk
Take bits of the code and rewrite them one by one. Tackle a function here and a function there and over the course of a few years you'll have 400k lines of spaghetti code since modern coders don't need to worry as much about memory limits and processing constraints.
Personally I found the book 'Working Effectively with Legacy Code' (http://www.amazon.co.uk/Working-Effectively-Legacy-Robert-Martin/dp/0131177052) offers some great suggestions about how to integrate new functionality and changes into legacy systems.
Make sequence diagrams based directly off the code.
This will make your life so much easier.
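One rough way to get raw material for those diagrams is to record who calls whom during a real run. This assumes the code, or a port of it, runs under Python. The sketch uses the standard `sys.setprofile` hook. `trace_calls`, `to_sequence_lines` and the demo routines are hypothetical names. Each recorded edge becomes a `caller -> callee` line, which is close to the arrow syntax used by text-based diagram tools such as PlantUML.

```python
import sys

def trace_calls(entry, *args):
    """Run entry(*args) and return (caller, callee) pairs for Python-level calls."""
    edges = []

    def profiler(frame, event, arg):
        # Only Python function calls are recorded; C built-ins ("c_call") are ignored.
        if event == "call":
            caller = frame.f_back.f_code.co_name if frame.f_back else "<root>"
            edges.append((caller, frame.f_code.co_name))

    sys.setprofile(profiler)
    try:
        entry(*args)
    finally:
        sys.setprofile(None)
    return edges

def to_sequence_lines(edges):
    """Render edges as 'caller -> callee' arrows, one per call, in call order."""
    return [f"{caller} -> {callee}" for caller, callee in edges]

# Hypothetical legacy routines to trace.
def load():
    return [3, 1, 2]

def validate(rows):
    return min(rows) > 0

def process():
    rows = load()
    return sorted(rows) if validate(rows) else []

if __name__ == "__main__":
    for line in to_sequence_lines(trace_calls(process)):
        print(line)
```

A trace only covers the paths a given run actually exercises, so it is worth running it over several realistic inputs before trusting the picture.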
Then have a separate project that you rewrite.
If you aren't being paid to do this and you just want to do it because the code is awful to work with, spend maybe 2-5% of your day on it.
Even do it at home if you wish. (Whether you'd want to is really a question of whether you live and breathe programming. I would.)
But make sure that you let your boss know what a great job you've done, since most likely that recode will boost the speed and efficiency several times over when you replace the old file(s).
That will likely get you a big fat bonus or possibly even a raise since you never went and asked for any resources to do it in the first place. Forward thinking people are always loved in business. (unless your boss is a really terrible, horrible person, which most actually surprisingly aren't when it comes to this, most are just the horrible person part)
Obviously, make sure to do your usual project coding stuff as well. Documentation is key here, since the entire thing most likely became a mess because of poor documentation.
The problem now is the others, which the first post on the SO link describes how to deal with very well. (really well in fact. I like that guy)
It is going to be long and hard, but there is a very good opportunity in it for you if you handle it right.
Good luck.
I have spent most of my career as a software developer inheriting and updating such spaghetti code bases. Here are a few remarks and some of my experiences around this:
In summary, don't be too scared of a legacy spaghetti code base. These things can be understood well enough in time to refactor or port to a new platform.
The trouble with really bad spaghetti code is that, if you change anything, something else will break in an unpredictable way.
The good thing is that you don't have to re-write the code in its original language. If you can use Python, the resulting code will be much smaller (as much as 90% smaller).
You can also use tools (Simulink, Matlab) to avoid actually writing code. Some industries are standardizing around model-based design. It's a good way to visualize, create and document complex code.
If the spaghetti code is a maintenance problem, it is costing the company money. You can make a strong economic argument for re-writing it. If the code isn't causing a maintenance problem, leave it alone.
In the above, I have assumed that the code is truly bad. If it isn't that bad, you should refactor.
http://developers.slashdot.org/comments.pl?sid=3026933&cid=40885035
* I'd look it over, make an estimate of the time needed to get its process/information flows down + a rewrite, vs. sticking a customer with a cobbled-together wreck that rubberbands, paperclips, & superglue are holding together (for now).
APK
P.S.=> I average around 400,000 lines of code over 10 - 12 month projects since 1994 on mid-to-large scale information systems (up to "industrial size/enterprise class/mission critical systems into the millions of lines of code)!
Taking into account you have to get the database schema, information flows, and process down as well, I'd say what's needed is a rewrite here... & it is around 6++ months work, tops!
Yes - and, of course, money talk$, & it depends on what the client wants & is WILLING TO PAY FOR too.
Sometimes, those "cob job" maintenance jobs (yes, plenty of those here too professionally in info. systems/db work both as fresh or rewrite &/or maintenance jobs too) are all they can afford, so you are STUCK with it and after oh, 10 diff. guys have worked on it, you get those "spaghetti deals" (which suck to understand even at times) because of money & timeframes... apk
See subject-line, and, SO true (on possibility of breakage when doing partial maintenance of code only vs. rewrite from scratch when YOU get full understanding AND full control).
APK
Open documentation? Pfffft. That's lame advice. No no, what you want to do is mix it up some more: obfuscate the code, change variables to make them global and reuse them everywhere, but all for different purposes. Merge large files together and break small files into even smaller ones.
Remove any useful comments from the code, but add plenty of comments like this:
// adds 1 to i, waits until i is greater than 10 then adds 2 to a.
Now that's a comment!
Then leave the project and see the other guy come in and pull his hair out. Life is hard, make it funny.
MY OTHER COMMENTS
The really frightening thought is that there are many 40 year old first year CS students.
Why?
It's not like they're wearing a Speedo in class or anything.
I was hired as a reviewer of a project whose members mostly think it's not as bad as it is. Some are professional programmers, and the code they write isn't too bad. But not abysmally bad isn't enough. They have unknowingly started writing code a second time, because the existing code is a mess and they don't know what is already in it. They use the mailing list to compensate for missing documentation. They use classes, modules, packages, but the API is sometimes missing or, even worse, plain wrong. The biggest problem: at least some of them think they know what they are doing...
Per John Lakos, almost any bad interface can be wrapped in a good one. Slice off an appropriate slice of dodgy code, wrap it in a testable interface, write the test code to baseline its behaviour, and then when it makes business sense, you can refactor that slice of code. If it doesn't make business sense, you don't touch it.
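Here is a minimal sketch of that recipe, and every name in it is hypothetical. `_legacy_format_invoice` plays the dodgy code, and `InvoiceFormatter` is the testable interface wrapped around it. The baseline is simply the wrapper's current output for a fixed list of cases, saved as JSON.

```python
import json
from pathlib import Path

def _legacy_format_invoice(c, a, f):
    # Hypothetical stand-in for dodgy legacy code: cryptic names, magic flag.
    s = c.upper() + ":" + str(int(a * 100))
    if f == 1:
        s = s + "*"
    return s

class InvoiceFormatter:
    """Testable interface wrapped around the legacy routine."""

    def format(self, customer: str, amount: float, urgent: bool = False) -> str:
        return _legacy_format_invoice(customer, amount, 1 if urgent else 0)

# Fixed inputs whose outputs become the behavioural baseline.
CASES = [("acme", 12.5, False), ("globex", 0.99, True), ("", 0.0, False)]

def record_baseline(formatter, path):
    """Capture today's behaviour, right or wrong, as the reference."""
    outputs = [formatter.format(*case) for case in CASES]
    Path(path).write_text(json.dumps(outputs))

def matches_baseline(formatter, path):
    """Re-run the same cases after a refactor and compare with the baseline."""
    expected = json.loads(Path(path).read_text())
    return [formatter.format(*case) for case in CASES] == expected
```

The baseline deliberately records whatever the code does today, quirks included. For instance, `int(a * 100)` truncates rather than rounds. A refactor that "fixes" such a quirk therefore shows up as a failure, and you then have to decide about it explicitly.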
The difference between software development and most manufacturing is that they produce the same or a very similar product thousands or millions of times, where we produce a different product every time. This "building an app is like mass producing a chemical" philosophy is one reason why most software shops have insane amounts of unneeded documentation and overhead. I certainly agree that some standards, processes and documentation are needed, but they should be kept to the bare essentials, as every bit of work done that doesn't directly build a product could well be a waste of time.
Rewrite it from scratch using the spaghetti code version to run correctness tests to verify you haven't changed the behaviour.
And just how are you supposed to write "tests for correctness" when the very concept of what is "correct" is embedded in the code?
Any such tests would embody your own notions of what is correct based on your understanding of a codebase that cannot be understood.
Furthermore, Doom is quite a different thing. You have an end result that can be somewhat different and it doesn't matter - it could render textures such that they appear rather different but if you find it visually OK then it's fine.
No such luck with business software, which usually has extremely rigid and exact output, often output that other systems depend on being just so. There is no room for alteration of behavior, yet, as I said, nowhere spells out all of the features of the output, so you cannot possibly understand them....
I agree with a few responses that the only way to proceed is to rewrite tiny parts that affect at most one other system in the company. You need explicit buy-in from those other groups that something may change, and the understanding that you may have to back out your changes wholesale if you cause too much disruption.
Can't get buy in to proceed? Then quit or work with the code as is.
"There is more worth loving than we have strength to love." - Brian Jay Stanley
First, rewriting from scratch is obviously a foolish suggestion. Only somebody completely separated from goals-based performance and the economy of the application would suggest such a move.
If it is 200K lines of code, it obviously has a fairly wide range of functionality so just tackle it in pieces, rewriting the sections towards a common spec or employing a framework. You'd simply have to allow for extra time for each project, where the development team would simply combine their iterative development of the section being expanded/improved with migration from spaghetti to common form. Where it crossed with other spaghetti, a compromise or facade connection would suffice until everything was brought into line.
Interface should be the last worry, as a) a spaghetti application is a much more serious long-term drag on any ongoing development, b) the users are already used to the existing interface and c) changing interface means retraining people. Better to build a facade between the existing interface and improved code in the interim, than to have to begin dual-training people on multiple interfaces.
With a consistent approach, the development team would claw back so much extra time from standardization that those early, extra 'migration' hours would be quickly wiped off the slate.
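Here is a small sketch of such a facade. It assumes the existing screens call a routine named `CALC_TOT` with loosely typed arguments; that name and the fields are hypothetical. The facade keeps the old call signature, so the interface and the users' habits stay the same, while the work is forwarded to the new, typed code.

```python
from dataclasses import dataclass

# New, cleaned-up code (hypothetical names).
@dataclass
class Order:
    customer: str
    quantity: int
    unit_price: float

def order_total(order: Order) -> float:
    """Business rule in one place, with real types."""
    return round(order.quantity * order.unit_price, 2)

def CALC_TOT(cust, qty, price):
    """Facade: keeps the legacy name and loose arguments the existing
    screens already pass, but delegates to the new code."""
    order = Order(customer=str(cust), quantity=int(qty), unit_price=float(price))
    return order_total(order)
```

Callers never notice the switch. When the interface itself is finally redesigned, only the facade has to go.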
Everybody's code is spaghetti except your own. That is, until the next guy sees your code...
And I thought CS was trying to be more 'relevant'.
That's why step tracing of ANY code (spaghetti or not) for a rewrite especially, matters (or even when maintenancing a mess).
You need to get the programmatic "flow" down, either way. Then, if you're shy of a business analyst? You have to do the legwork on checking the information flows needed (which means investigating db design/schema, and what comes from WHERE in tables in the DB Devices).
A LOT OF LEGWORK COMES IN WHEN YOU'RE "NEW" TO ANY PROJECT (especially in info. systems work).
APK
P.S.=> IF it's a "single module" (possible), then you do what I stated above!
Which is basically what the poster we're both replying to meant: breaking said single drop-down inline code (dumb many times, especially for maintainability) that's "spaghetti" into manageable parts/logic engines!
(Be that functions OR just subroutines/procedures inlined in the SAME module, busted OUT of a single monolithic "driver" main module).
You then even can separate OUT those methods/functions (or subs/procs, whatever you call them, clarifying what puts out return types (functions) vs. straight 'modules' (procs/subs)) into separate:
---
1.) UNITS (e.g. Pascal UNITS, C/C++ .h headers, or Basic modules ala VB) if need be
OR
2.) Even further, into instanceable objects (any OOP languages can do that)
3.) OR EVEN FURTHER, into custom controls (ala .OCX from VB or MSVC++, and from other languages like Delphi or C++ of varying vendor type), or LIBS/DLLS (OLE type or std. type) out of it
---
Sorry for responding for someone else, but I wager THAT's what he meant... he knows what he's up to, I can tell, as it "takes one to know one"...
... apk
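Here is a minimal before/after sketch of busting a monolithic driver into subroutines, using a hypothetical comma-separated input. Both versions should return the same result. The difference is that `is_data_line` and `line_value` can now be tested, reused, or moved into a separate unit/module on their own.

```python
# Before (hypothetical): one monolithic driver doing everything inline.
def main_monolithic(lines):
    total = 0
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(",")
        total += int(parts[1]) * int(parts[2])
    return total

# After: the same logic busted out into named subroutines.
def is_data_line(line: str) -> bool:
    """Skip blank lines and '#' comment lines."""
    return bool(line) and not line.startswith("#")

def line_value(line: str) -> int:
    """Quantity times price, taken from the 2nd and 3rd fields."""
    parts = line.split(",")
    return int(parts[1]) * int(parts[2])

def main_refactored(lines):
    cleaned = (line.strip() for line in lines)
    return sum(line_value(line) for line in cleaned if is_data_line(line))
```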
Instead of multiplying how long you think something will take by 2 when you give your estimate (since everybody underestimates how long it takes to do something), multiply it by 4. (If the code is that bad, you'll need the extra time to A: find it and B: find out that the first way you fixed a problem broke something else, because the code is garbage.) Can you tell I'm working on spaghetti code now?
Did you know 80 to 90% of the moderators on slashdot wouldn't recognize a troll even if one dragged them under a bridge.
A BIG BALL OF MUD is haphazardly structured, sprawling, sloppy, duct-tape and bailing wire, spaghetti code jungle. We’ve all seen them. These systems show unmistakable signs of unregulated growth, and repeated, expedient repair. Information is shared promiscuously among distant elements of the system, often to the point where nearly all the important information becomes global or duplicated. The overall structure of the system may never have been well defined. If it was, it may have eroded beyond recognition. Programmers with a shred of architectural sensibility shun these quagmires. Only those who are unconcerned about architecture, and, perhaps, are comfortable with the inertia of the day-to-day chore of patching the holes in these failing dikes, are content to work on such systems.
Still, this approach endures and thrives. Why is this architecture so popular? Is it as bad as it seems, or might it serve as a way-station on the road to more enduring, elegant artifacts? What forces drive good programmers to build ugly systems? Can we avoid this? Should we? How can we make such systems better?
We present the following seven patterns:.....
Nuke it from orbit.
It's the only way to be sure.
I agree with the idea (and the necessity) of having analysts familiar with the legacy coding environment write business specifications for the existing code (as well as the dependencies between modules). I also agree with having multiple developers port it to a new language as a way of moving the code base to something more maintainable. Again, I am assuming that the porting developers are using source code management tools, following a standard, and adhering to a common design philosophy (coding for exceptions, using a consistent naming standard, writing comments that address the what of the code (not the how), and indenting code for readability).
I disagree with the idea that 200k lines is not that much. I'd estimate that a better-than-average developer would be able to read and port between 100 and 200 lines per day, so this project would take between 1000 and 2000 developer days to port. Even with 5 resources, this project would take 1 to 2 years, at a US cost of $600K to $1200K.
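Here is the commenter's arithmetic written out as a small function. Two inputs are assumptions: 250 working days per year, and a rate of $600 per developer-day. That rate is simply what the quoted $600K for 1000 developer days implies.

```python
def port_estimate(total_lines, lines_per_day, developers,
                  working_days_per_year=250, rate_per_dev_day=600):
    """Return (developer-days, calendar years, cost) for a straight port."""
    dev_days = total_lines / lines_per_day
    years = dev_days / developers / working_days_per_year
    cost = dev_days * rate_per_dev_day
    return dev_days, years, cost

for lpd in (200, 100):  # fast and slow porting speeds from the comment
    days, years, cost = port_estimate(200_000, lpd, developers=5)
    print(f"{lpd} lines/day: {days:.0f} dev-days, {years:.1f} years, ${cost:,.0f}")
```

With 5 developers this gives 1000 dev-days, 0.8 years and $600,000 at the fast rate. At the slow rate it gives 2000 dev-days, 1.6 years and $1,200,000. That matches the "1 to 2 years" above once rounded up.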
I would not bother with the rewrite part, I'd focus on identifying the business requirement along with the associated code functionality. The code is maintainable once you know what it is supposed to do and where it does it. I'd also require any changes to be documented to the newer standard. In order to avoid making the problem any worse, I'd put in a SCM, and controls around releases to the production environment.
Your employer probably isn't interested in spending the time and money on a re-write. Nor are the clients going to be interested in waiting that long for new features, either.
You will be made to figure it out and add features, or you will be shown the door.
Anyone who tells you to rewrite this from scratch should not be working with software. There are no shortcuts. You have 200k lines you don't fully understand. If you try to replace this with a system written from scratch so that you don't have to understand each and every part of the current system, you will fail. It will take years and still never reach a releasable state. Sooner or later you or the guys paying you will lose patience and either scrap the incomplete and misguided rewrite (good) or release it out of fatigue (bad). That is the kind of thing that kills a company.
On the other hand 200k isn't that big. Live with it for a while until you understand its basic layout, have successfully fixed some bugs and added some features. Then you can start gradually improving parts of it; taking care to understand the code well before you touch it.
The fact that you do not immediately understand this makes it highly likely that you are not good enough to do this work yet. Find another job where you have less freedom to break things and try to gain some experience.
captcha: rational
// adds 1 to i, waits until i is greater than 10 then adds 2 to a.
Now that's a comment!
Didn't they teach you that comments which restate exactly what the code does are bad? Here's what that comment should look like:
Everyone who wants to know the details can refer to the code. The comment shall not give the what (10) but the why (large enough).
SCNR :-)
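Here is an illustration of the "why, not what" rule on a different, hypothetical piece of code; `MIN_SAMPLES` and `smoothed_reading` are made up for the example. The comment explains the reason for the threshold, while the number itself lives in the code.

```python
MIN_SAMPLES = 10  # hypothetical threshold

def smoothed_reading(samples):
    # Averages over only a handful of readings are too noisy to act on,
    # so report nothing until we have more than MIN_SAMPLES of them.
    if len(samples) <= MIN_SAMPLES:
        return None
    return sum(samples) / len(samples)
```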
The Tao of math: The numbers you can count are not the real numbers.
just put it all inside a class
now you have nice object-oriented code
...make sure you have a good handle on what's already there.
I've walked into situations where the application could only be successfully built on the lead developer's machine, and he had no idea why. I've seen teams that were incapable of going through a release cycle without losing some of the new code, and only realizing it when the QA department couldn't get the system to run, at which point the developers had to reinvent what they had lost through sloppiness and laziness.
First, make sure that you have implemented some sort of version control (even if it is just making regular ZIP files of the source tree until you can arrange something better).
Second, make sure you can start with a clean machine, load the source tree and development tools and successfully build the application. Until you can reproduce the production version of the software exactly, you don't have it under control.
Then, you can start worrying about bringing order to chaos. I wouldn't try to get anyone to "mend their ways" regarding poor software writing practices right away. I would worry about getting a process in place (write code, check in code, have a build machine check out the code and build it) to make sure that code is not lost, and that everyone has access to the same code.
In the past years I have been in such a predicament several times. A huge amount of code, and the function of the system is not completely clear. The original developers are gone, the system isn't well documented, and only a handful of people know how it should behave. As a matter of fact, tomorrow I will start coding on one system we can no longer support, as the hardware, OS and the libraries and frameworks it uses are outdated and/or discontinued.
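Here is a minimal sketch of the stopgap from the first step: a timestamped ZIP of the source tree, made with the standard `shutil.make_archive`. It assumes Python is available on the machine. `snapshot_source_tree` and the directory names are hypothetical. This is a safety net until real version control is in place, not a substitute for it.

```python
import shutil
from datetime import datetime
from pathlib import Path

def snapshot_source_tree(src_dir, backup_dir):
    """Write a timestamped ZIP of src_dir into backup_dir and return its path.

    Keep backup_dir outside src_dir, or each snapshot will include the old ones.
    """
    Path(backup_dir).mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    base_name = str(Path(backup_dir).resolve() / f"src-{stamp}")
    # make_archive appends ".zip" and returns the full archive filename.
    return shutil.make_archive(base_name, "zip", root_dir=src_dir)
```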
Reengineering and rewriting is usually the best option. However, you need skills and experience in order not to make the same mistake the previous developer did. Of course, management must trust and approve your actions.
A few dos:
* Learn at least UML use cases, components diagrams and sequence diagrams.
* Make use cases and check these with affected parties.
* Start off with a rough component model of the new system.
* Make a clear picture which nodes (hardware + OS), subsystems (units performing a function), software components (modules containing data, modules performing a function, etc...) and agents (users, triggers/schedulers) are involved.
* Draw the interactions between the subsystems and/or software components.
* Clearly document which interactions are on-line and which ones are batch/background/off-line.
* Specify interfaces. (Used file formats, protocols, software library interfaces if you will.)
* Slowly refine your model until you feel comfortable with it.
* Make a rough class model and keep usability and maintainability in mind. Backtrack if necessary.
* Divide software components between "dumb" containers of information (e.g. plain Java beans) and components performing functions (business logic if you wish.)
* Decide which interfaces to make public and which not.
* Describe restricted/private bits of code just enough for maintainers to understand them. And nothing more than that.
* Write as many unit tests as necessary for your components. Unit test enough functionality.
* Communicate your results regularly and refine your model where applicable.
* Define integration tests and do these very seriously.
* Define regression tests and perform these very seriously.
* Make involved parties accept parts of the system according to performed integration and regression tests.
* Try to plan gradual decommissioning of the legacy system.
* Document the system "enough". System architecture (from UML), references from architecture to code, installation manual and operational manual are the most important ones.
* Try to achieve longevity in the documentation. Abstract details and convince involved people that that is a good thing.
* Define 1st, 2nd and 3rd level support. Preferably you should remain 3rd level support to better enjoy sleep.
* Conform to standards and practices if they reduce discussion and enhance clarity.
* Use well established techniques. E.g. JPA and JAXB.
* Allow well established component manufacturers to make your programming life easier. E.g. Apache Commons.
* Be tidy.
A few don'ts:
* Avoid OO pattern overkill.
* Don't take the quick and dirty option too quickly. Those decisions will haunt you eventually.
* Avoid making everything public. Documenting and maintaining public interfaces is more expensive.
* Try to avoid big bangs.
* Avoid less well established component manufacturers. My next project did use components from less established component manufacturers, and their sell-by date has generously expired.
* Don't allow babbling "architects" to make a mess of your system. But don't alienate them either.
I may have forgotten a few things but this is all stuff I consider even for smaller projects.
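Here is a small sketch of the "dumb containers versus components performing functions" split, written in Python rather than Java. A frozen dataclass plays the role of the plain Java bean, and a separate policy class holds the business logic. `Customer` and `DiscountPolicy` are hypothetical names. Keeping the rule out of the container is also what makes it cheap to unit test.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Customer:
    """'Dumb' container: data only, no business rules."""
    name: str
    country: str
    annual_spend: float

class DiscountPolicy:
    """Component performing a function (business logic), separate from the data."""

    def __init__(self, threshold: float, rate: float):
        self.threshold = threshold
        self.rate = rate

    def discount_for(self, customer: Customer) -> float:
        # Customers at or above the spend threshold get the discount rate.
        if customer.annual_spend >= self.threshold:
            return self.rate
        return 0.0
```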
I hadn't the slightest objection to his spending his time planning massacres for the bourgeoisie... (P.G. Wodehouse)
In large code bases, it can be rare that everything follows the same naming conventions, indenting style, or even programming style, simply because hundreds of people work on the code, and different teams with different skills edit different parts of it.
The only unit where this tends to be true is the file.
I've dealt with a number of legacy code bases, done poorly. Some of the worst examples were done by veteran procedural programmers who picked up an OO language via tutorials.
The worst of their evils could be greatly reduced if all tutorial and book authors stated, prominently, the rule of thumb that if the module they are typing goes past one screen in height, they should start thinking about breaking it into another module.
Almost every language has a module of some kind (procedure, function, subroutine, etc.).
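Here is a rough way to apply the one-screen rule of thumb to an existing Python code base, using the standard `ast` module (Python 3.8+ for `end_lineno`). `long_functions` is a hypothetical helper. Using 40 lines to stand in for "one screen" is an assumption, so tune it to taste.

```python
import ast

def long_functions(source: str, max_lines: int = 40):
    """Return (name, length) for every function longer than max_lines."""
    tree = ast.parse(source)
    report = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # Span from the 'def' line to the last line of the body.
            length = node.end_lineno - node.lineno + 1
            if length > max_lines:
                report.append((node.name, length))
    return report
```

Nested functions are reported separately as well, since `ast.walk` visits every node in the tree.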
How much are you willing to spend to analyze the problem and construct a plan to salvage what is left!?
You may want to reread GP.
In this vein, though, I can also suggest a worthy strategy: comments that allege the code has certain side effects.
/*Once i is 10, frob a in order to
override the default masking behavior
used by the converse invocation of
this module*/
This will imbue the maintainer with a fear of refactoring. Is the code live or dead? Are there other side effects that aren't apparent? Is this a vestigial artifact of a previous refactoring?
I will admit that this is an advanced tactic that works best in degenerate projects that lack adequate automated testing and SCM.
The article is excellent. It lays out guidelines very similar to those my colleagues and I use when dealing with our business partner's accumulated software projects, and it covers them very well. I intend to use it as a checklist for spaghetti code integration projects.
I'd emphasize the switch to good source control management (SCM). Too many workgroups have undocumented workflows, and benefit profoundly from switching to or learning to properly use a robust system. This process also helps identify who is the primary spaghetti code author. In such a project, there is often a particular developer or "architect" who has been carrying around the functional map of this project in their head. Their time and commitment to get that map into documentation so others can work with it is absolutely necessary to such projects; SCM is often the first step towards this. And if that core developer or architect doesn't agree with the project, _fire them as fast as possible_, because they will hamstring every effort to move to any cleaned up architecture.
Also note that most such core developers or architects have actually been wanting to do something like this cleanup for _years_, and simply haven't been allocated the time and resources to do it. There are risks. Repairs are often much cheaper and safer than the kind of large-scale rework this cleanup represents. The actual benefits also have to be presented to the managers or clients to get them to invest manpower, so that core developer often has a huge emotional frustration with the original code. Working with them to get their buy-in, and being willing to trade minor points of disagreement to get their cooperation on big issues, is priceless on such large projects. Otherwise, they can and often do backstab every integration effort as "wasted time" or "not sufficient", even claiming both at the same time.
Reminds me of how, as a young lad, I went to work for a jewelry company's "IT Department". I put it in quotes because the department consisted of me, a supervisor who couldn't code, and a department head who had no idea whatsoever about anything that happened on the company computers. Someone's nephew or son who got put on the payroll, is my guess.
Anyhow, they had about three billion lines of applications written in Dibol, all spaghetti code with no documentation. The last time they had employed anyone who could code was about 2 years prior, so everyone just worked around the bugs in the applications.
I worked there for about 7 months, scrubbing the code and then going home and writing the documentation on the code. Documenting was never anything they had asked for or specified in my goals/objectives. After 7 months when I'd fully documented everything on my own time and the system was working fairly well, they decided that things were going pretty well and they wanted another family member on the payroll, so they let me go. Six months later, the wheels came off of something because one of the monkeys pushed the wrong buttons. I offered to sell them the documentation I'd written and refer them to some other programmers who would then be able to fairly quickly and cheaply fix what they broke. Or they could hire someone full time and maybe in 6-9 months they might fix it.
Didn't have to buy my girlfriend any jewelry that year. Swapping boxes of 3 ring binders for a bag of random jewelry in a parking lot at night is still one of my fondest memories.
See kids, do your documentation and eat your vegetables...they're good for you!
Rewriting from scratch is probably the worst thing you can do. See this article by Joel Spolsky.
...richie - It is a good day to code.
with 150K Lines of 'Yeah, it's done and tested' code.
The project was almost done by a long term contractor who suddenly left.
I joined the team and started doing code reviews. This probably had a lot to do with his decision to leave.
The modules were almost commentless (bangs head against wall) and full of if then elseif then elseif then elseif etc. etc. (bangs head even more)
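A common first step with chains like that is to replace them with a lookup table. The sketch below is purely hypothetical (the command names and handlers are invented, not from the project described): the long `if ... elif ...` chain becomes a dictionary from keys to functions, so adding a case no longer means editing one ever-growing conditional, and the old and new versions can be checked against each other.

```python
from typing import Callable


# Before: the kind of chain the comment describes.
def handle_legacy(command: str, value: float) -> float:
    if command == "double":
        return value * 2
    elif command == "halve":
        return value / 2
    elif command == "negate":
        return -value
    else:
        raise ValueError(f"unknown command: {command}")


# After: each case is a small named entry in a dispatch table.
HANDLERS: dict[str, Callable[[float], float]] = {
    "double": lambda v: v * 2,
    "halve": lambda v: v / 2,
    "negate": lambda v: -v,
}


def handle(command: str, value: float) -> float:
    try:
        handler = HANDLERS[command]
    except KeyError:
        raise ValueError(f"unknown command: {command}") from None
    return handler(value)


# Both versions should agree on every known command.
for cmd in HANDLERS:
    assert handle(cmd, 8.0) == handle_legacy(cmd, 8.0)
```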
Sorting it all out was a salutary lesson to everyone, especially the PHBs. It took a lot of time and dedication to fix the damage. Sadly, we had to fix it, as there wasn't time to start from scratch. Thankfully, the old team leader who was supposed to manage the contractor left before I arrived.
You just have to get on with fixing the mess in whatever way is best for your situation. Don't dwell on the failures that led to the situation happening in the first place.
A voodoo doll effigy of the person to blame, with plenty of pins at hand, really helps though. Then, when it's all over and done, you can cremate the doll, banishing the bad spirits to the flames down below.
I've been called in to work on a number of software projects over the years, some of which I was dismayed to find were bloated monstrosities. I refuse to leave a mess behind, even if it's not my mess... and if I don't know how long I'll have to work on that code, the sooner I start cleaning the better. Here's my strategy, more or less:
But for pity's sake, don't just leave the mess as a mess.
Koans and fables for the software engineer
Working Effectively with Legacy Code http://www.amazon.co.uk/Working-Effectively-Legacy-Code-ebook/dp/B005OYHF0A/ref=sr_1_1?ie=UTF8&qid=1344190021&sr=8-1
I stopped reading shortly after "I am aggressive when it comes to coding conventions"
The mention of Agile as a positive strategy and the volumes dedicated to "format" of code are only useful to scratch an itch. They are only valuable in the mind of the author and those who think like him and rarely have any effect on outcomes.
Rigidity is not a strategy in and of itself. It is the preference of someone who is anally retentive and change-averse.
The wins from his scheme are like the wins from modern language features. They can only go so far to reinforce or mitigate actual outcomes in a project lifecycle.
What makes or breaks large-scale projects is not a rigorous environment but a creative one.
One that encourages creative solutions to effectively understand and manage global complexity. Everything else is, as TFA says, "noise".
I would much rather pay someone to write code that "looks" like shit, if it means they are spending more time on the big picture, than pay someone who is obsessive-compulsive with no time or capacity left to reason about WTF is over the horizon.
There is no substitute for thought. If procedure and process were that important machines would be writing all of our software for us by now.
It's not 200k lines, more on the order of 10-20k lines (depending on the count; it's written in a highly compact language). It is not my main task to restructure it; as a matter of fact, I have a ridiculously small time budget for it, given the current state of the code. My task is to integrate new features into the code. However, when I looked at it, a rewrite seemed inevitable.
Let me break down how I approach it:
a) Analyze why the problem is there. There are two aspects to it. First: is there a fundamental problem with the qualifications of the team members? (In my case, there is: they are not programmers, but experts in other fields.) Second: is there something inherently wrong with the processes? (There was. Two parts of the team used the version control system on the assumption that its only purpose was to snapshot their "working" state, which contributed hugely to commits mingling all kinds of feature updates.)
b) If you look at the feature you are supposed to implement, what is the ratio of the work it *should* take (in my case: no more than 2h) to the work it *would* take in the current structure (in my case: 20h)? Analyze what the worst obstacle for this feature is (in my case: certain layers of reading/converting/validating input were not separated, and there was no explicit declaration of a certain data structure).
c) What can you do? In my case: rewrite just this part in a better (not perfect) way, with the following criterion: use the same or less time for the feature you are supposed to implement, including the conversion. Demonstrate the power of the approach to your co-workers by involving them in the process. In my case, I used roughly 12h for rewriting the procedure, 6h to test it against the old code, and 2h sitting down with my boss/project manager and explaining it, 20h in total. After this, he could easily make the changes he wanted himself, in negligible time. (Yep, I made myself obsolete for this task, and that was highly appreciated, because I am not very cheap to hire.)
d) Discuss a clear strategy for upgrading the code, piece by piece, to a decent level, and build some showcases where infrastructure improvements would help, in parallel to what you have done up to now. That is very important, since the willingness to support the conversion to a new structure depends on progressively showing advantages and clearly demonstrating progress. Real artists ship (I work as a consultant). If we missed a deadline in this project, that would be *bad*.
e) Explain what you are doing in a manager-friendly way. A lesson from the communication training you get as a consultant: *never ever* speak negatively about any product or service in general. It could well be that the head of department asks me why I see certain steps as necessary. My answer would be quite general, like: separate expert knowledge from implementation. Or: make it easier to maintain and *save* work in the long term. Be careful with comments about code quality in that context. There is no *bad* working code. While you may wish to say "this code is an incredibly incoherent, taped-together work of twenty different trainees supervised by somebody who did not know the system himself", say the following instead: "I think we can improve the code by introducing a database backend." Or: "I believe that a more unified way of describing the inputs to the code will save the time of [incredibly good technical expert who now spends 25% of his time hunting obscure bugs]." If you go down the other road and mock the code, the following things can happen: a) the project gets cancelled, because management believes it's beyond repair; b) the manager does not hire you, because he was the one who started the project 10 years ago in some obscure form; c) you will lose the support of co-workers for mocking them, and face a harsh review should your rewrite not be the flying pig you promised.
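The fix described in b) and c), separating the reading, converting and validating of input and declaring the data structure explicitly, might look roughly like this. Everything here is hypothetical: the `FeedStream` record, its fields, the comma-separated line format and the validation limits are invented for illustration, and the real code base was written in a different language.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeedStream:
    """Explicit declaration of one input record (hypothetical fields)."""
    name: str
    flow_kg_per_s: float
    temperature_k: float


def read_lines(text: str) -> list[list[str]]:
    """Reading layer: split raw text into fields, nothing else."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):  # skip blanks and comments
            rows.append([field.strip() for field in line.split(",")])
    return rows


def convert(row: list[str]) -> FeedStream:
    """Conversion layer: turn raw strings into typed values."""
    name, flow, temp = row
    return FeedStream(name, float(flow), float(temp))


def validate(stream: FeedStream) -> FeedStream:
    """Validation layer: domain checks live in one place."""
    if stream.flow_kg_per_s < 0:
        raise ValueError(f"{stream.name}: negative flow")
    if stream.temperature_k <= 0:
        raise ValueError(f"{stream.name}: temperature must be above 0 K")
    return stream


def load_streams(text: str) -> list[FeedStream]:
    return [validate(convert(row)) for row in read_lines(text)]


streams = load_streams("""
# name, flow (kg/s), temperature (K)
feed_a, 1.5, 350
feed_b, 0.25, 298.15
""")
print(streams[0])
```

Because each layer is a separate function, the rewritten part can be compared against the old code one layer at a time.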
OK, that's my 2 cents.
200,000 measly lines of code?
Having done a lot of code maintenance - including Y2K certification of the "MMDF" code base (first comments/headers would have negative unix timestamps) - he needs to start by learning about code beautifiers and finding a style he finds easy to read.
Then, personally, I try to storyline the code. Sometimes more creatively than others, but those are extreme cases.
-- A change is as good as a reboot.
Prepare three envelopes.
The program itself is a physical model of a complex chemical processing plant; the team that wrote it has incredibly deep domain knowledge but little or no formal training in programming fundamentals.
This is a model of a physical system. Why not raise the level of abstraction: extract the state machines from the code and document what you discover, ideally at least semi-automatically, in a UML modelling tool that supports XMI?
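As a rough illustration of what "finding the state machines" can produce: once the states and transitions buried in the code are identified, they can be written down as an explicit transition table, which is easy to review with domain experts and to feed into a modelling tool. The states and events below are hypothetical, not taken from the plant model in question, and the export to UML/XMI itself is not shown.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    FILLING = auto()
    HEATING = auto()
    DRAINING = auto()


# Hypothetical transition table recovered from scattered if/goto logic:
# (current state, event) -> next state
TRANSITIONS: dict[tuple[State, str], State] = {
    (State.IDLE, "start"): State.FILLING,
    (State.FILLING, "level_high"): State.HEATING,
    (State.HEATING, "temp_reached"): State.DRAINING,
    (State.DRAINING, "level_low"): State.IDLE,
}


def step(state: State, event: str) -> State:
    """Return the next state, rejecting transitions the table does not allow."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"event {event!r} not allowed in {state.name}") from None


def as_edge_list() -> list[str]:
    """Flatten the table into 'A --event--> B' lines for documentation."""
    return [f"{src.name} --{event}--> {dst.name}"
            for (src, event), dst in TRANSITIONS.items()]


state = State.IDLE
for event in ["start", "level_high", "temp_reached", "level_low"]:
    state = step(state, event)
print(state)  # back to State.IDLE after one full cycle
```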
If your new code does what the old code did when both are fed the same input, you're good to go.
I am curious just how long the project you propose will take to complete, given that you need to produce an infinite number of input combinations to succeed.
Your basic idea is not bad, but it's simply impossible to apply to the whole system at once. That's why I suggested a small piece that interfaces with at most one other system, so you can in practice limit inputs and have someone else really tell you what is flawed in your output, because you will have no idea. You will still forget many bits of key input that lead to dramatic errors, but hopefully after a year or two most of that will be ironed out without too many people fired.
Your concept also requires that the system can be run wholly in isolation so you can feed input through it, something nearly impossible to do for many IT systems, ESPECIALLY the spaghetti kind. If they couldn't write good code, why do you think they would have made it easy to test?
With an ancient and bad code base there are NO aspects that are in your favor or will help in any way.
"There is more worth loving than we have strength to love." - Brian Jay Stanley
... code and save it for the next management luncheon. Serve it up with marinara sauce. If the managers get sick, then toss it. If they don't, then give them the leftovers.
--
How much of this post is literal and how much is metaphor is left as an exercise to the reader...
Knowledge is how to play a game, intelligence is how to win, wisdom is knowing what game to play.
If nobody knows if the original program is functioning correctly then nobody knows if the replacement is functioning incorrectly.
Another person who has never developed software in a company, I see.
You'll find ZERO people that are willing to confirm your output is correct.
However you'll find hundreds eager and willing to find flaws in your output. Sometimes even if they have to make it up, or sometimes even if they just aren't sure. For any question raised you must be prepared to prove that any questioned output is correct. Any misstep means the re-write is canceled.
Often (and I am not kidding here) it is easier to simply start a new company and use better software from the outset.
"There is more worth loving than we have strength to love." - Brian Jay Stanley
If the 200k lines are utterly horrible, a high-quality rewrite can probably be done in less than a year; with modularity and unit testing kept in mind, you can make a high-quality project out of it.
None of the general principles people are putting forward are that critical for this case. G2 is weird; don't start out assuming everything's broken and needs fixing. Some bad experience happened, but they apparently fixed it and are still productive with the system, so be cautious and learn the system. The best advice I can give on this is: WORK WITH THE VENDOR of G2; they can probably help the guy get up to speed. The point is, they've got a working, good system that has been evolving for 20 years or so, so don't think you're going to just rewrite the damn thing. There may be really good reasons for things that he couldn't possibly know yet. Spend a year studying the system, then worry about improvements. If they want you to run some classes on best practices while he's learning, that's cool. I also don't think he's "inherited" anything; the employers are not going to let the guy launch a crusade to fix what is already working well for them. Humility is appropriate here.
Yeah, it was posted on Ars Technica yesterday too (kinda like a professional, adult version of Slashdot).
But it's OK, I still stick to Slashdot because I enjoy reading the trolls for some strange reason.
Good software evolves.
You need to forget all grandiose plans to fix the software, especially if the ideas come from academia. Be willing to be unprincipled for the sake of practicality.
The software has already evolved a lot. If you attempt to recreate from scratch, you will almost certainly repeat previous errors, even if you avoid the current maintenance mess.
The problem is that the software has evolved towards external goodness, while ignoring maintainability goodness.
Make a secret plan to slowly evolve the software towards maintainability.
Slowly sneak maintainability and sound software practice in the back door when no one is looking.
Scream loudly about how big the mess is and how amazing it is that anyone can make any changes at all to it.
Make the point that it is not how well the elephant dances, but the fact that the elephant can be made to dance at all. This will give you more time.
'Tis still my favorite dish. When someone comes in behind you and "cleans up" your code, you often lose a bit of functionality... If not entirely. I agree things need to be cleaned up from time to time, but I've run into some problems like that. And nah, I don't think I'm Linus Torvalds or anything. I've done some tinkering throughout the years though.
I feel his pain. I'm currently dealing with over 5,000 badly written, undocumented Java classes. Just because it's object oriented doesn't mean it isn't spaghetti.
I'm old enough to remember when discussions on Slashdot were well informed.
I am sure that when developers look at our code 20 years from now, they will refer to it as spaghetti code, especially when they work with the many convoluted implementations of MVC-inspired frameworks. They'll wonder: what the !@#$% were we thinking?
They'll wonder why, for a simple query, they have to create four new templates and modify half a dozen more. They'll wonder why there is no reference guide linking these together, and why they're stuck following a trail of breadcrumbs. They'll be flabbergasted that we thought we were somehow decoupling things, when in fact we greatly increased the interdependence of files.
Yes, in 20-30 years, our great decoupled object-oriented programming will likely be the spaghetti code of its day, and be viewed as being as archaic as the use of GoTo logic.
***
And unlike procedural spaghetti code, in which one could often follow the logic procedurally, OOP spaghetti code is often so decoupled that it is extremely hard to follow, or to discover what something may be called from and where, if you do not have some sort of road map.
And let's be honest: do you really think our usually poorly documented road maps for our OOP apps will still exist for the developers in 20+ years?
Do you really think there was no documentation whatsoever for all these applications written in COBOL six decades ago? Back when it was almost standard to first write a PSEUDO code version of an app.
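Where the code base happens to be Python, a crude "road map" of who calls what can be generated with the standard `ast` module. This is a hedged sketch: it only sees direct calls by plain name or attribute name, it attributes calls inside nested functions to the enclosing function as well, and it cannot follow dynamic dispatch, callbacks or reflection (exactly the things that make OOP spaghetti hard to follow), so it complements documentation rather than replacing it.

```python
import ast
from collections import defaultdict


def call_graph(source: str) -> dict[str, set[str]]:
    """Map each function/method name to the names called anywhere in its body."""
    tree = ast.parse(source)
    graph: dict[str, set[str]] = defaultdict(set)
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            for inner in ast.walk(node):
                if isinstance(inner, ast.Call):
                    func = inner.func
                    if isinstance(func, ast.Name):        # foo(...)
                        graph[node.name].add(func.id)
                    elif isinstance(func, ast.Attribute):  # obj.foo(...)
                        graph[node.name].add(func.attr)
    return dict(graph)


def callers_of(graph: dict[str, set[str]], name: str) -> set[str]:
    """Invert the graph: which functions call `name`?"""
    return {caller for caller, callees in graph.items() if name in callees}


SAMPLE = '''
class Order:
    def total(self):
        return sum(self.line_amounts())

    def line_amounts(self):
        return [price(i) for i in self.items]

def price(item):
    return item.cost * tax_rate()

def tax_rate():
    return 0.2
'''

graph = call_graph(SAMPLE)
print(sorted(callers_of(graph, "price")))
```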
No, we just tend to be more arrogant because we're "uptime"...
1632
The first things to do are the following:
1. Assess the present stability of the code. Is it, for the most part, functioning? If there are a handful of critical errors, address those, but do not get bogged down.
2. Assess the core requirements. When I worked on a medical practice management application, we had tons of requirements, many of which we were told were ABSOLUTELY CRITICAL to the practice. A year later we'd find out that said feature had not been used even once.
Determine which features/actions are core, and rewrite these in new code with a critical eye to performance and stability.
3. Run concurrent systems: transition 90% of core tasks to the new, rewritten, more extensible and adaptable code. Realize there will be some compromise on perfection in order to interoperate.
Leave uncommon tasks to the old antiquated system. Sure it sucks for a user to have to log into the old system 3-4 times a year to run some quarterly task or procedure. But it's something most users can deal with, and will do so with only minimal griping. Especially if the core tasks are now functioning at an improved level.
4. Road-map features, and slowly add new feature requests or migrate old features to the new app as user needs require. Why migrate an old feature that's used 1-2 times a year if a new feature is needed every day? Why add a new feature used 1-2 times a year if an old one that is used weekly can be migrated instead?
5. Enjoy the fact that you're not unemployed and homeless. Yes, the guy who posted up above exclaimed he'd rather be homeless. But I wager most people are smart enough to know that undesired homelessness without a support system sucks. And would prefer NOT to be there.
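Steps 2 to 4 amount to running the old and new systems side by side and migrating by actual usage, an approach often called the "strangler fig" pattern. A minimal sketch, assuming tasks can be dispatched by name and that usage can be counted as requests arrive; the task names and both handler functions are hypothetical.

```python
from collections import Counter
from typing import Callable, Optional

Handler = Callable[[dict], str]


def legacy_system(request: dict) -> str:
    return f"legacy handled {request['task']}"


def new_system(request: dict) -> str:
    return f"new handled {request['task']}"


class Router:
    """Send migrated tasks to the new code, everything else to the old code."""

    def __init__(self, legacy: Handler, modern: Handler) -> None:
        self.legacy = legacy
        self.modern = modern
        self.migrated: set[str] = set()
        self.usage: Counter[str] = Counter()

    def handle(self, request: dict) -> str:
        task = request["task"]
        self.usage[task] += 1  # record real usage to guide migration
        target = self.modern if task in self.migrated else self.legacy
        return target(request)

    def next_to_migrate(self) -> Optional[str]:
        """Most frequently used task that still runs on the old system."""
        for task, _count in self.usage.most_common():
            if task not in self.migrated:
                return task
        return None


router = Router(legacy_system, new_system)
for task in ["invoice", "invoice", "schedule", "invoice", "quarterly_report"]:
    router.handle({"task": task})

print(router.next_to_migrate())  # -> invoice
router.migrated.add("invoice")
print(router.handle({"task": "invoice"}))
```

Rarely used tasks such as the quarterly report simply stay on the legacy path until usage data says they are worth moving.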
Don't start on a 200k spaghetti code base until you have satisfactory answers to these questions:
#1 - Is the code under code management such as svn, git, or cvs?
#2 - Do you have a proper bug and feature request reporting system?
#3 - Does anyone know how changes/deliverables are managed? Is the sales force out there just selling willy-nilly? (Don't laugh, please, I've seen it.)
#4 - Does anyone do QA?
#5 - Are you being given a raise for this new responsibility?
#6 - What possible career-advancing goal do you meet by picking this up? Most likely it's working with dead code, on a dead ship, on a dead sea.
#7 - Does your manager want you to deliver at the same rate as your predecessor?
#8 - You talk to your boss about hiring someone and restructuring the code base, but he says there's no money or head count; he just wants you to 'fix' it up.
#9 - Do you have to meet once a week with some marketing fool and explain why you haven't met their unrealistic milestones?
#10 - Did the last guy go because he had a heart attack? If so, do you want one?
200k lines is big, but not impossible. The real problem is what some subject matter expert coded 12 years ago, who left 8 years ago, and no one's looked at since, because everyone's scared of it.
What a lot of folks in the comments I've skimmed utterly ignore is that this is a "complex chemical processing plant". If a Slashdotter's code crashes, oh well, they get yelled at, and it gets fixed.
With a chemical plant, not quite the same. Think of the disaster in Bhopal, India, or the last major oil refinery fire.
The way I'd deal with it is this:
0) Identify one, or preferably more than one, subject matter expert in each area of the plant covered by the software.
1) It's 10-20-year-old *spaghetti* code. Document it. Do flowcharts. (And if you kids look down on them, that's because you don't understand a toolbox larger than 1 hammer and 1 screwdriver, and if I were hiring you, you'd be as junior as it gets.) *THEN* you have some idea of what's happening, in what order.
2) Bring in the subject matter experts for a working meeting, with very high-level diagrams, and let them figure out which sections of the code, and of the process, they know about.
3) Set up meetings with individual subgroups, and get lower-level flows. The code should relate to the process in the plant, which presumably starts at one end, with various things coming out at various points.
4) Identify where stuff jumps the line, and whether it actually does that, or whether that's a major problem.
5) And pick a common language that's stable, will still be common in 20 years, and has a *lot* of folks making a living in it now... and one that's close enough to G2 that the people you have to work with, who'll probably be there long after you're gone, will be able to transition to it easily, without a lot of resistance. Since you're saying Pascal w/ graphics, I'd suggest an older language, C or C++. I promise you'll have a lot of problems with Java, and as for current fad languages... I do think you'll really, really want a *compiled* language for something like this, not a scripting language.
6) Now you're down to normal architecting: des
Exactly. If at all possible, build a test suite of data that exercises the old program, then make sure the newer versions give identical answers. If possible, generate random data as well, to hit logic that your hand-picked cases would only reach by chance. I don't know how you could do this if you are talking about a GUI program; I am thinking of engineering-type problems.
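A minimal sketch of that approach, assuming both versions can be called as plain functions on numeric input: a fixed set of known cases plus seeded random cases, comparing old and new outputs within a tolerance. `legacy_model` and `rewritten_model` are hypothetical stand-ins; in practice the old program may have to be driven through files or a subprocess, with its recorded outputs kept as reference answers.

```python
import math
import random
from typing import Callable, Iterable

Model = Callable[[float, float], float]


def legacy_model(pressure: float, temperature: float) -> float:
    """Hypothetical stand-in for the old, convoluted calculation."""
    result = 0.0
    result = result + pressure * 0.5
    result = result + temperature * 0.25
    return result


def rewritten_model(pressure: float, temperature: float) -> float:
    """Hypothetical cleaned-up version that must give the same answers."""
    return 0.5 * pressure + 0.25 * temperature


def find_mismatches(old: Model, new: Model,
                    cases: Iterable[tuple[float, float]],
                    rel_tol: float = 1e-9) -> list:
    """Run both versions on every case and collect any disagreements."""
    mismatches = []
    for args in cases:
        expected, actual = old(*args), new(*args)
        if not math.isclose(expected, actual, rel_tol=rel_tol, abs_tol=1e-12):
            mismatches.append((args, expected, actual))
    return mismatches


rng = random.Random(42)  # fixed seed, so any failure can be replayed
known_cases = [(0.0, 0.0), (101.325, 298.15), (1e6, 1500.0)]
random_cases = [(rng.uniform(0, 1e6), rng.uniform(0, 2000)) for _ in range(10_000)]

problems = find_mismatches(legacy_model, rewritten_model, known_cases + random_cases)
print(f"{len(problems)} mismatches")
```

A clean run only shows that the two versions agree on the inputs tried; it cannot prove equivalence over all inputs.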
"There is no god but allah" - well, they got it half right.
Many people will tell you not to upgrade old applications. It's not worth the trouble and it's much easier to just write a new one, right?
Wrong! (usually)
No one likes to endure the "learning frustration" of figuring out someone else's code, particularly several someone elses' code. But there is knowledge in that old code that no one else has, and functionality that no one knows about, except your customers, who will complain vociferously!
Writing a brand new Application will cause the loss of functionality. It will very likely cause the loss of operability, because there are things that you don't know, without which the app will not work.
In the last year, I have seen this happen to two separate hardware manufacturers, whose "smart" products we have to use. I have also had it happen to me once, a long time ago before I learned better. It has also caused more than one company to go out of business!
Be warned. The hard way is often the easy way, in the end.
I once worked for a company that had code, started in the 1960s, that they had continued to build on, and it was all spaghetti. In the late 1990s they decided to un-spaghetti it. At one stage I was given four programs (all spaghetti, no documentation) to work out what they did and rewrite them. They came to 10,000 lines of code. I managed to do it in 2 weeks. (This doesn't include the code review by three other programmers, or the testing by the testing department to have them sign off and agree it all worked. It didn't require any changes or rewrites, so it was good code.) So someone working at the same pace could theoretically complete the task in 40 weeks. Of course, this doesn't take into account the complexity/obfuscation of the code. But for 200k lines of spaghetti at its worst, maybe a year and a half for a lone programmer to complete (taking into account code reviews, testing, etc.).
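The 40-week figure is straight proportional scaling from the observed rate. Written out as a tiny helper, assuming (as the comment itself cautions) that the rate of 10,000 lines per 2 weeks holds across the whole code base; the result excludes review and testing, which is why the overall guess above is closer to a year and a half.

```python
def weeks_to_rewrite(total_lines: int, lines_done: int, weeks_taken: float) -> float:
    """Scale linearly from an observed rewrite rate (lines per week)."""
    rate = lines_done / weeks_taken  # 10,000 / 2 = 5,000 lines per week
    return total_lines / rate


print(weeks_to_rewrite(200_000, 10_000, 2))  # -> 40.0
```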
Sure enough, the cow costume was hanging up next to the superhero outfit and sailors uniform. (S,Spud)
If it were my task, I would first make certain that I had a copy of all the source and, if at all possible, recreate the production executables on a test system.
You may have to do this while "maintaining" some of the known critical code via bug fixes. Until you have a working test system that matches the production system, you will never know whether the code you have to maintain matches production, is actually used, or was simply never deleted because it was a "just in case" copy.
Thereafter, were I doing it, I would implement some change management (CM) procedures. Any change has to be submitted in the form of a request, with a justification. This CM process will help you get a handle on the business priorities. I can email a CM form.
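One way to check that a test build really matches production is to compare cryptographic hashes of the rebuilt binaries with the deployed ones. A hedged sketch using Python's `hashlib`; the two-directory layout is an assumption, and byte-identical output is only possible if the build is reproducible (no embedded timestamps or paths), so "differs" means "investigate", not necessarily "wrong source".

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large executables do not fill memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_builds(production_dir: Path, rebuilt_dir: Path) -> dict[str, str]:
    """Classify each production file as 'match', 'differs' or 'missing'."""
    report = {}
    for prod_file in sorted(p for p in production_dir.rglob("*") if p.is_file()):
        relative = prod_file.relative_to(production_dir)
        rebuilt_file = rebuilt_dir / relative
        if not rebuilt_file.is_file():
            report[str(relative)] = "missing"
        elif sha256_of(prod_file) == sha256_of(rebuilt_file):
            report[str(relative)] = "match"
        else:
            report[str(relative)] = "differs"
    return report
```

Files reported as "missing" are the interesting ones: production contains something your sources do not build.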
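A change-management form does not need special tooling to start with. As a hypothetical sketch (the field names and priority scale are invented here, not the commenter's actual form), a small record type that refuses requests without a justification, plus a queue ordered by business priority, captures the idea.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class ChangeRequest:
    title: str
    requested_by: str
    justification: str
    business_priority: int  # 1 = most urgent (hypothetical scale)
    submitted: date = field(default_factory=date.today)

    def __post_init__(self) -> None:
        if not self.justification.strip():
            raise ValueError(f"change request {self.title!r} has no justification")
        if self.business_priority < 1:
            raise ValueError("business_priority must be 1 or greater")


def work_queue(requests: list[ChangeRequest]) -> list[ChangeRequest]:
    """Most urgent first; older requests first within the same priority."""
    return sorted(requests, key=lambda r: (r.business_priority, r.submitted))


queue = work_queue([
    ChangeRequest("Fix rounding in monthly report", "finance",
                  "Totals off by one cent on audit", 2, date(2012, 8, 1)),
    ChangeRequest("Restore batch job after crash", "operations",
                  "Nightly billing did not run", 1, date(2012, 8, 3)),
])
print([r.title for r in queue])
```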
If you can get a college student or intern to help you out, go for it. Your job is going to need help, and a project of this size is just right for a one-semester project.
As you group related source files into separate directories (that is, as you organize the sources), your task may suddenly not appear as bad as you thought. Do not think about the coding; concentrate on the business processes, and most certainly visit the end users to find out when their subsystem was implemented. Try to match that with source dates or comments within the sources. Name your directories after the business processes.
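The cataloguing can be partly scripted. This sketch assumes a hand-written mapping from filename keywords to business processes (the keywords and process names are hypothetical, to be agreed with the end users) and uses file modification years as a rough hint for matching against when each subsystem went live. Modification times are easily disturbed by copies and restores, so treat them as clues only.

```python
from collections import defaultdict
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical keyword -> business process mapping, built with the end users.
PROCESS_KEYWORDS = {
    "inv": "invoicing",
    "pay": "payroll",
    "stk": "stock control",
}


def classify(filename: str) -> str:
    """Return the first business process whose keyword appears in the name."""
    lowered = filename.lower()
    for keyword, process in PROCESS_KEYWORDS.items():
        if keyword in lowered:
            return process
    return "unclassified"


def catalog(source_root: Path) -> dict[str, list[tuple[str, int]]]:
    """Group source files by business process, with last-modified year (UTC)."""
    groups: dict[str, list[tuple[str, int]]] = defaultdict(list)
    for path in sorted(source_root.rglob("*")):
        if path.is_file():
            mtime = path.stat().st_mtime
            year = datetime.fromtimestamp(mtime, tz=timezone.utc).year
            relative = str(path.relative_to(source_root))
            groups[classify(path.name)].append((relative, year))
    return dict(groups)
```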
Please note, you cannot do it all in a day. It will take about 16-20 weeks of dedicated work to complete the cataloging and getting a proper handle on the business application.
Best of luck.
Leslie Satenstein Montreal Quebec Canada