Slashdot Mirror


Learning and Maintaining a Large Inherited Codebase?

An anonymous reader writes "A couple of times in my career, I've inherited a fairly large (30-40 thousand lines) collection of code. The original authors knew it because they wrote it; I didn't, and I don't. I spend a huge amount of time finding the right place to make a change, far more than I do changing anything. How would you learn such a big hunk of code? And how discouraged should I be that I can't seem to 'get' this code as well as the original developers?"

532 comments

  1. 30 to 40 thousand lines isn't large by any measure by Anonymous Coward · · Score: 0, Informative

    Yes it's still a bitch to maintain it. But 30k to 40k is by no means large.

  2. Time by wmbetts · · Score: 5, Interesting

    If you don't have access to the original developers and they didn't document it you're going to just have to spend a lot of time reading the code. =\

    --
    "Ubuntu" -- an African word, meaning "Slackware is too hard for me". - stolen from Dan C alt.os.linux.slackware
    1. Re:Time by dintech · · Score: 1

      You might want to come up with a few good reasons (other than just the ones you stated above) for doing a clean-room re-write of the damn thing. This might give you a chance to give the users something better than they already have or that interfaces better with other systems in your enterprise. It's a long shot but doing the requirements gathering and developing it yourself might be more fun than just learning it through reading. Good luck!

    2. Re:Time by Anonymous Coward · · Score: 5, Insightful

      Everyone, including me, always wants to go for the clean rewrite. But in my experience it almost never turns out for the best. There's a reason for all that messy code. Much of it was bug fixes that real-world users needed. Other complexities were needed in the first place to make the user experience simple (natural, giving it that "hey, it just works like I expected" feeling).

      The reason you don't understand the code is that you weren't part of the original design discussions, in which weeks or months were spent learning, debating, arguing, etc., about many different design decisions at many different levels of abstraction. You don't know why the trade-offs were made. You just see the finished product.

      Rewriting the code won't give you insight into any of this. Learning the code the hard way, fixing bugs, rewriting *small* pieces and seeing what breaks the regression tests, etc. will eventually help you to understand it.

      There is no point in rewriting it before you fully understand it. Attempting that can kill a product. Conversely, by the time you fully understand it, there won't be any need to rewrite it, because you'll own the code.

    3. Re:Time by Anonymous Coward · · Score: 0

      Often times, a bad idea.

      If it's anything like stuff I've worked on, the code has man-years of implicit knowledge, and bugfixes. A rewrite that doesn't take all of that into account will have regressed functionality.

    4. Re:Time by Anonymous Coward · · Score: 0

      And give up years and years of bugfixes and user knowledge?

    5. Re:Time by Anonymous Coward · · Score: 1, Informative

      Anonymous Coward rarely gets any mod points, no matter how good his/her commentary is, which is really annoying to me.

      This commentary is guru-level advice but gets no mod points, and yet there will be mod points for all sorts of fluffy comments.

    6. Re:Time by TheLink · · Score: 1

      I don't see any evidence that the original developers were grossly incompetent. There is no evidence that the rewrites will be easier to understand and maintain.

      I have rewritten stuff and made things much better (far more scalable), but just because the program is hard for you to understand at first doesn't mean it needs to be rewritten.

      If you see lots of "DailyWTF" candidates in the code, while that means a rewrite MIGHT[1] be a good idea, you usually still need to read the crap to figure out what the code needs to do - because in those cases there's often no nice "requirements doc" and nobody really knows what is needed...

      But all he's complaining about is he doesn't understand it and doesn't know where to change stuff. If I were his boss and that's the only reason he gives for a rewrite, I'd tell him to learn the code first till he can tell me what it does right, what it does wrong and should do instead. If he keeps coming in and goes "Wahh it's too hard, we need to rewrite", I'll be tempted to replace him and not the code.

      Yes, understanding and fixing the code may feel like work. But that's why people get paid to do stuff they find unpleasant.

      [1] Sometimes unfortunately there is no time for a rewrite from scratch, so you rewrite bit by bit of crap code to make it less crap while knowing the design and architecture is still crap. And other times you have to do both - rewrite bits of the old crap, while rewriting the new (crap in different ways) replacement for the old crap from scratch.

    7. Re:Time by Anonymous Coward · · Score: 0

      Which is precisely the reason I develop all code using well-delineated functions which makes the source code easier to follow. I use descriptive function names and variable names as well. If I cannot flowchart the structure of my code it means it has been poorly designed.

    8. Re:Time by Anonymous Coward · · Score: 0

      Oh god, no. Please don't tell me you just suggested to him to toss out the old code and rewrite everything from scratch.

    9. Re:Time by syousef · · Score: 1

      He needs to document and comment as he goes. Otherwise others will need to repeat the process.

      The best documentation is inline. Unfortunately our editing environments don't allow for diagrams to be embedded. That's where references to external docs are needed (but they can be part of the code repository!) It takes a lot longer to document code. Unfortunately managers often don't want to hear that and will think you're just slow picking up the code, so get them on board or warn them that others will need to do the same.

      --
      These posts express my own personal views, not those of my employer
    10. Re:Time by jhol13 · · Score: 0

      No. Don't read the code, it is mostly a waste of time.

      Make an improvement somewhere. You have to know how that part works and you'll learn a lot from that. Turn on *tracing* (or add it if the code does not have any, or use a debugger/profiler to get it). Read through the trace log, see what it is doing.

      After a few months 50kloc should be pretty familiar to you.
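
      A minimal sketch of the "add tracing if it does not have any" idea, assuming the inherited code is Python. `run_traced`, `handle`, and `load` are hypothetical names for illustration; `sys.setprofile` is the standard-library hook that makes this possible without editing every function.

```python
import sys

def run_traced(func, *args, **kwargs):
    """Call func and return (result, log), where log lists nested calls.

    Only functions defined in the same source file as func are recorded,
    so library internals do not flood the log.
    """
    source_file = func.__code__.co_filename
    log = []
    depth = 0

    def profiler(frame, event, arg):
        nonlocal depth
        if frame.f_code.co_filename != source_file:
            return
        if event == "call":
            log.append("  " * depth + frame.f_code.co_name)
            depth += 1
        elif event == "return":
            depth -= 1

    previous = sys.getprofile()
    sys.setprofile(profiler)
    try:
        result = func(*args, **kwargs)
    finally:
        # Always restore whatever profiler was active before.
        sys.setprofile(previous)
    return result, log

# Hypothetical stand-ins for two functions in the inherited code.
def load(raw):
    return raw.strip()

def handle(raw):
    return load(raw).upper()

result, call_log = run_traced(handle, "  abc ")
print("\n".join(call_log))
```

      The indentation in the log shows nesting, which is usually enough to tell which functions to open first.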

    11. Re:Time by Bender0x7D1 · · Score: 1

      See sig.

      --
      Reading code is like reading the dictionary - you have to read half of it before you can go back and understand it.
    12. Re:Time by Anonymous Coward · · Score: 0

      Not always an option. Awhile back, I was handed an application with a pile of associated libraries, object-oriented in name only, none of it commented or documented, and told to make it do things it was never designed to do. When I asked if I could refactor parts of it to make it more flexible and sustainable, I was laughed out of the room: the old, "We-Have-A-Deadline-And-Don't-Have-Time-To-Do-The-Right-Longterm-Thing" bit.

      So, in my case, for every change I make, I usually have to sit down with a pad of paper, and take notes about the code I'm going to change, and the code around it, so I can get an idea of what it's doing and how the various pieces are supposed to interact. It's sort of a slog, but I find that as I work on bits I need to change, I build up my mental picture of the code.

      Oh, and comment/document liberally. Not only for future generations, but writing down what the code is doing will help you understand the big picture.

    13. Re:Time by AlXtreme · · Score: 1

      Everyone, including me, always wants to go for the clean rewrite. But in my experience it almost never turns out for the best. There's a reason for all that messy code. Much of it was bug fixes that real-world users needed.

      Even so, the chances are that the original developers were grossly incompetent.

      It really depends on how messy the code is and what you want to do with it. You only need to change a couple of things and you don't see how the code will be used a couple of years down the road? Of course you shouldn't rewrite the thing.

      If on the other hand it was hacked together by a bunch of monkeys and it's your day-job for the next couple of years don't torture yourself by maintaining the beast indefinitely. Rewrite module-by-module and keep anything that you can maintain as-is. Perhaps you'll discover while rewriting a module why it was put together the way it is and if that reasoning was valid simply let it be.

      In my experience you'll know within 5 minutes of looking at the code/database if you'll need a rewrite. All of the projects that I've given a complete overhaul come out lean, mean and maintainable (and actually working, which makes the client happy). Definitely a blast of fresh air compared to the putrid mess they were beforehand. But perhaps I'm just a sucker for taking on globs of unmaintainable code, I don't know.

      It really depends on the situation and the time on your hands.

      --
      This sig is intentionally left blank
    14. Re:Time by TheRaven64 · · Score: 1

      I don't see any evidence that the original developers were grossly incompetent.

      The original developers of almost anything were grossly incompetent. This is even more likely to be true when the original developers are the current maintainers 5-10 years ago.

      --
      I am TheRaven on Soylent News
    15. Re:Time by mswhippingboy · · Score: 1
      I just completed a project to add functionality to an old (15+ years) Pro*C application (~50K LOC). I was able to convince management to let me rewrite it in Java rather than hack the already buggy codebase. The result was very satisfying. I was able to rewrite it in the same time that was originally allocated for the enhancement. The new code has not had a hiccup in over 3 months (it runs 24/7/365), as opposed to the old code, which was a major headache for the support group.

      This is how I did it:

      First, I simply ported the C code directly to Java, as much as possible, keeping the logic, variable names and structure the same as the original C application. Once that was done, I tested and tested until the results were identical to the original application (minus the crashes and bugs).

      Then I went through a refactoring of the application to restructure it from a procedural to an object-oriented application, all the while continuing to regression test to make sure nothing got broken in the process. Once I had the new code properly structured and performing well, I added some exception handling to increase reliability, plus automatic recovery and support notification capabilities that were non-existent in the original application. At this point, I basically had a new application functionally equivalent to the old one, with better structure, in a newer technology and far more reliable than the original.

      Finally, I added the new functionality that drove the original project scope. This was fairly easy given that I was very comfortable with the application structure by this time and the application was far more extensible than the original application structure.

      Point being, there are cases (not all, but many more than you can convince management of) where a rewrite is the best approach both from a project cost as well as the quality of end product perspective.
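
      The "test until the results are identical" step can be sketched as a side-by-side comparison of the old and new implementations. This is only an illustration in Python, not the Pro*C/Java setup described above; `first_divergence` and the checksum functions are hypothetical.

```python
def first_divergence(old_impl, new_impl, inputs):
    """Return (input, old_result, new_result) for the first mismatch, else None."""
    for value in inputs:
        old_result = old_impl(value)
        new_result = new_impl(value)
        if old_result != new_result:
            return value, old_result, new_result
    return None

def old_checksum(text):
    # Literal port: index-based loop kept close to the original structure.
    total = 0
    i = 0
    while i < len(text):
        total = (total + ord(text[i])) % 256
        i += 1
    return total

def new_checksum(text):
    # Refactored version that should behave identically.
    return sum(ord(ch) for ch in text) % 256

def buggy_checksum(text):
    # A refactoring slip (wrong modulus) that the comparison should catch.
    return sum(ord(ch) for ch in text) % 255

INPUTS = ["", "a", "legacy"]
clean = first_divergence(old_checksum, new_checksum, INPUTS)
slip = first_divergence(old_checksum, buggy_checksum, INPUTS)
```

      Reporting the first diverging input, rather than a bare pass/fail, gives a concrete case to step through in a debugger.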

      --
      Sometimes the light at the end of the tunnel is the headlight of an oncoming train.
    16. Re:Time by mswhippingboy · · Score: 1

      There's a reason for all that messy code. Much of it was bug fixes that real-world users needed.

      More likely much of it is patches to the application where the temp developers that were brought in didn't understand the app, so they just hacked it until it worked. After many iterations of this, the code is nearly incomprehensible. Every project is different, but there are a lot more cases where a rewrite is appropriate than it's possible to convince management of.

      --
      Sometimes the light at the end of the tunnel is the headlight of an oncoming train.
    17. Re:Time by jthill · · Score: 1

      Two observations:

      • If the coded model doesn't behave correctly, patches that just compensate for individual consequences will look like "all that messy code" in short order, and the flood will not end until the users stop coming.
      • Tolerating api/ui/protocol changes during a complete rewrite is stark staring insane.

      That said,

      There is no point in rewriting it before you fully understand it. Attempting that can kill a product.

      True.

      But the rewrite tradeoff hinges on how much better the new model behaves than the old one when producing identical results on all its test cases. Your metric is the cogency of the code. A well-done, needed rewrite can turn a full-time maintenance team into one guy who has to spend maybe twenty hours a year on it.

      There are, of course, managers and programmers who regard that as a bad result.

      --
      As always, all IMO. Insert "I think" everywhere grammatically possible.
    18. Re:Time by ztransform · · Score: 2, Insightful

      There is no point in rewriting it before you fully understand it.

      I fully support this statement.

      I recently worked with a guy new to contracting. He came onboard to a project that had a lot of problems. He argued for re-writing it thinking he could do it quickly and simply; I didn't dispute that the system could use significant changes, and I asked him to read through and understand the existing code.

      He never did.

      Consequently I suggested to senior managers that he should be let go. Reading other people's code, particularly undocumented code, is painful - even for experienced coders. But it is necessary and failure to do so before recommending changes is unprofessional, dangerous, and lazy.

    19. Re:Time by Anonymous Coward · · Score: 0

      There's a reason for all that messy code. Much of it was bug fixes that real-world users needed.

      Other times, the reason it needs mess and bug fixes to run the way users want, is that there was no well-thought-out design to start with.

      The reason you don't understand the code is that you weren't part of the original design discussions, in which weeks or months were spent learning, debating, arguing, etc., about many different design decisions at many different levels of abstraction. You don't know why the trade-offs were made.

      You are making a big assumption. Lots of legacy products "just happened", without meetings. Just clients or bosses telling one coder after another "make it do this", to which they responded by doing what they understand "this" to be using whatever edits seemed like a good idea at the time. This might be one such product.

    20. Re:Time by CHJacobsen · · Score: 1

      While I agree with you overall, there is a flipside to this. The original design, while mature, might have been created in a different context. Typically, as user requirements change, the architecture gets littered with hacks and workarounds, and the further it moves from the original specification, the harder it is to maintain.

      Thus, the programmer preference of rewriting from scratch now and then might actually be quite healthy. Like you said, though, a rewrite should wait until you actually understand a product properly. It must not be an excuse to avoid studying the existing code properly.

    21. Re:Time by ScrewMaster · · Score: 1

      It really depends on the situation and the time on your hands.

      For the average project inheritor, the situation is almost always dire, and the amount of time available is usually pretty close to zero. Nothing but good management will ever change that. Consequently I'm not holding my breath.

      --
      The higher the technology, the sharper that two-edged sword.
    22. Re:Time by Anonymous Coward · · Score: 0

      I've actually enjoyed the process of trying to understand a large code base. I don't think of it as a competition but as a mystery.

      Methods I've used:

      1) Try to understand the principles behind the code. If you can get a handle on the architectural principles behind the code, you will be able to understand a given piece of code much more quickly.

      2) Skim the code as a whole, trying to understand the overall architecture involved. Don't try to understand everything about a particular piece of code till you've looked at the whole thing.

      3) Notice any constructions that seem unusual and *look through version control* trying to see the decisions that went into creating them. Assuming you have version control, it is a good source for

      4) Follow the path of information between functions.

      5) Look for any bits of information that are globally visible (of course global variables but also global constants). These tend to be a source of bugs but also a way that information spreads across the program.
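
      The hunt for globally visible state can be partly automated. A rough sketch, assuming the code under study is Python and using the standard `ast` module; `module_globals`, `global_writers`, and the sample source are hypothetical.

```python
import ast

def module_globals(source):
    """Return the sorted names assigned at module level in Python source."""
    names = set()
    for node in ast.parse(source).body:
        if isinstance(node, ast.Assign):
            targets = node.targets
        elif isinstance(node, (ast.AnnAssign, ast.AugAssign)):
            targets = [node.target]
        else:
            continue
        for target in targets:
            if isinstance(target, ast.Name):
                names.add(target.id)
    return sorted(names)

def global_writers(source):
    """Map each function name to the names it declares with `global`."""
    writers = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef):
            declared = set()
            for child in ast.walk(node):
                if isinstance(child, ast.Global):
                    declared.update(child.names)
            if declared:
                writers[node.name] = sorted(declared)
    return writers

# Hypothetical fragment of inherited code.
SAMPLE = '''
MAX_RETRIES = 3
cache = {}
timeout: int = 30

def reset():
    global cache
    cache = {}

def fetch(key):
    return cache.get(key)
'''

shared_names = module_globals(SAMPLE)
writers = global_writers(SAMPLE)
```

      This only catches functions that rebind a global through a `global` statement; code that mutates a shared dict or list in place (like `fetch` reading `cache`) still has to be found by reading or searching.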

    23. Re:Time by azmodean+1 · · Score: 1
      The main thing I would look at when determining whether to do a full re-write of the code rather than performing targeted re-factoring, is not the code itself, but the overall design. If the design is sound but badly implemented, you should be able to gradually refactor problem spots. If, on the other hand, the design itself is suspect or just plain wrong, it may be a candidate for a complete re-write (at this point you have to also look at business aspects of the process, such as how much change is expected to be made to this code in the future, and what kind of time constraints you have.)

      A pair of examples from my immediate experience:

      Two different dataloading protocols, layered on top of two different file transfer protocols (furthermore, each of them is layered on top of a different network protocol, but that is outside the scope of my project). Each module was originally split across two applications running on separate machines: the dataloading protocols were handled by one application on one machine, and the file transfer protocols were implemented on a second application running on a second machine (Don't ask, the project architecture is... interesting). The modules were tied together by a (really horrible) in-house message passing library. At some point lost in the mists of time, the programs were merged; this was done by making a new application that spawned both of the other programs as threads, but they were still tied together by the horrible message-passing library. After basically reverse-engineering a significant portion of the design and doing some performance analysis, I went to my higher-ups and presented my case for a re-write of one of the modules but not the other. The reasons were:

      1. Module A had lots of bugs outstanding whereas module B was reasonably stable.

      2. Module A had (has) significant features still to be added, whereas Module B was and is feature-complete.

      3. Module A was at least somewhat modularized between the dataloading and file transfer protocols, which allowed re-writing one first, then the other; Module B's protocol layers were hopelessly intertwined, basically requiring a complete re-write of both layers at once.

      1 and 2 were the main reasons, but 3 definitely is an issue. I'm still planning on trying to refactor Module B in-place to remove the terrible message-passing crap and replace it with a state machine, but that is extremely low priority.

    24. Re:Time by happyjack27 · · Score: 1

      if he were to give you this answer: what it does right: nothing. what it does wrong: everything. would you still tell him to rewrite it? would you just think of him as cocky? and when you actually have to look at the code yourself, would you end up rewriting it after insulting him for suggesting exactly that?

  3. A good starting point by RCL · · Score: 3, Interesting

    Try to single-step it in debugger from the beginning up to main loop.

    1. Re:A good starting point by robot256 · · Score: 3, Insightful

      I didn't get this one until I switched to my alter ego, the assembler programmer.

    2. Re:A good starting point by Anonymous Coward · · Score: 2, Interesting

      I think this is the fastest way to find the right place to make a change. Stepping the application through a debugger is probably faster than reading through the code to learn how things are done.

    3. Re:A good starting point by RCL · · Score: 1

      Agreed. Let the code reveal itself.

    4. Re:A good starting point by timnbron · · Score: 1

      My first job used a fairly low level language (CORAL) with about 250000 lines of code. I didn't really grasp it until one day I set up a process in debug, and stepped it through a few instructions at a time. I could then follow it through the code, and learnt how the main backbone worked.

      In my next job, I had about the same amount of code in PHP, written in a curious pseudo object oriented fashion (the main designer had come from ASP). So again, I started with index.php, and debugged my way laboriously through each file.

      It really helps if you can grasp the overall structure, and the only way to do that is on foot...

      --
      There are some who call me ... Tim.
    5. Re:A good starting point by rokj · · Score: 1

      Try to single-step it in debugger from the beginning up to main loop.

      Agreed, and a good IDE and debugger are a "life saver"; however, that does not mean we should not document the code.

    6. Re:A good starting point by Anonymous Coward · · Score: 0

      did you ever write any serious code?

    7. Re:A good starting point by volxdragon · · Score: 1

      Good luck doing that with any sort of real-time application or application that consumes any amount of I/O...

  4. You never will by Anonymous Coward · · Score: 0

    You are not them, your brain solves problems differently. I have found that by creating subs in areas where they have not used them, you can begin to re-write the code little by little. Other than that, poring over it or using a debugger to jump the calls is your best bet for full understanding.

  5. don't feel bad at all by iggymanz · · Score: 5, Insightful

    So you have been handed the steamin' pile o' code, it is great that you are very cautious and deliberate when modifying it. Make a set of regression tests, that is, make a set of test data and procedures and expected results to ensure original functionality that is still desirable is still working and no other errors introduced. It is hard, much more tedious than just creating new code with few constraints.
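
    One way to build such a regression set, sketched in Python: record what the code does today and fail when a later edit changes it. This assumes the behaviour can be exercised through a plain function call; `check_against_golden`, `legacy_discount`, and the file name are hypothetical.

```python
import json
import tempfile
from pathlib import Path

def check_against_golden(func, cases, golden_path):
    """Return the cases whose output differs from the recorded output.

    On the first run there is no recording yet, so the current outputs
    are saved and nothing is reported.
    """
    path = Path(golden_path)
    actual = {repr(case): repr(func(*case)) for case in cases}
    if not path.exists():
        path.write_text(json.dumps(actual, indent=2, sort_keys=True))
        return []
    expected = json.loads(path.read_text())
    return sorted(key for key, value in actual.items() if expected.get(key) != value)

def legacy_discount(total, is_member):
    # Stand-in for inherited behaviour we want to pin down.
    return total - total // 10 if is_member else total

def edited_discount(total, is_member):
    # A later edit that accidentally drops the member discount.
    return total

CASES = [(100, True), (100, False), (0, True)]

with tempfile.TemporaryDirectory() as tmp:
    golden = Path(tmp) / "golden.json"
    first_run = check_against_golden(legacy_discount, CASES, golden)
    second_run = check_against_golden(legacy_discount, CASES, golden)
    after_edit = check_against_golden(edited_discount, CASES, golden)
```

    The recording pins down current behaviour, bugs included, so a reported difference means "something changed", not necessarily "something broke".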

    1. Re:don't feel bad at all by Anonymous Coward · · Score: 0

      I just wanted to second this. The best thing you can do is get as many tests as possible. It is the only way you can have an ounce of confidence editing a new code base. Once you have tests though, start changing it, especially if the original authors are gone. The more you change, the more you'll know about it.

      By the way, a "change" is different than committing things. It is good to try and rewrite a part of the code. In rewriting it you'll see different logic that may not have seemed obvious and potential problems with the current code.

      In the end though it is really hard. Don't be discouraged though because it will still help you to be a better coder. The next time you write something from scratch you'll notice that you've continued to grow as a developer even though your lines of code have decreased quite a bit.

    2. Re:don't feel bad at all by kaiser423 · · Score: 5, Insightful

      Definitely what parent said. Also:

      I have inherited huge code bases. I actually kind of like it. Lots of people whom I thought were idiots, and cursed their code, I later found out that they were quite smart. Others, I found that they just thought about problems vastly different than I, and learning how they tackled problems gave me many more tools in my personal arsenal.

      That said, find a big wall or something. Use a debugger or code analysis tool to find the main execution paths (what calls what and when, etc). Diagram that up on the wall really large. Then use the tools to determine when and why certain auxiliary functions get called. Diagram that up, and you'll start getting a spider on your wall. Go from there using your new understanding to re-arrange the program flow not in terms that make sense to you, but rather seem to be how they are programmed (functional, objective, some pattern). Rinse and repeat until you know pretty much what the code is trying to accomplish in 90+% of the situations, and its general plan of attack.
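
      For a first draft of that wall diagram, a small script can extract "what calls what" statically. A sketch assuming Python source and the standard `ast` module; `call_graph`, `to_dot`, and the sample program are hypothetical, and the output is Graphviz DOT text that `dot` can render.

```python
import ast

def call_graph(source):
    """Map each top-level function to the top-level functions it calls by name."""
    tree = ast.parse(source)
    functions = [n for n in tree.body if isinstance(n, ast.FunctionDef)]
    defined = {f.name for f in functions}
    graph = {}
    for func in functions:
        called = {
            node.func.id
            for node in ast.walk(func)
            if isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
        }
        # Keep only edges to functions defined in this file.
        graph[func.name] = sorted(called & defined)
    return graph

def to_dot(graph):
    """Render the graph as Graphviz DOT text."""
    lines = ["digraph calls {"]
    for caller in sorted(graph):
        for callee in graph[caller]:
            lines.append(f'    "{caller}" -> "{callee}";')
    lines.append("}")
    return "\n".join(lines)

# Hypothetical miniature of an inherited program.
SAMPLE = '''
def main():
    cfg = load_config()
    run(cfg)

def load_config():
    return {}

def run(cfg):
    report(cfg)

def report(cfg):
    print(cfg)
'''

graph = call_graph(SAMPLE)
dot_text = to_dot(graph)
print(dot_text)
```

      It misses method calls and anything dispatched dynamically, so treat the result as a starting skeleton to correct by hand or with a debugger.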

      With that diagram, dive in! There's tons of little details in every function that look useless but are usually bug fixes. Use a scalpel, not a hatchet.

      I was deployed remotely with no way for the main programmer to get at me. We had prepared 9 months to collect 4 minutes of data, and the test wouldn't wait for us. I found an odd bug hidden somewhere in ~22k lines of code. I did this over a weekend, and found about 4-5 nasty bugs that were combining to produce what I was seeing, and fixed them. I did this with zero input or help, over a weekend in code I had never seen spread around about 60 files. I spent the first half day just diving in and trying things, and nearly shot myself. That's when I went high-level and dug in from there.

      When that was done, I then took over code maintenance and updates on that project. The other guy had written it 100% himself, but after that exercise I knew the code better than him. Sometimes being new is good; you don't have all that cruft of implementations that didn't work, etc, but still linger in the original programmer's head.

    3. Re:don't feel bad at all by RAMMS+EIN · · Score: 1

      ``Use a debugger or code analysis tool to find the main execution paths (what calls what and when, etc). Diagram that up on the wall really large.''

      Actually, I figure that there must be tools specifically for finding execution paths. What are the tools people use for various languages?

      --
      Please correct me if I got my facts wrong.
  6. Use Doxygen by gbrandt · · Score: 5, Insightful

    Doxygen is your friend. run it over the source code and keep the HTML handy for searches and cross references.

    1. Re:Use Doxygen by Anonymous Coward · · Score: 0

      Doxygen can make a class inheritance chart, which might be a useful place to start. Also, whenever I'm looking at a piece of code for the first time, I'll clean it up, and add comments, making it my own code.

    2. Re:Use Doxygen by eggy78 · · Score: 2, Informative

      I have found that equally useful to Doxygen's standard documentation are the caller/callee graphs (and the source browser as well!). These features are invaluable but they don't get used when you generate documentation with a more-or-less default config.

    3. Re:Use Doxygen by erictheturtle · · Score: 2, Informative

      I feel the same way as OP when trying to make sense of some open source library I'm interested in extending. Doxygen has been a big help. In the future I might also try Source-Navigator.

    4. Re:Use Doxygen by heson · · Score: 1

      Yes, the call graph is magnitudes more important than the class inheritance. Class inheritance shows how the system was designed, but in long projects chasing a moving target, that doesn't say much about the current state of things (except for how data was supposed to be stored). The call chart (preferably a full chart with nuts and bolts functions filtered out) will show the actual operation of the program.

    5. Re:Use Doxygen by j1m+5n0w · · Score: 1

      I totally agree, the call graphs are very helpful. (Note: graphviz needs to be installed for this to work.)
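
      For reference, a Doxyfile fragment along these lines turns on the call/caller graphs and the source browser mentioned in this thread. It is a sketch of a starting point, not a complete configuration; the `INPUT` path is an assumption and should point at your own source tree.

```
# Doxyfile fragment (INPUT is a placeholder path)
INPUT            = src
RECURSIVE        = YES
EXTRACT_ALL      = YES
SOURCE_BROWSER   = YES
HAVE_DOT         = YES
CALL_GRAPH       = YES
CALLER_GRAPH     = YES
```

      `EXTRACT_ALL` matters for inherited code because it documents entities even when they carry no Doxygen comments, and `HAVE_DOT` is what tells Doxygen to use Graphviz for the graphs.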

  7. Comments! by Anonymous Coward · · Score: 0

    Make it your personal mission to soak the code in comments, refactor it where appropriate, et cetera. Diagramming it can help, too. Do all the things they should have done before giving it up; this will help you find what all of the functions do, and discover the important ones.

  8. Re:30 to 40 thousand lines isn't large by any meas by Anonymous Coward · · Score: 0

    Just out of curiosity, what is your opinion of a "Large" codebase then?

  9. It depends on the language by $RANDOMLUSER · · Score: 5, Funny

    If it's Perl or VB, you might want to consider self-immolation as a first step.

    --
    No folly is more costly than the folly of intolerant idealism. - Winston Churchill
    1. Re:It depends on the language by rocker_wannabe · · Score: 1

      Simply running out of the room screaming "No!!!!!!" should suffice. There IS life after programming, believe it or not.

      --
      "Meaningless!, Meaningless!" says the Teacher. "Utterly meaningless!"
    2. Re:It depends on the language by martin-boundary · · Score: 5, Informative

      No, he meant that as an actual offering to the Perl God, Quetzal$@[&shift]L. It's a bloodthirsty god, who never sends the Divine Debugger without at least two pints of the red stuff. I would have immolated a coworker, but the parent poster seems to have been alone in the room :-/

    3. Re:It depends on the language by budgenator · · Score: 1

      I was on fire once you insensitive clod.

      --
      Apocalypse Cancelled, Sorry, No Ticket Refunds
    4. Re:It depends on the language by LostCluster · · Score: 1

      VB6's actually very easy to understand when you have the code...

      1. You can control-break at any point in program and be shown exactly the line you're executing and step through with F8 or resume at full speed with F5.
      2. You've got a rather nice project-wide search tool to find functions and subs that the old programmer wrote.
      3. You've got an immediate pane for simulating "What if X was set to..." situations.
      4. The previous programmer likely left behind date-stamps in the OS so if a user can tell you when the feature was developed, you can see what files he was using.
      5. There's a lot of stray VB how-to pages and books out there.

    5. Re:It depends on the language by chill · · Score: 5, Funny

      No, he meant that as an actual offering to the Perl God, Quetzal$@[&shift]L. It's a bloodthirsty god, who never sends the Divine Debugger without at least two pints of the red stuff. I would have immolated a coworker, but the parent poster seems to have been alone in the room :-/

      The fact the above comment is +5 Informative and not +5 Funny makes me very glad I stopped programming in Perl when I did.

      --
      Learning HOW to think is more important than learning WHAT to think.
    6. Re:It depends on the language by BerntB · · Score: 2, Informative

      Funny you should say that...

      I quite like this reference from the Perl world about understanding large systems: http://www.perlmonks.org/?node_id=788328

      --
      Karma: Excellent (My Karma? I wish...:-( )
    7. Re:It depends on the language by tempest69 · · Score: 1

      even Voldemort draws the line at advanced coding in perl.
      Some things are not meant to be.

    8. Re:It depends on the language by Target+Practice · · Score: 1

      Oh my, you've been modded "5, informative" for that. I've obviously not programmed enough Perl...

      --
      There's a 68.71% chance you're right.
    9. Re:It depends on the language by Rexdude · · Score: 1

      The fact that above post is modded 'Informative' scares me.

      --
      "..One hosts to look them up, one DNS to find them, and in the darkness BIND them."
    10. Re:It depends on the language by Anonymous Coward · · Score: 0

      No... if it's Perl or VB code you inherit you should leap for joy! If it's 2.5 MILLION lines of COBOL code written in the 1980's to access an IMS database and then 'bridged' to continue its original IMS transactions patched so they are really accessing a DB2 database, then proceed with 'self-immolation as a first step'.

    11. Re:It depends on the language by Anonymous Coward · · Score: 0

      I am on a team that maintains a 100,000+ line mod_perl app that has a memory footprint of over a Gig.
      None of the original developers are around.
      We have considered building a time machine to stop the creators from making some of the more abominable design decisions.

    12. Re:It depends on the language by guyminuslife · · Score: 1

      That is the most I've laughed all week.

      --
      I don't believe in time. It's a grand conspiracy designed to sell watches.
    13. Re:It depends on the language by Galestar · · Score: 1

      VB6's actually very easy to understand when you have the code...

      I inherited a large VB6 kludge of a project last year, and I strongly disagree. .Net is far easier to understand and refactor, and the conversion is relatively painless if you know what you're doing.

      I've been slowly converting to .Net and rewriting one module at a time.

      --
      AccountKiller
  10. Not lots of code by www.sorehands.com · · Score: 5, Insightful

    First of all, 30-40,000 lines of code is not lots of code. Try, 250,000 of code.

    To start, use a good programming editor/environment (Xcode, Vslick, Visual Studio, etc.) that gives you the ability to easily go to definition or references to variables, functions, structs and such. Run some sort of profiler or flowchart type program on it to get a high level view of the code and how it fits together. If you can, get the person(s) who worked on it before you to give you an idea of how it fits together.

    1. Re:Not lots of code by Coryoth · · Score: 4, Insightful

      First of all, 30-40,000 lines of code is not lots of code. Try, 250,000 of code.

      To start, use a good programming editor/environment (Xcode, Vslick, Visual Studio, etc.) that gives you the ability to easily go to definition or references to variables, functions, structs and such.

      30-40,000 lines can be lots of code, it really depends on how maintainably it is written. I've had to pick up codebases that were somewhat smaller but were still diabolical ... good programming environments don't buy you much when the code consists of functions that are many thousands of lines long making little or no use of typedefs or structs (arrays and lots of variables should be enough right?) and convenient variable names like 'e', 'ee', and 'eee'. Even small codebases can become practically incomprehensible if written with little thought given to long term maintenance.

    2. Re:Not lots of code by leoaloha · · Score: 1

      250000 is not a lot of code. Try over a million lines of C for train control of a transit authority. Purchased (read inherited) in escrow because management and the vendor got into a disagreement. The head software guru was upper class as far as I was concerned. I was the network guy. No documentation or very light. He had to live in the code but he was that kind of guy. My hat is off to him, I don't know how he did it

    3. Re:Not lots of code by dgatwood · · Score: 1

      Fair enough. On the other hand, badly written code is self-limiting in size. It almost never gets particularly large because if it is that hard to maintain, it will also be extremely hard to expand in any useful way. Usually by the time it gets past about 10-15,000 LOC, it has to be at least somewhat sensible.

      I tend to agree that 30,000 LOC is not at all large. My trivial little web photo gallery is 8k lines of code. At work, I maintain and periodically enhance a relatively small tool that's about 37k lines. It's fairly simple and straightforward and it's a tiny fraction of my job; I don't consider it large at all. Facebook is a medium-sized piece of software at 300,000 LOC. The Linux kernel is a large piece of software at 2.4 million LOC.

      --

      Check out my sci-fi/humor trilogy at PatriotsBooks.

    4. Re:Not lots of code by Anonymous Coward · · Score: 0

      250k isn't a lot either; try a million or two.

      No matter *how* it's written, a million lines of code (non-comment lines, mind you) is a lot.

    5. Re:Not lots of code by npsimons · · Score: 1

      To start, use a good programming editor/environment (Xcode, Vslick, Visual Studio, etc.) that gives you the ability to easily go to definition or references to variables, functions, structs and such.

      I have to put in a word for Emacs, which Works For Me, and Works Very Well, ThankYouVeryMuch. Never did like XCode, Vslick, VisualStudio, Eclipse or any of those others. Of course, if the poster isn't using Emacs already, that's just another learning curve they will have to climb, I admit. What it boils down to: does your favorite editor support looking up definitions of names in code? If not, switch to one that does, preferably one that is similar to your favorite editor. If you are using Emacs and hitting up against the limits of etags, take a look at CEDET, in case you haven't already.

      Run some sort of profiler or flowchart type program on it to get a high level view of the code and how it fits together.

      Huh. Much as I harp on using profilers, I would argue that they aren't very helpful at this stage in the game. Granted, most of the code I get to maintain is a mess that won't help by being profiled. However, any tool you can get that helps pick apart the code automatically is a Good Thing; I second this.

    6. Re:Not lots of code by Anonymous Coward · · Score: 0

      Don't forget the all important variables "temp", "temp2", "temp3"...

    7. Re:Not lots of code by heidaro · · Score: 1

      A million or two isn't a lot either. Like my uncle Einar used to say, try 10 million or so.

    8. Re:Not lots of code by jimrthy · · Score: 1

      I know plenty of people who do exactly this.

      They've written their own personal preprocessors that remove comments and mangle variable names, just for job security.

      I can think of all sorts of names to call them. I won't.

    9. Re:Not lots of code by kobaz · · Score: 1

      40k lines is itty bitty. Try working on a 25 year old legacy finance system with 1.5 million lines of C code, *most* of which is complete copy and paste from other parts of the system. The mantra was: "Need a new module... just copy that one, make your changes and commit". It's always fun fixing the same bug in 15 different spots.

      And talk about diabolical... the "database" consisted of arrays of structs written directly to disk. Upgrades of data structures meant hours of importing the old data, copying it into a new struct that was bigger, and writing it back out. SQL databases were available in the 80's... what gives?
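
      For anyone who hasn't met this style of "database", here is a rough sketch of the pattern in Python rather than the original C. The two record layouts (OLD, NEW), the field choices, and the helper names are made up for illustration; the real structs were far larger:

```python
import struct

# Hypothetical v1 record: 32-bit id plus a 16-byte name.
OLD = struct.Struct("<i16s")
# Hypothetical v2 record: same fields plus a float64 balance.
NEW = struct.Struct("<i16sd")


def write_records(path, layout, rows):
    """Write each row as one fixed-size binary record."""
    with open(path, "wb") as f:
        for row in rows:
            f.write(layout.pack(*row))


def read_records(path, layout):
    """Read the whole file back as a list of tuples, one per record."""
    with open(path, "rb") as f:
        data = f.read()
    return [layout.unpack_from(data, off)
            for off in range(0, len(data), layout.size)]


def upgrade(old_path, new_path, default_balance=0.0):
    """Copy every old record into the bigger layout and write it back out."""
    rows = [row + (default_balance,) for row in read_records(old_path, OLD)]
    write_records(new_path, NEW, rows)
```

      Every layout change needs its own upgrade() pass over every data file, which is why a one-field change could mean hours of conversion per installation.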

      A good programmer can take a 40k line system and make it mostly understood, documented, and unit tested in roughly 6-8 months (assuming there's no other work to be done). After that painful period, the coder's life will be much easier.

      --

      The goal of computer science is to build something that will last at least until we've finished building it.
    10. Re:Not lots of code by snowgirl · · Score: 2, Funny

      so like... perl?

      More precisely 30-40,000 lines of code is 29,999-39,999 times more lines than one needs to write shitty code...

      --
      WARNING! This girl exceeds the MAXIMUM SAFE standards established by the FDA for BRATTINESS
    11. Re:Not lots of code by Anonymous Coward · · Score: 0

      1st of all 250,000 lines of code is not a lot of code, you sissies dare venture into softswitch land, where you'd find 1.5 million lines of code....good luck learning that

    12. Re:Not lots of code by jhol13 · · Score: 1

      You mean "Exuberant Ctags", don't you?

    13. Re:Not lots of code by Logic+and+Reason · · Score: 1

      It's large when all 30-40k lines are in a single Perl file, with gobs of cut-n-paste code, dozens of global variables referenced from everywhere with no special naming convention, functions sometimes thousands of lines long, helpful function names like "p7e_2b_show", and only incredibly obvious comments like "#print output". Oh, and almost every one of about a thousand database statements, many of which involve user input, uses string interpolation instead of bound variables. No test suite or design documentation, either, so in many cases you have to guess at what the code is even supposed to be doing in the first place.

      Just hypothetically speaking, of course...
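
      To make the interpolation-versus-bound-variables point concrete, a small sketch using Python's built-in sqlite3 module as a stand-in for the Perl database layer. The users table, its rows, and both function names are invented for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [("alice", "admin"), ("bob", "user")])


def find_roles_interpolated(conn, name):
    # Unsafe: the user input becomes part of the SQL text itself.
    query = "SELECT role FROM users WHERE name = '%s'" % name
    return [row[0] for row in conn.execute(query)]


def find_roles_bound(conn, name):
    # Safer: the value is passed separately and never parsed as SQL.
    query = "SELECT role FROM users WHERE name = ?"
    return [row[0] for row in conn.execute(query, (name,))]


hostile = "x' OR '1'='1"
print(find_roles_interpolated(conn, hostile))  # leaks every row's role
print(find_roles_bound(conn, hostile))         # no user has that name
```

      With about a thousand such statements, converting them one at a time behind a small helper like find_roles_bound is probably more realistic than a single big-bang fix.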

    14. Re:Not lots of code by Anonymous Coward · · Score: 0

      Even small codebases can become practically incomprehensible if written with little thought given to long term maintenance.

      Heck, 4KB is enough to confuse me.

    15. Re:Not lots of code by Anonymous Coward · · Score: 0

      So you've read my code then?

    16. Re:Not lots of code by lena_10326 · · Score: 1

      And talk about diabolical... the "database" consisted of arrays of structs written directly to disk. Upgrades of data structures meant hours of importing the old data, copying it into a new struct that was bigger, and writing it back out. SQL databases were available in the 80's... what gives?

      Network database storage has network latency, SQL parsing, and storage overhead. Local database storage has SQL parsing and storage overhead. Local binary data storage is very fast and minimal. I'm a bit surprised you would ask what gives. It's pretty obvious.

      So anyway. They were probably keeping the structs small to minimize storage requirements. Padding out the structs for future expansion used expensive storage space and increased disk IO.

      --
      Camping on quad since 1996.
    17. Re:Not lots of code by Anonymous Coward · · Score: 0

      First of all, 30-40,000 lines of code is not lots of code. Try, 250,000 of code.

      I usually print it on paper and then use the weight as a measure...;-)

    18. Re:Not lots of code by greg1104 · · Score: 2, Insightful

      Sure, if you only have a trivial 250K lines of code, I guess you can use crappy tools like Xcode and Visual Studio to maintain your project. The rest of us have to use grown-up tools that look like this:

      src$ find . -print | xargs wc | tail -n 1
        1950894 7085675 56777966

      There's only one way to learn your way around a new codebase, and the worst thing you can do is use a tool that aims to help with the job. Want to know how stuff flows through the program? Find where the program starts and draw the diagram yourself as you map it out. What I do is find something that I think I need to change, and a clear goal for what change I want to make to it, then map out exactly how the program reaches that point. You need to have a targeted goal to make progress with a stack of new code; just trying to read the whole thing or stare at diagrams of it won't teach you anything. Put the sucker into version control, generate regression tests of its output, figure out how to build after making a trivial change, and then try making a small non-trivial one. That's the only real way to learn how a program really works that internalizes enough of it into your brain that you can move upward to bigger maintenance tasks.
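
      "Generate regression tests of its output" can be as simple as recording what the code does today and failing when that changes. A minimal sketch in Python; legacy_report is a hypothetical stand-in for whatever inherited function you are about to touch, and the golden-file layout is an assumption, not a standard:

```python
import json
from pathlib import Path


def legacy_report(amounts):
    """Stand-in for the inherited code whose behaviour we want to pin down."""
    total = sum(amounts)
    return {"count": len(amounts), "total": total,
            "flag": "HIGH" if total > 100 else "LOW"}


def check_against_golden(name, func, args, golden_dir):
    """Record func's output on the first run; compare against it afterwards.

    Returns "recorded" or "match"; raises AssertionError if output drifted.
    """
    golden = Path(golden_dir) / f"{name}.json"
    actual = json.dumps(func(*args), sort_keys=True, indent=2)
    if not golden.exists():
        golden.parent.mkdir(parents=True, exist_ok=True)
        golden.write_text(actual)
        return "recorded"
    expected = golden.read_text()
    assert actual == expected, (
        f"{name}: output changed\nexpected:\n{expected}\nactual:\n{actual}")
    return "match"
```

      The recorded files go into version control next to the code, so the trivial change and the first non-trivial one both get checked against yesterday's behaviour.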

      And, for the record, I would like to tell everyone who suggested using a debugger to trace through the code instead of figuring it out by inspection and experiments that you are all a bunch of pussies. Good luck with that when the code breaks in production and you've got nothing but log files from the period leading up to the crash to work with. If I can get a debugger to attach to a broken program when the problem exists, it is by definition a trivial problem to solve; if I can even get a backtrace of where the thing is stuck at when it goes bad that's automatically an easy one. The only way to learn what you should be logging and defensively doing is by only relying on logs, assertions, and testing all the time--never a debugger. Because when things go really wrong, you won't have your debugger to save your ass--but if you built in good testing and logging capabilities, they'll be there.
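
      A small sketch of what relying on logs and assertions can look like, using Python's standard logging module. The "payments" logger name and the apply_payment function are hypothetical; the idea is only that inputs, rejected cases, and results are all written down where a post-mortem can find them:

```python
import logging

log = logging.getLogger("payments")  # hypothetical subsystem name


def apply_payment(balance, amount):
    """Subtract a payment from a balance, logging enough to reconstruct
    what happened from the log file alone."""
    log.info("apply_payment start balance=%s amount=%s", balance, amount)
    if amount <= 0:
        log.error("apply_payment rejected non-positive amount=%s", amount)
        raise ValueError("amount must be positive")
    new_balance = balance - amount
    # Defensive check: a positive payment must lower the balance.
    assert new_balance < balance, "payment did not reduce the balance"
    log.info("apply_payment done new_balance=%s", new_balance)
    return new_balance
```

      One caveat: Python drops assert statements when run with -O, so checks that must survive in production are better written as explicit raises, as with the amount check here.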

    19. Re:Not lots of code by Hurricane78 · · Score: 1

      Try, 250,000 of code.

      Which would be about 2,500 lines of Haskell code.
      And 247,500 lines of documentation to make sense of them. ;)

      --
      Any sufficiently advanced intelligence is indistinguishable from stupidity.
    20. Re:Not lots of code by geminidomino · · Score: 1

      src$ find . -print | xargs grep \{ | wc -l
        56777966

      src$ find . -print | xargs grep \} | wc -l
        56777965 ... Fuck.

    21. Re:Not lots of code by moonbender · · Score: 1

      Going to definitions is... a start, but it's really the bare minimum. I wouldn't want to program in an editor that doesn't also find references, create a call hierarchy, find sub-/super-classes etc. I don't think this means I'm stupid; stupid is creating highly structured data with an editor that doesn't understand this and doesn't let you browse the data along its structural links.

      As for using a profiler, they do other things beyond finding speed bottlenecks. E.g. they can visualize object relations, and it's often clear which objects/classes are central to the application and which are peripheral. I never tried to understand unknown code with the help of a profiler, but it's an interesting idea. Would need strong integration into the IDE for me, otherwise it's just a chore.
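
      If no editor with that kind of support is at hand, a crude definition and reference index is not hard to build. A sketch for Python sources using the standard ast module; the submitter's language is unknown, so this only illustrates the idea, and it only sees direct calls by plain name (not method calls):

```python
import ast
from collections import defaultdict

SAMPLE = '''\
def load(path):
    return open(path).read()

def main():
    data = load("a.txt")
    return load("b.txt") + data
'''


def index_source(source, filename="<string>"):
    """Map each function/class name to its definition line, and each
    directly called name to the lines where it is called."""
    tree = ast.parse(source, filename=filename)
    definitions = {}
    calls = defaultdict(list)
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            definitions[node.name] = node.lineno
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            calls[node.func.id].append(node.lineno)
    return definitions, dict(calls)


definitions, calls = index_source(SAMPLE)
print(definitions)
print(calls)
```

      Running index_source over every file and merging the results gives a rough "who calls what" table, which is the starting point for a call hierarchy.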

      --
      Switch back to Slashdot's D1 system.
    22. Re:Not lots of code by Anonymous Coward · · Score: 0

      "First of all, 30-40,000 lines of code is not lots of code."

      Agreed. Especially if it's all in one file.

    23. Re:Not lots of code by pz · · Score: 1

      First of all, 30-40,000 lines of code is not lots of code. Try, 250,000 of code.

      Depends on the language. For most of the popular languages (C / C++, Lisp, Scheme, Perl, Python, PHP, Ruby, Fortran, Matlab, Ada, ALGOL, whatever), you are spot-on. For languages that have higher density (APL primary among them, but I might also include hand-written assembler), 30-40k lines is seriously daunting.

      --

      Put my fist through my alarm clock with its ding-dong death inside my ear. - The Blackjacks.
    24. Re:Not lots of code by Anonymous Coward · · Score: 0

      I agree. 40 KLOC is laughable.

    25. Re:Not lots of code by kobaz · · Score: 1

      Network database storage has network latency, SQL parsing, and storage overhead. Local database storage has SQL parsing and storage overhead. Local binary data storage is very fast and minimal. I'm a bit surprised you would ask what gives. It's pretty obvious.

      Actually, if you had worked on the system, you would be as annoyed/confused as I was. Query parsing has such low overhead compared to the time it took to do everything possible to not increase the size of structs. It also has a much lower overhead in terms of comparing it to the time it takes to manually write code to do left/right/inner/outer joins, subqueries and the like by hand. Linear lookups became so slow that covering indexes were shoved on some of the fields by hand. But since it was by hand... programs that weren't upgraded to use the new indexes, were still slow.

      As time went on, they were developing: a high-maintenance, minimal feature, error prone... database system! Wow, that's so much more optimal than buying Ingres.

      We avoided increasing the size of structs not for speed reasons... each struct was approaching half a meg already, so who cares. We avoided adding to them because it involved writing upgrade programs, and running said upgrades on hundreds of customer systems manually.

      Some structs were usually padded quite nicely. Most had an extra kilobyte or two of padding. Some didn't. Some had 5 bytes free, and we would do crap like compression via bit twiddling to make use of as little extra space as possible. Not for speed, but to avoid upgrades.

      Network latency, storage overhead and parse time would have been gladly accepted by all the devs in order to reduce the horrid nightmare of maintenance issues. But there were no plans by the phbs to overhaul the system.

      There's something in programming called refactoring. When the maintenance on your app becomes more expensive than writing a new app... it's time to start planning a rewrite. But you generally don't want to even let it get that far. Ideally you refactor as you go, so that maintenance never gets even near that point.

      Sure, in the 80's storage was expensive, computers were slow. In the 90's storage got cheaper, computers got faster.. But there was that same code, same raw structs... same loops copy and pasted across the system.

      You should really think about the possible situation at hand before stating how something is 'pretty obvious'.

      --

      The goal of computer science is to build something that will last at least until we've finished building it.
    26. Re:Not lots of code by Idarubicin · · Score: 1

      First of all, 30-40,000 lines of code is not lots of code. Try, 250,000 of code.

      A standard-sized novel runs about a hundred thousand words. Weightier tomes, particularly those eight-hundred-pagers from the fantasy genre, can run to two or possibly even three times that. If you figure that each sentence accomplishes one narrative 'task' and uses up about ten words, then a whole novel is about ten thousand lines of code. (Some programming languages make this sort of correlation a bit more explicit).

      It's silly to quibble over less than an order of magnitude when you're using vague terms like "lots". And I'd argue that a novel's worth of code counts.

      --
      ~Idarubicin
    27. Re:Not lots of code by oldhack · · Score: 1

      You see that dog? That's my dog crapping in your lawn.

      --
      Fuck systemd. Fuck Redhat. Fuck Soylent, too. Wait, scratch the last one.
    28. Re:Not lots of code by ztransform · · Score: 1

      To start, use a good programming editor/environment (Xcode, Vslick, Visual Studio, etc.) that gives you the ability to easily go to definition or references to variables, functions, structs and such.

      I went and bought a 24 inch flat screen monitor with my own money and brought it into work.

      If I couldn't have 3 editor windows open side-by-side I would not be able to perform the maintenance required.

      I use one editor for the code I'm working on. One editor for the code I'm working from. One editor to look at the file being processed. Sometimes I split my editor windows for more references: sometimes I need to look at different parts of the same file, sometimes I have to view a configuration file for interpreting the data file the script will interpret.

    29. Re:Not lots of code by jijitus · · Score: 0

      40k lines is not too much, even on horrible languages like RPG or VB. But if it is spaghetti code, he's in deep sh!t. He should try identifying which modules are there and why, and (if it's still very hard for him to maintain it) CAREFULLY replace those modules with new ones. By breaking stuff he could learn why the original code worked fine...

    30. Re:Not lots of code by Anonymous Coward · · Score: 0

      'Lots' is a relative term. If you're used to working on projects involving 250,000 lines of code, then sure, 30,000 isn't much... to YOU. Maybe submitter is used to working on small, concise projects which only had a couple hundred or thousand lines of code. In that case, yes, 30,000 lines of code is a lot to submitter.

    31. Re:Not lots of code by lena_10326 · · Score: 1

      Query parsing has such low overhead compared to the time it took to do everything possible to not increase the size of structs.

      Then you were basically working on a toy system processing very few transactions. In any system processing between mid to high volume, a SQL engine is serious overhead and should be accounted for in the design.

      You should really think about the possible situation at hand before stating how something is 'pretty obvious'.

      It was obvious. It is still obvious. Either the developers at your shop were blistering idiots for designing a solution for a problem that does not exist or you were the idiot for not understanding the software requirements.

      --
      Camping on quad since 1996.
    32. Re:Not lots of code by kobaz · · Score: 1

      Then you were basically working on a toy system processing very few transactions. In any system processing between mid to high volume, a SQL engine is serious overhead and should be accounted for in the design...

      It was obvious. It is still obvious. Either the developers at your shop were blistering idiots for designing a solution for a problem that does not exist or you were the idiot for not understanding the software requirements.

      Feisty aren't we?

      What 'serious overhead' are you talking about anyway? The microseconds that it takes to parse a query and start the execution? It's always finding and fetching the data that's the most time consuming. But of course you already know that since you're apparently the expert.

      No matter what system you're using, you still have to find and fetch the data... and what better tool to use than an SQL or equivalent database server?

      At this point it's probably moot to argue, but I'll need to make some adjustments to your 'facts'.

      I guess you've never actually worked with a large system, with millions of records, and thousands of transactions a day, used by some of the world's largest collections agencies. It was such an annoying system to work with, so I hate to defend it, but it did do its job well. It would be far from what anyone would consider to be a toy.

      An SQL engine would have been perfect for the job at hand. Much more so than the inhouse datastore. And in fact we did use an SQL server for many of the reporting tools we had to write. It was a pleasure working on those projects. The problem of course is that the SQL part was a duct-taped on after-thought. Nightly syncs were done from the inhouse store to the SQL dbs so that the reports would be up to date. Despite 10's of millions of rows and hundreds of tables, the queries that were properly written, that used tables that were properly indexed, were lightning fast compared to the inhouse database.

      The developers I worked with were far from idiots, myself included too of course. The crappy design stemmed from a lack of care from management, and a lack of systems training that should have been done before developers touched code for the first time after being hired. It's just one of those things that one needs to put up with in a corporate environment. Things are the way they are, and if you're not in a management position... well... tough luck. You do your tasks the best you can and go home.

      That type of atmosphere breeds copy and paste programming, lack of formal design, and results in a crappy system.

      Needless to say, I don't work there anymore.

      I did find out after leaving that sometime later, there was a major re-architecture movement to a multi-tier platform with an Oracle backend.

      --

      The goal of computer science is to build something that will last at least until we've finished building it.
    33. Re:Not lots of code by Chowderbags · · Score: 1

      Yeah, and fuck those doctors who use laparoscopic robots to do surgery. What happens if they're at a restaurant and someone needs a tracheotomy right there and there's nothing but a semi-clean steak knife and some paper napkins?

      Don't deride the use of a tool that solves quite a few problems just because it won't be there all the time. Is it good to know how to read a log or write good test harnesses? Absolutely. Does that mean I won't ever want to get a step-by-step feel for some sections of code? No.

    34. Re:Not lots of code by toddestan · · Score: 1

      Fair enough. On the other hand, badly written code is self-limiting in size. It almost never gets particularly large because if it is that hard to maintain, it will also be extremely hard to expand in any useful way. Usually by the time it gets past about 10-15,000 LOC, it has to be at least somewhat sensible.

      Using copy and paste you can get very large, messy projects pretty easily. Nothing like having multiple versions of the same 2000-line function, all of them used and each subtly different from each other. Or simply copying lines of code all over the place instead of placing it into a function and calling the function when you need it.
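
      Finding the verbatim copies, at least, can be automated. A rough sketch in Python that hashes every run of a few consecutive non-blank lines (ignoring indentation) and reports runs that appear more than once. The function name and window size are arbitrary choices, and it will miss the "subtly different" versions:

```python
import hashlib
from collections import defaultdict


def find_duplicate_blocks(files, window=4):
    """files maps a file name to its source text. Returns a list of groups,
    each a list of (file name, starting line) where the same `window`
    consecutive non-blank lines occur, ignoring indentation."""
    seen = defaultdict(list)
    for name, text in files.items():
        lines = [(number, line.strip())
                 for number, line in enumerate(text.splitlines(), 1)
                 if line.strip()]
        for i in range(len(lines) - window + 1):
            chunk = "\n".join(content for _, content in lines[i:i + window])
            digest = hashlib.sha1(chunk.encode("utf-8")).hexdigest()
            seen[digest].append((name, lines[i][0]))
    return [locations for locations in seen.values() if len(locations) > 1]
```

      A long pasted function shows up as many overlapping groups, so in practice the output wants merging into ranges before anyone reads it.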

    35. Re:Not lots of code by Anonymous Coward · · Score: 0

      No matter how badly the code is written, no matter how many KLOCs it is, if you know the language decently enough you can do it. Trust me, once you start doing it you'll realize in a week that you have got some results, provided you love doing that project... otherwise you can find plenty of reasons to scrap it.

    36. Re:Not lots of code by dgatwood · · Score: 1

      You're trying to make me cry, aren't you?

      --

      Check out my sci-fi/humor trilogy at PatriotsBooks.

    37. Re:Not lots of code by Galestar · · Score: 1

      While I can't speak for db servers of the 80's, modern database servers have many optimizing techniques, such as indexing and caching in RAM, that make them much more efficient than any basic binary reader. You would need to spend a lot of time developing your reader in order to make it as efficient as a modern db server on large datasets.

      --
      AccountKiller
  11. Hunt down the original developer by Anonymous Coward · · Score: 5, Funny

    (And then shoot him.)

    1. Re:Hunt down the original developer by istartedi · · Score: 1

      (And then shoot him.)

      With Lisp?

      --
      For all intensive purposes, "whom" is no longer a word. That begs the question, "who cares"?
    2. Re:Hunt down the original developer by $RANDOMLUSER · · Score: 1

      shoot(huntdown(original developer))

      --
      No folly is more costly than the folly of intolerant idealism. - Winston Churchill
    3. Re:Hunt down the original developer by Omnifarious · · Score: 1

      Your comment is much funnier than the grandparent, though without the grandparent it couldn't have existed. :-)

    4. Re:Hunt down the original developer by ottothecow · · Score: 5, Funny

      shouldn't that be more like shoot(huntdown(first(developers)))?

      --
      Bottles.
    5. Re:Hunt down the original developer by Anonymous Coward · · Score: 0

      (And then shoot him.)

      Well, he inherited the code. Which means someone died and left it to him. So, he'd be shooting a dead person.

      What's the point?

      Contest the will?

    6. Re:Hunt down the original developer by Matheus · · Score: 1

      No... that would be shoot(huntdown(car(developers)))

    7. Re:Hunt down the original developer by poopdeville · · Score: 1

      Management is scared of the Haskell version: shoot $ huntdown $ head $ developers

      --
      After all, I am strangely colored.
    8. Re:Hunt down the original developer by Bugs42 · · Score: 1

      (shoot (huntdown (car (developers))))
      C'mon, this is /., it should NOT have taken this many posts to get grammatically correct LISP

      --
      Programmer: an ingenious device that converts caffeine into code.
    9. Re:Hunt down the original developer by Anonymous Coward · · Score: 0

      shoot(huntdown(developer[0]))

    10. Re:Hunt down the original developer by istartedi · · Score: 1

      How do we know it's the first guy in the list? How do we know there's only one developer responsible?

      The version with the abstract "original" function is more correct, IMHO. Also, it's Lisp not LISP.

      (shoot (huntdown (original (developers))))

      I've only been studying it a while, but based on my limited understanding, having several people pour over a small function before getting it right is par for the course. Of course, now that it's complete it should solve all the problems. If it doesn't, it's the world's fault and not the program.

      --
      For all intensive purposes, "whom" is no longer a word. That begs the question, "who cares"?
    11. Re:Hunt down the original developer by Anonymous Coward · · Score: 0

      noob

      it's shoot(huntdown(car(developers)))

    12. Re:Hunt down the original developer by Hurricane78 · · Score: 1

      If it’s Java, it would be:

      project.getDevelopersGetterFactory().getDevelopers().getFirstGetterFactory().getFirst().getDownHunterFactory().getDownHunter().do().getShooterFactory().getShooter().execute()

      And in Haskell it would be the following very obvious and self-explanatory code:

      s $ h $ f $ d

      --
      Any sufficiently advanced intelligence is indistinguishable from stupidity.
    13. Re:Hunt down the original developer by selven · · Score: 1

      You're shooting the huntdown of the first developers? So you're saving their lives?

    14. Re:Hunt down the original developer by Anonymous Coward · · Score: 0

      compile failed, you forgot your ;

    15. Re:Hunt down the original developer by ottothecow · · Score: 1

      No, you are shooting the result of the huntdown which is really only bad for the hunter if he comes up empty handed (and wants to keep his hands intact)

      --
      Bottles.
    16. Re:Hunt down the original developer by Anonymous Coward · · Score: 0

      I, too, am learning Lisp (slowly) with the help of books from David Touretzky and Peter Seibel. While we're busy being pedantic about things, one would pore over a small function and not typically pour over it. Then again, your signature, for all intents and purposes, seems to serve as cleverly disguised grammar-bait. I salute you, your sig and your future Lisp skills! Isn't arguing on the Internet fun?

    17. Re:Hunt down the original developer by badkarmadayaccount · · Score: 1
      --
      I know tobacco is bad for you, so I smoke weed with crack.
    18. Re:Hunt down the original developer by badkarmadayaccount · · Score: 1

      Can somebody do a Prolog version?

      --
      I know tobacco is bad for you, so I smoke weed with crack.
  12. Not at all. by hemorex · · Score: 5, Insightful

    I find that if the other programmer wrote it in such a way where it's too complex for me to follow, I'm not the one who's a moron.

    1. Re:Not at all. by tsm_sf · · Score: 5, Insightful

      Man, always when I run out of mod points.

      Nothing like being handed a steaming plate of spaghetti and hearing about how much of a "genius" its creator was.

      --
      Literalism isn't a form of humor, it's you being irritating.
    2. Re:Not at all. by Anonymous Coward · · Score: 1, Interesting

      Being a genius means getting the right feature out on time.
      The customers don't care for craftsmanship. It sucks, but
      deal with it.

    3. Re:Not at all. by Jane+Q.+Public · · Score: 1

      To add to that:

      What language is it in? That could make a big difference in our answers. But in general, if it is very old code it should at least contain comments. If it was written in the last few years, the code should be in discrete sections that are organized in a logical manner. If not, then they were either seriously old-school programmers, or hacks.

    4. Re:Not at all. by CorporateSuit · · Score: 3, Insightful

      Yes, but there's also when you hire the new guy, fresh from college, and he sits down at his work station. After a few days of getting absolutely no work done, he comes to you and tells you he wants to rewrite the core 50K lines of tested, trusted company code because he thinks it's not written "by the book". To which, the only sane reply is "You touch that code, and I will set you on fire."

      --
      I am the richest astronaut ever to win the superbowl.
    5. Re:Not at all. by Chris+Newton · · Score: 2, Insightful

      Nothing like being handed a steaming plate of spaghetti and hearing about how much of a "genius" its creator was.

      I always thought clever code was code that everyone could understand, not code that no-one could understand.

      It’s like Blaise Pascal’s apology for writing a long letter because he didn’t have the time to make it shorter: it’s often easier to produce some grandiose design that treats anything awkward as a special case than it is to identify a simpler, more consistent underlying concept and then write simpler code to model that.

    6. Re:Not at all. by Anonymous Coward · · Score: 0

      Clarke's third law, "Any sufficiently advanced technology is indistinguishable from magic", also applies to software. Just because the person handing out the spaghetti doesn't know how pasta is made, doesn't make the pasta maker a master magician;)

    7. Re:Not at all. by Toze · · Score: 1

      This.

      My current job, I got handed about 250K lines and told to rewrite it from scratch. The end product was about 40-50K lines, which I've since reduced to 35K lines- and not by being "clever," just by cutting out unnecessary things, and adding useful things- like loops. 9_9 The original coder was so mind-numbingly bad that any other developer at the company that had to deal with his dreck got laughed at, in pity but more in relief. I'm still paranoid that I'm not really a good developer, but I'm now convinced I'm at least an adequate one, and I measure the quality of my code by how easy it is to understand and modify it when I come back four/six/twelve months later to add a feature. If I start going cross-eyed, I don't think "oh, I've lost my touch," I think "what dumb asshole wrote this- oh, right, me."

      I'll admit my experience is limited, and I haven't done really complex stuff, but I've never seen code that was hard to read and was written well. That includes my own code.

      To address the original issue; reading 40K lines isn't fun. Find a bug report for some peripheral part of the system and start by fixing that. Work your way around and into larger systems, and when you finally reach the core of the code, you'll have a pretty good idea of the developer's style, idiosyncrasies, and what the calls the core is making are doing.

      --
      No OS on the planet can protect itself from a user with the admin password. - Yvan256
    8. Re:Not at all. by rgmoore · · Score: 1

      There's more than one kind of genius. The kind of genius it takes to solve a difficult and important problem is different from the kind of genius it takes to write clean and easily understood code. My experience using software for my scientific field is that the best software from the standpoint of getting good results is often very poorly written. That's because the people writing it are geniuses in their field but often indifferent programmers. It's much easier to fix a program that does an ugly job of implementing the right solution than one that does a beautiful job of implementing the wrong solution.

      --

      There's no point in questioning authority if you aren't going to listen to the answers.

    9. Re:Not at all. by arth1 · · Score: 1

      I always thought clever code was code that everyone could understand, not code that no-one could understand.

      Usually, but not always. There are times where you need to eke out every cycle you can, and do every trick possible to reduce the worst case bottleneck, because it is timing sensitive, and runs in a context where you can't just throw more hardware at it. That kind of code isn't always pretty, and if you try to rewrite it "by the book", Bad Things can happen.

      Also, is it really possible to write code that "everyone" can understand? Sometimes you have to assume that whoever inherits the code has at least some basic skills, and understand common techniques and terminology.
      I mean, do you really want to explain a Schwartzian transform every time you use it, in a way that an MBA can understand it?
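
      For anyone who hasn't met the idiom, here is a minimal sketch of the Schwartzian transform (decorate, sort, undecorate), written in Python rather than its native Perl. The `expensive_key` function and the sample words are made up for illustration; the only point is that the costly key is computed once per element instead of on every comparison.

```python
def expensive_key(word):
    # Hypothetical stand-in for a costly computation (e.g. a file stat).
    return len(word)


def schwartzian_sort(items, key):
    # Decorate: pair each item with its precomputed key, plus its index
    # so that ties are broken without ever comparing the items themselves.
    decorated = [(key(item), index, item) for index, item in enumerate(items)]
    # Sort on the precomputed key.
    decorated.sort()
    # Undecorate: strip the keys back off.
    return [item for _, _, item in decorated]


words = ["pear", "fig", "banana", "kiwi"]
result = schwartzian_sort(words, expensive_key)
print(result)
```

      In everyday Python, `sorted(items, key=expensive_key)` has the same effect, which is arguably the "comment-free" version of the trick.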

    10. Re:Not at all. by Anonymous Coward · · Score: 0

      You must have worked on the 100 kloc pile I'm working on now. I've got it down to 88 kloc and expect another 50 will come out before it really starts to make sense.

    11. Re:Not at all. by ajlisows · · Score: 2, Insightful

      Then again, the creator MAY have been a genius. Perhaps he was told "Put this enormous program together in one month or the company is screwed." In cases like that, poorly thought out algorithms, bloated classes, using variables with names like "x", "y", "z" with no comments, nothing really works except for the absolute bare minimum required and other coding no-nos probably do not seem that important. Given appropriate time and resources, perhaps he could have written the greatest code EVAR! Given a very limited time frame and managing to save the company would probably qualify them as a genius.

    12. Re:Not at all. by Tablizer · · Score: 2, Interesting

      I find that if the other programmer wrote it in such a way where it's too complex for me to follow, I'm not the one who's a moron.

      But YOU get the blame, which is the problem. This kind of thing happened to me recently when I inherited a big pile of MS-Access code with variables like A34 and 300 objects (tables, reports, queries, etc.). I went from an "excellent" rating to a "C" rating on my evaluation because they wanted quick turnaround. I felt like the victim of a hit-and-run. I'm not the one who did the crime, yet I'm the one with the black eye and a missing wallet. The Pasta Mugger did a number on me.

      At least with text source code you can find or write variable, function, and command indexers/profilers to help one see the structure, find definitions, and browse relationships. Not so easy to do that with MS-Access with all its proprietary binary crap. I found a way to extract some of the info, but it looks different from how you'd see it inside MS-Access so it's hard to relate to. Gotta love MS.
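
      As a rough illustration of the "write your own indexer" idea for plain-text source, the sketch below maps every identifier-looking token to the lines it appears on. It is deliberately naive: it assumes identifiers match a simple regex and makes no attempt to skip keywords, strings, or comments. The sample source and the name `build_index` are invented for the example; established tools such as ctags do this far more thoroughly.

```python
import re
from collections import defaultdict

# Assumption: identifiers look like C/Python-style names.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_index(source_text):
    """Map each identifier to the sorted list of line numbers where it appears."""
    index = defaultdict(set)
    for line_number, line in enumerate(source_text.splitlines(), start=1):
        for name in IDENTIFIER.findall(line):
            index[name].add(line_number)
    return {name: sorted(lines) for name, lines in index.items()}


# Made-up sample featuring a cryptically named variable.
sample = """total = A34 + rate
A34 = load_value()
print(total)
"""

index = build_index(sample)
print(index["A34"])
```

      Even something this crude answers "where else is A34 touched?" faster than scrolling.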

    13. Re:Not at all. by Chris+Newton · · Score: 1

      There are times where you need to eke out every cycle you can

      That’s a fair point.

      Curiously, though, such code often becomes simpler from some perspective. For example, extreme optimisation sometimes comes down to things like efficient use of pipelines and caches. Achieving these goals might lead to more straight-line code or to a “flatter” memory layout with related data stored together.

      Such optimised code might not be the way you would “naturally” write the algorithm, but a quick explanatory comment goes a long way, and surely anyone working on such performance-sensitive code would be familiar with these principles even if they hadn’t seen a particular case before.

      Moreover, this all tends to happen at a very low level. I’ve rarely seen a case where the performance-driven hackery couldn’t still be wrapped up in a fairly tight module and present a normal interface to the rest of the code.

      Also, is it really possible to write code that "everyone" can understand? Sometimes you have to assume that whoever inherits the code has at least some basic skills, and understand common techniques and terminology.

      Sure, of course. I was oversimplifying, perhaps a little too much.

      That said, I do think it’s a reasonable goal that code written for a certain project should be accessible with only minimal support to anyone from that project who is likely to read it. That is, while it may sometimes be necessary to deviate from this rule for practical reasons, such deviations should be deliberate and for a specific purpose, and suitable precautions should be taken to make sure any reduction in clarity does not become a liability.

    14. Re:Not at all. by Chris+Newton · · Score: 1

      It's much easier to fix a program that does an ugly job of implementing the right solution than one that does a beautiful job of implementing the wrong solution.

      OK, but if your code is ugly, will you be sure that it is implementing the right solution, and that you can keep it that way as it evolves?

    15. Re:Not at all. by FooAtWFU · · Score: 1

      Clever code is code that adequately intelligent people can reasonably understand, but I'm not going to cut out my stacks of

      map { ... } sort grep { ... } keys %hash;

      just because you're not familiar with functional Perl. Otherwise we'd write everything in COBOL, no? :)

      --
      The World Wide Web is dying. Soon, we shall have only the Internet.
    16. Re:Not at all. by Chris+Newton · · Score: 1

      For what it’s worth, that’s not really the kind of target I was gunning for. If you’re going to program in Perl, then of course you need a reasonable level of Perl proficiency to understand the code, and your example is hardly esoteric. I would imagine that it is also comfortably within the ability of a typical Perl programmer to comprehend after a little reading, if they haven’t encountered functional style before.

    17. Re:Not at all. by dkf · · Score: 1

      Then again, the creator MAY have been a genius.

      A real, experienced genius would leave it in a state where it could be maintained by a lesser mortal, even with all the other constraints on geniosity.

      --
      "Little does he know, but there is no 'I' in 'Idiot'!"
    18. Re:Not at all. by mcrbids · · Score: 1

      I find that if any programmer writes code in such a way where it's too complex for others to follow, he's the one who's a moron.

      There. Fixed that for you! Pure genius is pure simplicity. Anybody of average intelligence can come up with a complex answer to a complex problem. But it's somebody of far more intelligence who can come up with a simple answer to a complex problem!

      Many problems are, at first look, very complex, but, when given sufficient genius, succumb to the genius of simplicity. And even when the problem is very complex, the code used to solve it should be easily read and understood!

      --
      I have no problem with your religion until you decide it's reason to deprive others of the truth.
    19. Re:Not at all. by TapeCutter · · Score: 1

      "Perhaps he was told "Put this enormous program together in one month or the company is screwed." In cases like that..." - you put the hard word on them for more money.

      --
      And did you exchange a walk on part in the war for a lead role in a cage? - Pink Floyd.
    20. Re:Not at all. by Anonymous Coward · · Score: 0

      I find that if the other programmer wrote it in such a way where it's too complex for me to follow, I'm not the one who's a moron.

      How exciting and different that must be for you.

    21. Re:Not at all. by kegon · · Score: 1

      Man, always when I run out of mod points.

      Why ? Mod points are not for enforcing your opinion.

      Nothing like being handed a steaming plate of spaghetti and hearing about how much of a "genius" its creator was.

      I can agree with you here, without mod points.

      Unless code is high level and ergo inefficient, the programmer will have to do something "clever". Clever is usually difficult to understand and difficult to explain. No one likes to write documentation especially if it's difficult documentation. And then you have bad programmers who write clever rubbish or rubbish that they think is clever and so on.

    22. Re:Not at all. by ZeLonewolf · · Score: 1

      Yes, but there's also when you hire the new guy, fresh from college, and he sits down at his work station. After a few days of getting absolutely no work done, he comes to you and tells you he wants to rewrite the core 50K lines of tested, trusted company code because he thinks it's not written "by the book". To which, the only sane reply is "You touch that code, and I will set you on fire."

      Perhaps that "tested, trusted company code" is a steaming mess of spaghetti code that's been cautiously poked, prodded, and duct-taped over the years into something that in the end works but is a maintainability nightmare?

      --
      "If at first you don't succeed, lower your standards."
    23. Re:Not at all. by BartholomewBernsteyn · · Score: 1

      Nothing like being handed a steaming plate of spaghetti and hearing about how much of a "genius" its creator was.

      Far too many *programmers* have become invaluable assets due to their reluctance to write maintainable code, ignoring the most basic rules of software development, not to mention design patterns, etc. Needless to say, these individuals get handled with the utmost respect, often being the only ones who can make sense of the mess they have left.

      Indeed, geniuses they are.

    24. Re:Not at all. by rgmoore · · Score: 1

      OK, but if your code is ugly, will you be sure that it is implementing the right solution, and that you can keep it that way as it evolves?

      My experience with code written by subject experts who aren't great coders is that their code is ugly but not actually evil. They aren't genius coders, but at least they know their limitations. They usually try to do things in a simple, straightforward way. The resulting code tends to be poorly factored, but it avoids the really dangerous trap of trying to be too clever. That means you can usually grind your way through it to figure out what it's trying to do and whether it succeeds. Doing so can be a long slog, but at least the path isn't strewn with traps for the unwary.

      --

      There's no point in questioning authority if you aren't going to listen to the answers.

    25. Re:Not at all. by Lonewolf666 · · Score: 1

      Perhaps that "tested, trusted company code" is a steaming mess of spaghetti code that's been cautiously poked, prodded, and duct-taped over the years into something that in the end works but is a maintainability nightmare?

      Probably. I've seen my share of that.

      An interesting aspect is that a quick hack can actually be the fastest way to get the job done - in the first two or three iterations. But later on, the side effects of even minor changes grow difficult to contain and things that should be minor programming tasks start taking weeks.
      So if you are content to use the old code exactly as it is, GP's approach of leaving the code alone is fine. But in my experience, sooner or later some business requirement comes up that means changing the functionality. At that point, the steaming mess of spaghetti code will really hurt you.

      It is easy to fall into that trap, and getting out of it takes patient refactoring. Usually takes more time than a proper design would have taken in the first place.

      --
      C - the footgun of programming languages
    26. Re:Not at all. by tsm_sf · · Score: 1

      I think that's the definition of being an employee, not a genius.

      --
      Literalism isn't a form of humor, it's you being irritating.
    27. Re:Not at all. by Anonymous Coward · · Score: 0

      Yes, but there's also when you hire the new guy, fresh from college, and he sits down at his work station. After a few days of getting absolutely no work done, he comes to you and tells you he wants to rewrite the core 50K lines of tested, trusted company code because he thinks it's not written "by the book". To which, the only sane reply is "You touch that code, and I will set you on fire."

      Couldn't agree more. I think it was Jef Raskin who once said, "Never let programmers rewrite code." Fact is, most of us are fundamentally convinced, when taking on a job that someone else coded, that we could do a much better job than the original developer, if only we were given the freedom to rewrite everything the "right" way. Now, in an absolute sense that could very well be true, if we had the luxury of starting from scratch. Usually we don't though, because a. software companies don't have unlimited development budgets and may not be able to afford to completely redevelop a product they already paid for, and b. more importantly, any older codebase is going to have a lot of thorns. That is, weird shit whose purpose is not always obvious, but was likely put there to take care of some oddball edge case whose origins have long since been forgotten, but may still be needed. Basically, I'm saying that a degree of conservatism is in order when dealing with large projects that have been around for a while.

      That's usually management's job: the "it works so don't fuck with it unless you can make a damned good case as to why we should take the risk" mentality. Experienced developers learn that their job is not always about perfection, but about shipping something that actually works so the company can sell it and keep meeting payroll. That may mean living with code that just grates on your nerves and makes you want to throw up every time you look at it. But it's also how most of us make a living.

    28. Re:Not at all. by DutchUncle · · Score: 1

      How many of us have been *both*? I've been the new guy who sees the spaghetti, and I've been the experienced guy who knows where the bodies are buried AND WHY. Right now I'm both at the same time: building a re-engineered re-designed version of an old codebase on obsolete hardware, and it has turned out over time that most of the special cases and weird code were because the APPLICATION has special cases and weird situations that the so-called subject expert had forgotten to tell us about.

    29. Re:Not at all. by Anonymous Coward · · Score: 0

      There's the original programmer now. Get him!!

    30. Re:Not at all. by Anonymous Coward · · Score: 0

      I've had that attitude directed at me before. I was told I was one of those "too many". This was over a time when I asked a peer for help about business rules, and he noticed I was using some functional constructs instead of the factory pattern. As it turns out, factories and functors are the same thing. Any time you can use a factory, you should use a "map" function. He didn't seem to get that. He basically called me a cowboy coder because I was using these "ad hoc" constructs instead of exactly what Martin Fowler said I should use.

      In the end, I told him, "It's my ticket. I'll do it how I want. Change it yourself on your own time if you want. I won't change it back. But you're going to find you're writing twice as much code just to lose clarity." A little later, he told me he tried writing a trivial factory, and compared it to the equivalent map, and said I was right. Breaking a functor up into classes makes it harder to see what "sub-functors" have in common, and more importantly, how they are different. All this at twice the length, or more, when you consider importing.

      You see, it's easy to complain about people who do things differently than you do. Especially when they're getting called geniuses by the management. But sometimes they really do know something you don't. That's not their fault, is it?
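
      The comment doesn't show the code in question, so the following is only one guess at the kind of comparison being described: a small class-per-case factory next to a table of functions applied with `map`. Every name here (`Shipping`, `make_shipping`, `COST_BY_KIND`) is hypothetical.

```python
# Class-based factory: one subclass per case, plus a function to pick one.
class Shipping:
    def cost(self, weight):
        raise NotImplementedError


class Ground(Shipping):
    def cost(self, weight):
        return 1.0 * weight


class Air(Shipping):
    def cost(self, weight):
        return 3.0 * weight


def make_shipping(kind):
    if kind == "ground":
        return Ground()
    if kind == "air":
        return Air()
    raise ValueError(kind)


# Table-of-functions version: the cases sit side by side, so what they
# share and where they differ is visible at a glance.
COST_BY_KIND = {
    "ground": lambda weight: 1.0 * weight,
    "air": lambda weight: 3.0 * weight,
}

kinds = ("ground", "air")
factory_costs = [make_shipping(kind).cost(2) for kind in kinds]
table_costs = list(map(lambda kind: COST_BY_KIND[kind](2), kinds))
print(factory_costs, table_costs)
```

      Both produce the same numbers; whether the shorter form is actually clearer will depend on how much behaviour each case eventually has to carry.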

    31. Re:Not at all. by knewter · · Score: 1

      I absolutely wish I had mod points as well. We hired a guy one time that argued about design patterns with me every. single. day. He never produced a single fucking line of code we could ship. Took me three months to finally bother with firing him. But his professors all thought he was great!

      Side note: when the fuck did it happen that someone could graduate college without being able to write a coherent English paragraph? I had at least 2 underlings that I had to ban from communicating with customers on account of their ineptitude with English. I hate that these people graduate, and hate more that I assumed someone with a college degree could write meaningful English. Lesson learned.

      --
      -knewter
    32. Re:Not at all. by Anonymous Coward · · Score: 0

      True, but people wouldn't be proclaiming that the new guy is the second coming.

  13. Visualisation by gilleain · · Score: 5, Informative

    Anything ranging from just sketching out some informal package diagrams on some paper (I quite like using an A3 sketchpad) to something more like Code City which can work with code in smalltalk, java, and c++. There are UML diagram makers, of course, but automated diagrams like that probably need to be edited.

    In fact, it is not the finished diagram that helps so much as the drawing of it, which is why paper and pencil is so good. Or a vector graphics package.

    1. Re:Visualisation by Anonymous Coward · · Score: 0

      I can't seem to find another means of contacting you through your ~gilleain page here. Even though this is an older comment, hopefully you still see this reply. I am quite interested in your suggestion of using an A3 pad for diagramming, note-taking, etc. I am guessing since you referenced A3 specifically (rather than 11x17, Tabloid, Ledger...) that you're in Europe, South America, etc. and not in the US. Even if that is the case, what is the source (perhaps brand/item #) of your tablets? I have been searching since I read your comment and have yet to find something suitable. My primary means of note-taking is on "letter" sized pads (most similar to A4), which I find somewhat restrictive when I have a lot of information I'd like to commit to a page, especially considering my large, unwieldy handwriting.

      Anyhow, if you could tell me the source of your tablets (I suppose here is the easiest...) I would appreciate it.

      Thanks,
      Lee

  14. use a debugger by Anonymous Coward · · Score: 0

    The best way to figure out how the code does action X is to run it under a debugger while it does the action, inspecting how the data structures in the program change, setting breakpoints where the decisions are made to see what happens, etc. You get to see dynamically what the program is doing step by step with the computer keeping track of it for you, instead of puzzling it out from a static listing. Running the code that way is a much faster way to gain understanding than simply reading the code.
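
    A related trick, when stepping interactively is too slow, is to let the runtime record which functions an action passes through. The sketch below uses Python's `sys.settrace` for that; it is call tracing rather than a full debugger session, and `parse` and `total` are made-up stand-ins for "action X" in an unfamiliar codebase. (For actual stepping in Python, the built-in `breakpoint()` drops into pdb.)

```python
import sys


def record_calls(func, *args):
    """Run func(*args) and return (result, names of Python functions entered)."""
    called = []

    def tracer(frame, event, arg):
        # A "call" event fires each time a new Python-level frame is entered.
        if event == "call":
            called.append(frame.f_code.co_name)
        return None  # no line-by-line tracing inside the frame

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        # Always restore whatever tracer was active before.
        sys.settrace(previous)
    return result, called


# Hypothetical code under investigation.
def parse(text):
    return text.split(",")


def total(text):
    running = 0
    for field in parse(text):
        running += int(field)
    return running


result, called = record_calls(total, "1,2,3")
print(result, called)
```

    The list of names gives a quick map of where to set real breakpoints next.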

  15. Re:You are an idiot by Anonymous Coward · · Score: 0

    Thank you for validating my decision to get the hell out of IT.

  16. Use it by mosb1000 · · Score: 1

    The only way to learn the code is to work with it. Simply reading through it won't help, you have to go try to change things and see what works and what doesn't.

    The main thing that bothers me when working with other people's code is the sheer number of variables they use. I tend not to declare a new variable unless it is absolutely necessary (and in object oriented programming variables other than pointers are almost never necessary). It seems like code written this way is easier to read and understand (and significantly smaller). This is Slashdot, so there are a lot of other programmers out there. Am I off base here? What do you think about intermediate variables that are not strictly necessary?

    1. Re:Use it by mosb1000 · · Score: 1

      I said pointers are variables. . .

      variables other than pointers are almost never necessary

      That's what "other than" means.

    2. Re:Use it by EvanED · · Score: 2, Interesting

      Am I off base here? What do you think about intermediate variables that are not strictly necessary?

      I can't say you're off base per se (I don't have nearly enough production dev experience to make statements like that, and even if I did, I couldn't speak for everyone), but my personal style is not quite the complete opposite of yours.

      I pretty heavily use intermediate variables. Why? A couple big reasons. One, if you give the temporary variables decent names, they serve as additional documentation. Two, if you're debugging, you can look at those intermediate values in a debugger (or log them) much easier than you could if they weren't explicitly stored somewhere. In most graphical debuggers you can just hover the mouse over a variable and see its value; if you didn't have that variable, you'd have to enter the expression in the immediate window or set up a watch or something like that.

    3. Re:Use it by aoteoroa · · Score: 1

      I'm not trying to troll here but how do you write anything without variables? Or are you suggesting that some people will use too many variables like: $FirstName $MiddleName $LastName $BirthDate $Gender when they could have simplified their code with a single class called Person?

    4. Re:Use it by mosb1000 · · Score: 3, Informative

      Not without variables, but without unnecessary ones. For example, someone might write:

      int a;
      int b;
      int c;
      int d;
      int e;
      int f;
      int g;
      a = dropBox1.Value;
      b = dropBox2.Value;
      c = dropBox3.Value;
      d = dropBox4.Value;
      e = a + b;
      f = c + d;
      g = e * f;
      result.Value = g;

      While I would write:

      result.Value = ( dropBox1.Value + dropBox2.Value ) * ( dropBox3.Value + dropBox4.Value );

    5. Re:Use it by mosb1000 · · Score: 1

      From a documentation standpoint, I have never found descriptive variable names to be good enough. The problem is that while the programmer may have a good idea what he means when he chooses a name, and indeed that name may make a lot of sense if you already have a good understanding of how the code works, someone new who is unfamiliar with the program will not understand it because they do not know how the code works. In the meantime, it's a lot of work to track back through intermediate variables (especially if the code has been reworked a lot) to figure out what they do and where they come from.

      As far as debugging goes, I will usually add variables at critical points, and then remove them once the code is working. I don't know if that's any slower, but it works for me.

    6. Re:Use it by phantomfive · · Score: 3, Insightful

      What do you think about intermediate variables that are not strictly necessary?

      Use them if they make things clearer for someone reading the code, otherwise don't. For example, you can write:

      screen.displayName = user.firstName + user.lastName;

      or you can write

      String fullName = user.firstName + user.lastName;
      screen.displayName = fullName;


      Thus making it more clear to someone reading that you are trying to use the full name. That is probably not the best example because anyone would probably understand that user.firstName + user.lastName is the full name, but I think you can see the main point, that sometimes it can be easier to read a few meaningfully named intermediate variables than a long equation. If it isn't easier to read, don't do it. But really when I read code, or even write it, I am willing to conform to either way of doing it if someone else feels strongly about it, because that is far less important than things like flexibility of major structures in the code.

      --
      Qxe4
    7. Re:Use it by Anonymous Coward · · Score: 1, Interesting

      The main thing that bothers me when working with other people's code is the sheer number of variables they use. I tend not to declare a new variable unless it is absolutely necessary (and in object oriented programming variables other than pointers are almost never necessary). It seems like code written this way is easier to read and understand (and significantly smaller). This is Slashdot, so there are a lot of other programmers out there. Am I off base here

      No, you're on target. Making a variable to temporarily store a variable amounts to writing unnecessary plumbing. You can abstract that plumbing out very easily, through "functional monadism". This makes things much easier: you can manipulate the plumbing apart from the things it plumbs.

      You are definitely on the right track. If you haven't done this, try learning SQL, and then a functional programming language. You will see that the computation of a function amounts to the computation of a subset of the cartesian product of types (or sets, more generally). Evaluating a function amounts to evaluating a query. It's easiest to write queries against data types in certain "normal forms". This means that a program has three essential components: definitions of the normal forms (the data definition languages for SQL), queries to run against values of these forms, and, as a practical matter, data to query.

    8. Re:Use it by ciggieposeur · · Score: 4, Informative

      What do you think about intermediate variables that are not strictly necessary?

      My general rules of thumb:

      1) I don't care how many variables are declared, so long as each makes sense on its own. Like another poster's example, 'fullName' is perfectly acceptable (especially for i18n/l10n aware code that may have different rules for generating a name).

      2) I ABSOLUTELY HATE clever arithmetic / pointer arithmetic / expressions all crunched into one line that can be split out. Example: in C-like languages that support pre- and post-increment, I expect the code to use only one or the other consistently, and never mix it with another expression. So this is fine:

      i++;
      j = i + 4;

      ...but this I can't stand:

      j = ++i + 4;

      #2 I picked up from a very experienced developer who pointed out that making the code harder to read is never worth it: the compiler produces the same code as the easy-to-read version. He also pointed out that making code that looks 'too easy to be clever' is quite a bit harder than making code that looks 'too clever to always work'.

    9. Re:Use it by jimrthy · · Score: 1

      Someone with mod points, please...mod this up!

    10. Re:Use it by scotch · · Score: 1

      Example: in C-like languages that support pre- and post-increment, I expect the code to use only one or the other consistently, and never mix it with another expression.

      Use one or the other consistently? But they don't mean the same thing.

      --
      XML causes global warming.
    11. Re:Use it by bill_mcgonigle · · Score: 1

      Am I off base here? What do you think about intermediate variables that are not strictly necessary?

      Well, use your judgement. Sometimes it helps. Take two options:


      my ($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) = localtime(time);
      $mon++; # 0-indexed from localtime
      $year+= 1900; # years since 1900
      print "At the tone the time will be: $year-$mon-$mday $hour:$min:$sec\n";

      Or the more 'efficient':


      my @timeparts = localtime(time);
      print 'At the tone the time will be: ' . ($timeparts[5]+1900) . '-' . ($timeparts[4]+1) . "-$timeparts[3] $timeparts[2]:$timeparts[1]:$timeparts[0]\n";

      To the topic, which would you rather encounter as the next man on a codebase? And don't forget that the compiler will optimize out any gratuitous intermediates.
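
      The same contrast carries over to other languages. As a sketch in Python (not the Perl above), `time.gmtime` returns a `struct_time` whose fields can be read either by name or by position; the fixed timestamp of 0 is used here only so the output is reproducible.

```python
import time

# Fixed timestamp (the Unix epoch, in UTC) so the result never changes.
parts = time.gmtime(0)

# Named fields: the reader does not need to remember the tuple layout.
# Unlike Perl's localtime, struct_time already holds the full year and
# a 1-based month, so no manual adjustments are needed.
named = (
    f"{parts.tm_year}-{parts.tm_mon:02d}-{parts.tm_mday:02d} "
    f"{parts.tm_hour:02d}:{parts.tm_min:02d}:{parts.tm_sec:02d}"
)

# Positional indexing: same result, but every index is a small puzzle.
indexed = (
    f"{parts[0]}-{parts[1]:02d}-{parts[2]:02d} "
    f"{parts[3]:02d}:{parts[4]:02d}:{parts[5]:02d}"
)

print(named)
```

      Both strings come out identical; only one of them tells the next reader what each piece means.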

      --
      My God, it's Full of Source!
      OUTSIDE_IP=$(dig +short my.ip @outsideip.net)
    12. Re:Use it by Anonymous Coward · · Score: 0

      I try to follow the "max one page per method" rule. Following that rule you cannot define many variables anyway.

    13. Re:Use it by phantomfive · · Score: 1

      If you use them on a line with no other operators, they are the same.

      --
      Qxe4
    14. Re:Use it by Rexdude · · Score: 1

      Absolutely. I've seen cases (in Java) where people go:

      if(Class1.getSomething().toString().indexOf(Class2.getSomethingElse().toString()) == -1) {

      ... This becomes a huge pain when debugging, if there's an error in a line like the above, it's much harder to sit and step through each function call than if they were properly isolated beforehand.

      --
      "..One hosts to look them up, one DNS to find them, and in the darkness BIND them."
    15. Re:Use it by gknoy · · Score: 1

      I find it especially handy when you need to debug (or just report on) or test the parts of an equation. My giant multi-term formula is Failing Somehow ... why? is it bad input? Which sub-term is nil? Am I even implementing the formula correctly? Intermediate named variables really help with this.

    16. Re:Use it by pooly7 · · Score: 1

      The only way to learn the code is to work with it.

      If you're new to a team, the best thing you can do to learn a new codebase is to reply to loads of support questions coming around; you'll have to dig in to understand how it works and reply to the questions. Same when there is a bug: try to find it, then work out a solution and ask your co-worker what he thinks about it. Also, a good test to see if you know your codebase is getting a call from your boss and being able to fix the bug over the phone when he explains it to you.

    17. Re:Use it by selven · · Score: 1

      Those are extremes. Multi-lining can still be useful. For example, a nice simple physics calculation:

      far_enough = ((projectile.speed * sin(projectile.angle) / gravity * 2) * (projectile.speed * cos(projectile.angle))) >= sqrt((destination.X - start.X) ^ 2 + (destination.Y - start.Y) ^ 2)

      Or:

      hspeed = projectile.speed * cos(projectile.angle)
      vspeed = projectile.speed * sin(projectile.angle)
      seconds_in_air = vspeed / gravity * 2
      reqdist = sqrt((destination.X - start.X) ^ 2 + (destination.Y - start.Y) ^ 2)
      far_enough = seconds_in_air * hspeed >= reqdist

      The second one makes a lot more sense, with the intermediate variable names explaining exactly what's going on, and doesn't wrap around the right side of the screen.
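
      A runnable take on the second form, as a sketch only: it assumes Python, angles in radians, flat ground, no drag, and plain (x, y) tuples for the two positions; the function and parameter names are made up for illustration. Note that `^` in the lines above is meant as "to the power of"; in C-like languages and in Python `^` is XOR, so the sketch uses `math.hypot` instead.

```python
import math

def far_enough(speed, angle, gravity, start, destination):
    """Return True if a projectile launched from `start` can reach `destination`.

    Assumes flat ground, no air drag, and `angle` in radians.
    """
    hspeed = speed * math.cos(angle)         # horizontal speed
    vspeed = speed * math.sin(angle)         # initial vertical speed
    seconds_in_air = vspeed / gravity * 2    # time up plus time back down
    reqdist = math.hypot(destination[0] - start[0],
                         destination[1] - start[1])
    return seconds_in_air * hspeed >= reqdist
```

      With a speed of 10, a 45-degree launch and gravity of 9.81 the range works out to roughly 10.19, so a target 10 units away is reachable and one 11 units away is not.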

    18. Re:Use it by Stevecrox · · Score: 1

      Temporary variables are a good idea if you're looking at slow processes. I've been working in Java with XMLBeans, and pulling XML data out of an object is a painfully slow process. I've also worked on several JNI interfaces, and while pure C++ code is fast and pure Java code is fast, translating between the two is very slow.

      Storing frequently used objects in temporary variables can make sense, just as temporary wrappers can make sense. Both can lead to a drastic improvement in performance.
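
      A small sketch of that idea in Python, with a made-up `SlowDocument` class standing in for something like an XML-backed object; the lookup counter is there only to show how often the slow accessor gets hit.

```python
class SlowDocument:
    """Stand-in for an object whose accessors are expensive to call."""

    def __init__(self, prices):
        self._prices = prices
        self.lookups = 0

    def get_prices(self):
        self.lookups += 1            # count how often the slow path is taken
        return list(self._prices)    # pretend this copy is costly

def total_without_temp(doc):
    total = 0
    for i in range(len(doc.get_prices())):
        total += doc.get_prices()[i]   # slow accessor called on every iteration
    return total

def total_with_temp(doc):
    prices = doc.get_prices()          # fetch once, keep it in a local
    return sum(prices)

slow = SlowDocument([1, 2, 3])
fast = SlowDocument([1, 2, 3])
slow_total = total_without_temp(slow)
fast_total = total_with_temp(fast)
```

      Both functions return the same total; the difference is that `slow.lookups` grows with the length of the list while `fast.lookups` stays at one.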

    19. Re:Use it by Kjella · · Score: 1

      I strongly agree with you, the grandparent's example is just showing alphabet soup but if I was to do any maintenance on this code like say an air drag factor I'd much rather work on your code than the grandparent's. Some write like the number of variables is an optimization, but I expect the compiler to figure that out and if it doesn't then I still want you to do it only in the innermost loop of a performance-critical section. But only if it makes the code clearer to the developers, not to obfuscate it.

      For the same reason I hate SQL queries that use the pattern:

      SELECT [expression]
      FROM some_table a, other_table b, foo_table c, bar_table d
      WHERE [expression]

      and make the whole rest of the query into alphabet soup where you must do variable substitution in your mind to figure out WTF is going on and there can be half a page of conditions between the FROM statement and the conditions that link table b and d (what, you thought d would link to c?). Yay for:

      a) ANSI joins keeping the conditions near in the ON
      b) Using sensible aliases that are fairly constant in all queries (e.g. resources table = res)
      c) Prefixing fields in the query so it doesn't break if I have to join in another table.
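
      To make (a) to (c) concrete, here is a sketch using Python's built-in `sqlite3` module. The `resources`, `bookings` and `users` tables and their columns are invented for the example; only the join style and the aliasing are the point.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE resources (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE users     (id INTEGER PRIMARY KEY, login TEXT);
    CREATE TABLE bookings  (id INTEGER PRIMARY KEY, resource_id INTEGER, user_id INTEGER);
    INSERT INTO resources VALUES (1, 'projector'), (2, 'van');
    INSERT INTO users     VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO bookings  VALUES (1, 1, 2), (2, 2, 1);
""")

# Each join condition sits in the ON clause next to the table it links,
# aliases are short but recognisable, and every column is prefixed.
QUERY = """
    SELECT res.name, usr.login
    FROM bookings AS bk
    JOIN resources AS res ON res.id = bk.resource_id
    JOIN users     AS usr ON usr.id = bk.user_id
    ORDER BY res.name
"""
rows = conn.execute(QUERY).fetchall()
```

      Joining in a fourth table later means adding one more `JOIN ... ON ...` line, without touching a distant WHERE clause or un-prefixed column names.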

      Oh yeah, and another thing to get back on topic. If you're being very verbose as the code above at least in C++ you should consider scoping it inside a set of braces. That way, someone debugging the function doesn't have to deal with all the temps as they quickly go out of scope. If you're not reusing the formula anywhere else, it's much better than making it a function:

      {
              float hspeed = projectile.speed * cos(projectile.angle);
              float vspeed = projectile.speed * sin(projectile.angle);
              float seconds_in_air = vspeed / gravity * 2;
              float dx = destination.X - start.X;
              float dy = destination.Y - start.Y;
              float reqdist = sqrt(dx * dx + dy * dy);  // ^ is XOR in C++, not a power operator
              far_enough = seconds_in_air * hspeed >= reqdist;
      }

      --
      Live today, because you never know what tomorrow brings
    20. Re:Use it by moonbender · · Score: 1

      I don't know about other debugging GUIs, but Eclipse can easily evaluate an expression without crutches like the immediate window or watches. Mark the expression and press Ctrl-D for a toString or Ctrl-I for an inspection window of the evaluation result. I assume most other modern IDEs can do this.

      --
      Switch back to Slashdot's D1 system.
    21. Re:Use it by scotch · · Score: 1

      Sure, if you don't use them for what they are designed for, they have the same effect. But they still don't mean the same thing.

      --
      XML causes global warming.
    22. Re:Use it by ciggieposeur · · Score: 1

      Use one or the other consistently? But they don't mean the same thing.

      They compile to the same thing if they are both used with the meaning "increment this thing by one". If they are used to mean "increment-then-evaluate" or "evaluate-then-increment", one may as well separate the increment from the evaluate because the combination only (maybe) makes it easier for the lexer/parser, not the next human assigned to maintain the code X years later.

      I'm fine with "i += 1" or even "i = i + 1". If I want to increment, I increment; if I want to evaluate, I evaluate.

    23. Re:Use it by scotch · · Score: 1

      Use one or the other consistently? But they don't mean the same thing.

      They compile to the same thing if they are both used with the meaning "increment this thing by one". If they are used to mean "increment-then-evaluate" or "evaluate-then-increment", one may as well separate the increment from the evaluate because the combination only (maybe) makes it easier for the lexer/parser, not the next human assigned to maintain the code X years later.

      I'm fine with "i += 1" or even "i = i + 1". If I want to increment, I increment; if I want to evaluate, I evaluate.

      Easier for the parser/lexer? Really? No, pre-increment and post-increment are not to make the parser's job easier, they are to make code more concise, easier to read, allow concise and common idioms for humans.

      Here's an example extending an array foo:
      foo[len++] = "I";
      foo[len++] = "like";
      foo[len++] = "big";
      foo[len++] = "butts";

      Can they be abused? Sure. Just like everything. If you don't grok that, go back to python.

      --
      XML causes global warming.
    24. Re:Use it by Anonymous Coward · · Score: 0

      This is a good point. But the strategy you use for "caching" is really dependent on your language's evaluation model. In Haskell's monadic do-notation, I would use "let" notation to save a value for the scope of the computation, just as you would use a temporary variable. If I was defining a non-monadic function, I would use a where clause to define a value to use in a computation (reusably, in its scope). Or you could even cache the function's value at point, with a higher order function. Some of these are more disruptive to "normal-formedness" than others.

    25. Re:Use it by Jack9 · · Score: 1

      // The display name should be the fullname
      screen.displayName = user.firstName + user.lastName;

      Later, if the fullname becomes user.firstName + user.middleName + user.lastName, and there is no unit test that fails, there should at least be a comment.

      If you want to communicate context, comments and/or unit tests are how to do it. If you want to ask the question "should I?" then someone else reading the code will ask "why didn't they?" or "what were they thinking?", so comment comment comment. In the end, you want someone reading the code to say "they just didn't do X, they did Y" and the reasoning will be clear that the original author either coded it fundamentally wrong from the beginning (unlikely) or that requirements/dependencies have changed.
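
      One way to pin that expectation down is a test that states it outright. This is a sketch in Python with a hypothetical `User` record and `display_name` helper; unlike the snippet above it puts a space between the two names, which is an assumption and not something the original line does.

```python
import unittest
from collections import namedtuple

User = namedtuple("User", "first_name last_name")  # hypothetical user record

def display_name(user):
    # The display name should be the full name: first name, space, last name.
    return f"{user.first_name} {user.last_name}"

class DisplayNameTest(unittest.TestCase):
    def test_display_name_is_first_and_last(self):
        # Documents the current definition of "full name". Changing that
        # definition means changing this test, which makes it a visible decision.
        self.assertEqual(display_name(User("Ada", "Lovelace")), "Ada Lovelace")
```

      Running it with `python -m unittest` gives the next maintainer an executable statement of what "full name" meant when the code was written.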

      --

      Often wrong but never in doubt.
      I am Jack9.
      Everyone knows me.
    26. Re:Use it by ciggieposeur · · Score: 1

      Congratulations on finding one of the few idioms where evaluate+increment in one line can make the code easier to read than inserting a bunch of "i++"'s between lines. Well, if it's C, and the array is already initialized to be big enough; or if it's C++, and operator[]() automatically re-sizes the array as needed. OTOH, if it was Java or D you'd do better with the appropriate collection class.

      Can they be abused? Sure. Just like everything.

      True, any language feature can go bad in the wrong hands. Go read some old C code that tries to be clever. Try the public domain version of rz/sz for instance; it's not much better than assembly language. It's so bad in fact that multiple zmodem implementations were made later (sexyz, qterm) rather than try to reuse it.

      If you are being paid to write code for someone else to maintain, you have to draw the line somewhere between the available language features and the likely skills of the next person in line to fix it.

      If you don't grok that, go back to python.

      Clojure, actually, where I'd just write [ "I" "like" "big" "butts" ].

    27. Re:Use it by tgrigsby · · Score: 1

      Depending on the debugger, it might be advantageous to split it out verbosely at first, then consolidate the code once the range of values coming in is understood. I would write it in short form first, break it out if things weren't behaving as expected, then smoosh it back together once I had it figured out.

      --
      *** *** You're just jealous 'cause the voices talk to me... ***
    28. Re:Use it by hitchhacker · · Score: 1

      What do you think about intermediate variables that are not strictly necessary?

      I'll often find myself coding some physics equations from specifications written on paper. Obviously, they are always written in math notations. What I end up doing, if not limited by cpu/ram, is to create a stack variable for each term in the equations. Basically, I'll try to make the code look as much like the paper specs as possible. The specs will ALWAYS change, and trying to figure out how the two relate some years later is a real pita. Also, I'll always preface everything with some comment like "The following is from foobar specs dated Jan 1st 2002" for the reverse reasons.

      -metric

    29. Re:Use it by sincewhen · · Score: 1

      And (relevant to all of these code examples) a well crafted comment or two taking only a few seconds to write would more than double the readability of the code.

      --
      -- Braden's law of data: All data spends some of its lifetime in an excel spreadsheet.
    30. Re:Use it by Anonymous Coward · · Score: 0

      So this is fine:

      i++;
      j = i + 4;

      ...but this I can't stand:

      j = ++i + 4;

      There's a subtlety there in that, in the latter case it is explicit that it means for i to be 1 more than it previously was, and for j to be 4 more than i's new value; whereas in the former case it is entirely plausible that at some stage code might be added in between the two lines, and the relationship between the increases in i and j are no longer together and obviously related.

    31. Re:Use it by ciggieposeur · · Score: 1

      If that kind of subtlety is important, it can be documented via comment, encapsulated separately via (potentially inline) function, or it can be re-factored so that whatever common dependency i and j have is made explicit. All of which can lead to the same machine code by the optimizer.

  17. 30-40kloc is not large by aachrisg · · Score: 1

    I wouldn't try too hard with a codebase as small as 30-40k lines, but for an actually large codebase, there are a bunch of different things that can help:

    - Examine a class or function hierarchy and call graph. If you have tools to do so and the codebase is set up for it, go ahead. If not, set up the tools and codebase to be processed for this; you'll learn stuff about the code just by hooking these tools up.
    - Pick medium-level routines in the codebase that you are interested in and run the application in the debugger with breakpoints set on them. Take a look at the call stacks, step through the callers, look at the arguments, etc.
    - You can also get a bunch of knowledge of the structure of the app by single-stepping in the debugger: "step over" to see the high-level control flow, and "step into" subsystems you want to explore.
    - Documenting the existing code using a tool such as doxygen can help you learn it while at the same time providing useful documentation for other team members.
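
    For the call-graph point, when no dedicated tool is set up yet, a few lines of instrumentation can already show who calls whom. The sketch below assumes a Python codebase and uses the standard `sys.setprofile` hook; `record_call_edges` and the three toy functions are made up for the example.

```python
import sys
from collections import Counter

def record_call_edges(func, *args, **kwargs):
    """Run func and count (caller, callee) pairs among Python-level calls."""
    edges = Counter()

    def profiler(frame, event, arg):
        # "call" fires for Python functions; C functions report "c_call".
        if event == "call" and frame.f_back is not None:
            caller = frame.f_back.f_code.co_name
            callee = frame.f_code.co_name
            edges[(caller, callee)] += 1

    sys.setprofile(profiler)
    try:
        result = func(*args, **kwargs)
    finally:
        sys.setprofile(None)   # always remove the hook again
    return result, edges

# Tiny stand-in for "the application".
def parse(text):
    return text.split()

def count_words(text):
    return len(parse(text))

def report(text):
    return f"{count_words(text)} words"

result, edges = record_call_edges(report, "learn the code by running it")
```

    Printing `edges` after a typical scenario gives a crude, observed call graph; it only covers the paths that actually ran, which is often exactly what a newcomer wants to see first.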

  18. Trace sessions and time by oldhack · · Score: 4, Insightful

    I'll echo some earlier comments.

    Set up an execution environment with debugger, and run several typical scenarios and trace them with debugger. Get the feel of the big-picture execution scenarios/paths.

    It will take time for your brains to get comfortable with it, though. And the details, when you look into them, will throw odd stuff at you. But that's the nature of our work.

    --
    Fuck systemd. Fuck Redhat. Fuck Soylent, too. Wait, scratch the last one.
    1. Re:Trace sessions and time by Anonymous Coward · · Score: 0

      And Linus Torvalds will tell you that if you need to step through the code with a debugger, your BS in CS is not worth its ink.

      Sorry, the original suggestion is a good one. I am just complaining how short-sighted is Linus for not adding kernel debugging inside the Linux kernel.

    2. Re:Trace sessions and time by radtea · · Score: 1

      Set up an execution environment with debugger

      I once worked with a fairly large legacy system that had been ported to Solaris from its original development system, which was a PDP-11 running RT-11/TSX, which as those of us who learned on it knew, had an addressing limit of 64k code and 64k data in a single "overlay". TSX was a multi-tasking layer on top of RT-11, which meant that one way to get around this limit was to run multiple processes and have them communicate via pipes, and this fundamental architecture had been maintained in the port.

      This meant the most trivial path of execution involved multiple fork/execs and what amounted to asynchronous processing via message passing. It was kind of elegant, in its own twisted way, but it meant that by far the easiest way of coming to grips with it was to put printf's everywhere (it was written in C) and have every program generate time-stamped output into its own file, then run a script to interleave the output so I could see a synchronous picture of what was going on.
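
      The interleaving script can be very small. A sketch in Python, assuming every line starts with a fixed-width, sortable timestamp and each per-process file is already in time order; the log lines and the `interleave_logs` name are invented, and in-memory lists stand in for the files.

```python
import heapq

def interleave_logs(*logs):
    """Merge per-process logs into one list ordered by leading timestamp.

    Each log must already be sorted; heapq.merge then only has to compare
    the next unread line of every log.
    """
    def timestamp(line):
        return line.split(" ", 1)[0]

    return list(heapq.merge(*logs, key=timestamp))

parent = ["0001.200 parent: fork child A", "0001.900 parent: read reply"]
child_a = ["0001.350 childA: started", "0001.700 childA: wrote reply"]

merged = interleave_logs(parent, child_a)
```

      Open file objects can be passed in place of the lists, since `heapq.merge` accepts any iterables and reads them lazily.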

      After that I could attach gdb to various spawned processes and dig into them in a bit more detail, but that was a hopelessly laborious procedure for getting the overview, and because the pipe code was blind (the pipe endpoints were reused for whatever pipe happened to be open at the moment) it was almost impossible at any point to know what process you were talking to at a given point unless you'd seen it created: there was essentially no way that any amount of static analysis could have revealed the underlying structure.

      The company wanted me to change the code to add functionality that would have required touching virtually every file, and I was eventually able to show via various software engineering metrics that doing so would take approximately three years, mostly due to bug fixes (the cyclomatic complexity of some of the routines was into the hundreds, even not accounting for the bizarre architecture.) They killed the project and we wrote something simpler and cleaner to do the same job from scratch. That's rarely the best solution, but keep it in mind if things get too out of hand.

      --
      Blasphemy is a human right. Blasphemophobia kills.
    3. Re:Trace sessions and time by oldhack · · Score: 1

      I agree with you: as tempting as it is, rewrite is rarely the optimal solution, but sometimes, that's just what's called for.

      Making the case, though, is another thing altogether. At the least you have to understand the current code base, which can be a substantial work all by itself. And then make a case in a form that carries weight for the audience (i.e., management). Here, I think many of us end up hustling because software engineering still is not a mature formalized discipline yet.

      --
      Fuck systemd. Fuck Redhat. Fuck Soylent, too. Wait, scratch the last one.
  19. Tried and True by cosm · · Score: 2, Insightful

    For culinary folks...
    The time and money you spend tracing and inserting noodles in the spaghetti will end up being larger than the time it takes to cook a new batch (no pun intended).

    For auto folks...
    The time and money you spend bondo-ing, welding, rewiring, duct-taping, and C'n'Cing parts for the car will end up being larger than the time it takes to design and build a new car. (Although restoring an old/vintage car for the sake of nostalgia is a much more pleasing experience than buying a new one).

    Gain an understanding of the purpose of each pivotal region. Know what your desired result should be, then begin the rewriting endeavor.

    --
    'We are trying to prove ourselves wrong as quickly as possible, because only in that way can we find progress.' RPF
    1. Re:Tried and True by Anonymous Coward · · Score: 0

      And break a bunch of stuff that you didn't realize was there along the way, causing your colleagues to have to wait for you to fix your broken "new shiny code cuz I'm not able to understand the old". Schedules slip because now part of or the whole team is blocked. And unless the module in question is very simple (which begs the question why can't you understand it in the first place) it will happen, maybe not every time, but most of the time.

      You rewrite something because it really is fundamentally broken, or can't be reasonably extended in its current form to meet new goals, not because you haven't learned how to read other people's code or "get" their designs.

      *adding this to my list of possible interview questions*

    2. Re:Tried and True by Piquan · · Score: 3, Insightful

      These projects invariably have lots of tiny gotchas that you're going to steamroll in your effort to rewrite it. See Joel on Software on this.

    3. Re:Tried and True by rocker_wannabe · · Score: 1

      For you construction folks...
      The time and money it takes to add another outlet to that wall will end up being larger than just starting over with a new house....
      ....Ahhhhh, maybe NOT!

      Rewriting something that cost tens or hundreds of thousands of dollars because it's hard to maintain would look wasteful and ridiculous to anyone that didn't work with software, but it happens all the time. This is what made me realize that programming is still an immature field. I keep a copy of Edsger Dijkstra's lecture to University of Texas computer science students in 1988 on my desktop and read it every time I feel the urge to work for a software company again. His premise is that if you teach people how to write code before you teach them the discrete math and logic framework that applies to programming, you cripple them. By not giving them the tools to write formal specifications that anyone with training could read and comprehend fairly quickly, you end up with the industry situation we have now. The next programmer will either get a textual description of the functionality that is incomplete and not detailed enough or just get a copy of the source code. Since the code mixes WHAT is supposed to be done with HOW it is supposed to be done, it can be incredibly difficult to follow.

      This means that the poor schmuck that has to maintain someone else's code has to spend a huge amount of time reading and/or stepping through the code to try and understand how it works before any actual "code maintenance" can start. It's ALWAYS possible to reverse engineer code and start drawing diagrams or commenting the code to help understand how it works. Unfortunately, many programmers think they can do better and push hard for a rewrite of the code. This is usually pure folly: because there is already code available that mostly works, the pressure to get something working will be even greater, so the odds of the next iteration being any better are incredibly low unless the current code is TRULY a pile of excrement.

      Since I've never been in a position to force a company to use formal methods for writing software specifications, and none of them thought they needed it, I've grown weary of watching new programmers thinking they are smarter than the people before them and choose to rewrite code. This usually ends badly. As a System Test engineer for a number of years who has written "black box" and "white box" tests, I can say without a doubt that you can't create quality code through testing, especially in a commercial environment. All you can do is cause the release date to slip until the really awful and obvious bugs are fixed.

      --
      "Meaningless!, Meaningless!" says the Teacher. "Utterly meaningless!"
    4. Re:Tried and True by Anonymous Coward · · Score: 0

      Now that is really poor advice. You better be damn sure the project is salvageable before attempting a rewrite. Otherwise, you can expect the following:

      After you think you understand all of the requirements of your client you'll begin writing code in earnest, only to discover it's going to take way longer than you initially thought. Meanwhile, your client is going to get angry waiting around because to them it doesn't look like any progress is being made. Finally, once you've rewritten 40K lines of code, you have to work the bugs out all over again. In short, that's a lot of wasted time, duplicated effort, and expense for your client.

      Conclusion: if the code works and it serves the client's current needs, then it's worth spending the time to understand it.

    5. Re:Tried and True by Jeremi · · Score: 1

      This means that the poor schmuck that has to maintain someone else's code has to spend a huge amount of time reading and/or stepping through the code to try and understand how it works before any actual "code maintenance" can start.

      The above provides an excellent incentive for the company to keep the original programmer, instead of, say, laying him off and outsourcing his job to a team of fresh college graduates in Bombay, even though they are willing to work for less.

      Just sayin'.

      --


      I don't care if it's 90,000 hectares. That lake was not my doing.
    6. Re:Tried and True by lena_10326 · · Score: 1

      So an individual is going to rewrite a production system composed of 30-40k lines of code all by themselves when the original team had a bigger budget, more teammates, more time, face to face access to clients, and participation in all the requirements gathering sessions?

      Don't underestimate the effort that went into legacy systems. Unless your team's resources are on par with the original team's or you have a magic productivity bullet (perhaps migrating from old school Fortran to Java), don't even try it. Go with a piecemeal series of revisions over time. It can be done (I've done it before) but you must prioritize and cull the list of worst offenders. It requires having a solid understanding of the code base and a clear plan for the bits to rework and the bits to keep.

      --
      Camping on quad since 1996.
    7. Re:Tried and True by Anonymous Coward · · Score: 0

      There are times to rewrite and times not to rewrite, and the decision should rarely be made on code quality alone. There are two other factors to consider.

      The first is the time needed to reimplement. The longer this time is, the higher your risk of cancellation. Yes, I know, your manager promised you that they would stick it out until the bitter end. But things change. What usually happens is that there is some big emergency and the old code is resurrected "temporarily" to solve a specific customer issue. These issues keep popping up over and over again until you are spending almost all your time in the old code. Finally, they cancel the new project. Alternatively, the company runs short on cash and moves back to its "core functionality", scrapping anything that isn't absolutely necessary. If your rewrite takes longer than a month or two, I give you only a 10% chance of completion.

      The second factor is requirements gathering. Do you have an organized description of all the functionality of the current program? Are all the bug fixes and subsequent feature additions rectified with the main requirements documents so that the whole thing is up to date and complete? Thought not. In every case I have ever experienced the only place that accurately documents the current functionality of the program is the source code itself. And if you can't understand it, then you probably are going to forget a huge number of small issues when you reimplement. If you manage to get to the release date without cancellation, your project will almost certainly be a disaster when the customers decide they hate the new version. Often, no matter how much better the new code is, management will decide to revert to the old code base.

      There are two times when you should rewrite: if it won't take a long time (4-8 weeks tops) and/or the requirements of the project have changed dramatically to the point where everything has to be rethought anyway. Other than that, in the vast majority of circumstances you will be better off putting the existing code under tests and refactoring as best as you can.
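
      "Putting the existing code under tests" can start with characterization tests: record what the code does today, then make sure a refactoring does not change it. A sketch in Python, where `legacy_price` is an invented stand-in for inherited code and the recorded results are kept in memory instead of a file:

```python
def legacy_price(quantity, unit_price):
    # Stand-in for inherited code whose exact rules nobody wrote down.
    total = quantity * unit_price
    if quantity >= 10:
        total = total * 90 // 100   # undocumented bulk discount
    return total

def characterize(func, inputs):
    """Record what the code does today for each tuple of arguments."""
    return {args: func(*args) for args in inputs}

INPUTS = [(1, 500), (9, 500), (10, 500), (25, 199)]

# Step 1: capture current behaviour once, before touching anything.
golden = characterize(legacy_price, INPUTS)

def check_against_golden(func):
    """Return the inputs whose output differs from the recorded behaviour."""
    return [args for args in INPUTS if func(*args) != golden[args]]

# Step 2: a careless "cleanup" that forgets the discount is caught at once.
def refactored_price(quantity, unit_price):
    return quantity * unit_price

regressions = check_against_golden(refactored_price)
```

      The recorded values are not claimed to be correct, only current; that is what lets the small gotchas survive a refactoring even when nobody remembers why they exist.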

    8. Re:Tried and True by Anonymous Coward · · Score: 0

      Sorry, but what killed Netscape?
      Wasn't that that they wanted to rewrite everything to Java while Microsoft added features instead?

      Rewriting might be a good idea but it all depends on the complexity of the solution and the main reasons.
      Rewriting in a faster language (i.e. compiled instead of parsed), x64 instead of x32 or whatever might be the reason; however I'm not very sure that "not being able to read the other guys' code" is a good reason.

    9. Re:Tried and True by Anonymous Coward · · Score: 0

      You obviously have no clue. At. All.

    10. Re:Tried and True by Anonymous Coward · · Score: 0

      See Joel on Software on this.

      Joel is gay. Any real man can recode any piece of software if he is a real man. Otherwise Linux wouldn't exist. See: http://www.freesoftwaremagazine.com/articles/drivers_linux

  20. Some things I do to figure out code... by CFBMoo1 · · Score: 2, Interesting

    PL/SQL or COBOL or whatever they throw at me, I poke, prod, and play with it in a test environment. Someone up above mentioned pencil and paper to draw out how everything relates, and that is a very good practice I've found to just get to know things. It's not instant but it helps more than you initially think. Also I use Open Office Draw to map out things as well. :P

    --
    ~~ Behold the flying cow with a rail gun! ~~
  21. 2000 lines can be enough by sugarmotor · · Score: 1

    2000 lines can be enough to throw you off!

    I think it is just like learning anything. Keep at it.

    The most important thing is whether you have an efficient way to
    look at what effect any changes have that you may make. Any effort you put into
    that is probably not going to be wasted. (Might be unit tests? Sounds like they did not come with the code)

    Stephan

    --
    http://stephan.sugarmotor.org
    1. Re:2000 lines can be enough by klogg_siebentag · · Score: 0

      2000 lines?! I've inherited codebases of 250k+ LOC (Visual J++), and there were numerous single methods that would dwarf that 2000 lines! I know that claim is sort of the new millennium's version of my grandfather's claim of "walking 64 furlongs to school with only 1 shoe in 16 inches of snow because he couldn't afford the 2 and thripence yearly fee for a bus pass", but being thrown into the deep end of a 250k line project is not unusual. If you get one with documentation, it's like winning a small lottery. If you get one with up-to-date and informative documentation, then, well, do they exist?

      I find that stepping through the code and trying to understand it is psychologically damaging. You end up just wanting to hurt the person who wrote it. But as everyone else has said, it's probably the best idea and the only way to achieve your goal.

      Anyway, that's my 1.274p worth...

  22. Little by Little does the trick by cheesybagel · · Score: 1

    Getting something that allows you to browse code more efficiently certainly helps. There are tools for doing that.

    Another trick is to compile in debug mode, run the code inside a debugger, then break and watch the function call stack. This can help understand deeply nested code some more.

    In the long run however nothing substitutes practice using the codebase. Even an author can get lost if he spends some years away from the code... Either you just do not remember anymore, or the code was changed so much by someone else's edits it gets hard to recognize. Or both.

    If the code does not have consistent coding style standards, run it through an indenting program. You may lose the revision control history but you certainly get a more than reasonable return from it being easier to parse manually. If it does have a consistent coding style standard, even if it is something you are not used to, it is probably better to keep it that way.

    Clean up code by refactoring common code blocks out, or doing other code refactoring that reduces line count and/or increases readability. Make sure the refactored version is functionally equivalent to the non-refactored version, unless you are fixing a bug. Even if you are fixing a bug, document the change just in case something actually relies on bug-for-bug compatibility.

    If you do not have time to do cleanups just keep adding the functionality you need. Eventually you will have read enough code that you will know the codebase. If you do not need to add any more functionality, who cares anyway?

  23. Unit tests first by Fenris+Ulf · · Score: 1

    Get a copy of Working Effectively With Legacy Code. It'll help you get tests around the code base that give you the confidence to be able to change it without breaking anything.

    1. Re:Unit tests first by ChrisLambrou · · Score: 1

      I concur. Working Effectively with Legacy Code, by Michael C. Feathers, should be considered the definitive guide book for working one's way through exactly the kind of scenario you've described.

  24. Re:30 to 40 thousand lines isn't large by any meas by pclminion · · Score: 1

    One million lines is starting to feel big.

  25. Re:30 to 40 thousand lines isn't large by any meas by etymxris · · Score: 3, Interesting

    I inherited a code base of 1.5 million lines of code at the last job I was at. Thankfully I wasn't the only one responsible for it. My advice to the original poster is to add lots of logging information. Log statements should document what the code is doing at any point in time and tell you where it is doing it. If it's Java you can get the stack trace from anywhere; this is very handy for logging.
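
    In Python the same trick is available through the standard `logging` module, whose `stack_info=True` argument attaches the current call stack to a log record. A sketch, with invented function and logger names, writing to an in-memory buffer instead of a log file:

```python
import io
import logging

def make_logger(stream):
    logger = logging.getLogger("legacy.trace")   # hypothetical logger name
    logger.setLevel(logging.DEBUG)
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    # funcName and lineno say *where* the message came from.
    handler.setFormatter(logging.Formatter("%(funcName)s:%(lineno)d %(message)s"))
    logger.addHandler(handler)
    logger.propagate = False
    return logger

buffer = io.StringIO()
log = make_logger(buffer)

def apply_discount(order_total):
    # stack_info=True appends the call stack that led to this line.
    log.debug("apply_discount called with %r", order_total, stack_info=True)
    return order_total * 0.9

def checkout(order_total):
    log.debug("checkout started")
    return apply_discount(order_total)

checkout(100)
output = buffer.getvalue()
```

    The stack dump is verbose, so it is probably best kept for the handful of places where the question is "how on earth did we get here?".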

  26. You lucky bastard by Anonymous Coward · · Score: 0

    30k-40k... I am working on a project with ~2 million lines of code spread across C#, SQL & HTML/Javascript/CSS. Mind you, there are 8 developers working on it, but each one of us has to pretty much know the entire thing.

  27. Large? by VirginMary · · Score: 2, Insightful

    Ha, ha! Just 4 months ago I joined a project with a code base of about 500k lines. I would call that (the 500k lines one) intermediate in size. There are code bases with many millions of lines. I now feel pretty comfortable finding things in it. And I mostly use find and grep.

    --
    When 1person suffers from a delusion,it is called insanity.When many people suffer from a delusion,it is called religion
    1. Re:Large? by snowgirl · · Score: 4, Insightful

      Ha, ha! Just 4 months ago I joined a project with a code base of about 500k lines. I would call that (the 500k lines one) intermediate in size. There are code bases with many millions of lines. I now feel pretty comfortable finding things in it. And I mostly use find and grep.

      At my job at Microsoft, we were in the support end of the core os group. That meant that core os wrote WinXP, Server 2003, Vista, etc, and then it got completely moved over to us to maintain.

      Unfortunately, Windows doesn't really have find and grep, but it does have "dir /s /b [pattern]" and "findstr /sipc:"[pattern]"" Once I learned those, that's a lot of what I used to find the code that I needed to fix.

      All I can say is that it takes time, and effort to become familiar... and you're just stuck with it.

      --
      WARNING! This girl exceeds the MAXIMUM SAFE standards established by the FDA for BRATTINESS
    2. Re:Large? by Anonymous Coward · · Score: 0, Insightful

      Are you Microsofties really so stupid and ignorant that you're not aware of the ports of GNU utilities to Windows or Cygwin or even your own company's Interix and Services for UNIX products?

    3. Re:Large? by Tawnos · · Score: 1

      If you're here, then you should know that \\shindex\search has a fully indexed codebase for all branches.

      As for getting acquainted with the code - find places that need improvement, learn them, learn how they interact with their immediate dependencies and neighbors, continue up and out. 30-40k lines is tiny in the grand scheme of code.

    4. Re:Large? by Anonymous Coward · · Score: 0

      > Windows doesn't really have find and grep

      Um... cygwin?

    5. Re:Large? by Anonymous Coward · · Score: 0

      Why are you divulging Microsoft's proprietary secrets? What is your employee ID?

    6. Re:Large? by maxume · · Score: 1
      --
      Nerd rage is the funniest rage.
    7. Re:Large? by Anonymous Coward · · Score: 0

      gnutools has a windows bin install- one of the first things i install and put in my path-- diff & grep!

    8. Re:Large? by snowgirl · · Score: 5, Interesting

      Are you Microsofties really so stupid and ignorant that you're not aware of the ports of GNU utilities to Windows or Cygwin or even your own company's Interix and Services for UNIX products?

      No, but to explain this, I need to give you some background.

      When I joined Microsoft, I hadn't used any version of Windows at all for any reason other than playing games. After joining Microsoft, I never used Windows at home for any purpose other than logging into the VPN to work from home... and since I did not even have an x86 machine, this required using Virtual PC on my Mac OSX box.

      Now, I know of all of these tools, and I even could install GVim on the machine as well. However, I was working in a Build Group. This required me to occasionally log into 100 different machines at once in order to start the build process for WinXP/Server 2003. Most of these machines require no more input than logging in and starting up a single app... thus no reason to install special software on them.

      Then, something would break, and I would have to read logs, and/or code on the actual box that had the exact problem. Spending an hour installing apps to do my job would be an unacceptable use of my time, and delay the build unnecessarily.

      I learned to use the tools that were available with the environment that I was in. Thus, I did almost all of my programming at Microsoft in notepad.exe, and I'm not kidding you.

      Were I in a different group? The results could have been different... but having 100 different machines, most of which I didn't have admin rights to, meant that even just installing Notepad++ or something like that would have been a waste of time.

      --
      WARNING! This girl exceeds the MAXIMUM SAFE standards established by the FDA for BRATTINESS
    9. Re:Large? by snowgirl · · Score: 1

      > Windows doesn't really have find and grep

      Um... cygwin?

      Ok, again, this time with special emphasis for the retarded... WINDOWS ITSELF does not have find and grep.

      Any GNU OS will, GNU/Linux and GNU/Hurd included, as does any BSD OS.

      --
      WARNING! This girl exceeds the MAXIMUM SAFE standards established by the FDA for BRATTINESS
    10. Re:Large? by snowgirl · · Score: 1

      If you're here, then you should know that \\shindex\search has a fully indexed codebase for all branches.

      Oh, I knew about shindex... there was also an internal webpage that one could use to search all the codebases as well.

      I however didn't have to deal with all the codebases, I had to deal with one and only one at a time in general, and typically the code was checked in last night, because if it were checked in the night before, it would have broken the build that previous night.

      Actually, Product Studio provided tons of information (better than any code indexing service that was available) about what just changed, and helped out enormously.

      I don't argue that had I been in a different group, that I would have had different tools at my finger tips, and many of them could have worked better... but I was stuck with what I had.

      --
      WARNING! This girl exceeds the MAXIMUM SAFE standards established by the FDA for BRATTINESS
    11. Re:Large? by snowgirl · · Score: 1

      Why are you divulging Microsoft's proprietary secrets? What is your employee ID?

      \\shindex\search isn't really a Microsoft proprietary secret... it's more just corporate culture... like talking about Blue Badges vs. Orange Badges. Outside of Microsofties, you're likely to get a bunch of "Huh?" But it doesn't divulge anything about Microsoft business practices.

      As for me divulging information about MS, I haven't worked there in about two years, and I don't think that there is any duty of care to ensure that I don't share any trade secrets... any of them that I have are old, and most likely outside of the "trade secret" protections.

      --
      WARNING! This girl exceeds the MAXIMUM SAFE standards established by the FDA for BRATTINESS
    12. Re:Large? by Anonymous Coward · · Score: 1, Insightful

      What the hell? Are you serious?

      So Microsoft themselves hired you to work on Windows, although you were a Mac user and had absolutely no real experience with Windows?

      Not only that, but you had to manually log in to hundreds of systems just to run a script? They didn't push for this to be automated, and you tossed back on the street where you belong? What the hell?

      Don't get me wrong, I don't doubt that your story is true. It's the sort of shit that we should expect from any large company, especially Microsoft. Please tell me you're an H1B, though. At least then it'd make some sense why they'd hire you. H1Bs typically aren't worth more than a batch file.

    13. Re:Large? by snowgirl · · Score: 5, Interesting

      What the hell? Are you serious?

      So Microsoft themselves hired you to work on Windows, although you were a Mac user and had absolutely no real experience with Windows?

      Not only that, but you had to manually log in to hundreds of systems just to run a script? They didn't push for this to be automated, and you tossed back on the street where you belong? What the hell?

      Don't get me wrong, I don't doubt that your story is true. It's the sort of shit that we should expect from any large company, especially Microsoft. Please tell me you're an H1B, though. At least then it'd make some sense why they'd hire you. H1Bs typically aren't worth more than a batch file.

      Yeah, it took me about a month before I understood that my entire group would be replaced by a few scripts in the Open Source world.

      The primary problem was that because the source code was not a "product", the build code was so full of holes and edge-cases and hacks, that it broke almost constantly, and required someone to babysit it for the whole 14-some hours that it takes to compile.

      Actually, in my orientation class, we went over patents, copyright, and trademark, and I knew it all, and the teacher asked me how I knew so much, and I told her that I owned a registered copyright on some GPL code, and she was like, "and your managers hired you knowing that?" And I was like, know about it? It's the only reason I got hired by Microsoft... be damn sure I didn't submit a resumé.

      --
      WARNING! This girl exceeds the MAXIMUM SAFE standards established by the FDA for BRATTINESS
    14. Re:Large? by mysidia · · Score: 1

      Most of these machines require no more input than logging in and starting up a single app... thus no reason to install special software on them.

      Then, something would break, and I would have to read logs, and/or code on the actual box that had the exact problem. Spending an hour installing apps to do my job would be an unacceptable use of my time, and delay the build unnecessarily.

      "Then something would break" contradicts the earlier statement "no more input than logging in"

      The fact that something is likely to break, and you will need to troubleshoot it, should be reason enough in itself to install some (small) convenient, unobtrusive troubleshooting tools, as standard practice, and as part of the standard initial installs for those servers, to make troubleshooting faster and not require software installations or elaborate practices when things do break.

    15. Re:Large? by timmarhy · · Score: 1

      are you for real. google win grep, or are you going to tell me windows doesn't really have google either?

      --
      If you mod me down, I will become more powerful than you can imagine....
    16. Re:Large? by Mr.+Spontaneous · · Score: 1, Insightful

      Mentioning that you work/have worked for Microsoft on Slashdot is one of the quickest ways to a flaming.

      --
      Its all fun and games until someone loses an eye... then its just fun.
    17. Re:Large? by mysidia · · Score: 1

      That's funny.

      He's probably Employee ID #1. }:>

      I don't believe it's actually possible for "we index our codebase to make it searchable" to be a proprietary secret, anyways.

    18. Re:Large? by snowgirl · · Score: 1

      are you for real. google win grep, or are you going to tell me windows doesn't really have google either?

      Well, they actually had an internal webpage that would do Google and MSN search (at the time) at the same time and allow you to rate how well MSN search did compared to Google.

      But why install grep, when findstr has all the same functionality? Just because I'm familiar with it?

      --
      WARNING! This girl exceeds the MAXIMUM SAFE standards established by the FDA for BRATTINESS
    19. Re:Large? by snowgirl · · Score: 1

      gnutools has a windows bin install- one of the first things i install and put in my path-- diff & grep!

      They had windiff, which displayed the diff as color-coded lines where the changes were. (Perhaps xdiff does the same?)

      We wrote all our scripts in Perl... that open source enough for you?

      MS didn't go without these *nix tools because they weren't good; it's because they had their own tools.

      And installing them on 100s of computers that I didn't have admin access to? Kind of unreasonable...

      --
      WARNING! This girl exceeds the MAXIMUM SAFE standards established by the FDA for BRATTINESS
    20. Re:Large? by snowgirl · · Score: 3, Informative

      Most of these machines require no more input than logging in and starting up a single app... thus no reason to install special software on them.

      Then, something would break, and I would have to read logs, and/or code on the actual box that had the exact problem. Spending an hour installing apps to do my job would be an unacceptable use of my time, and delay the build unnecessarily.

      "Then something would break" contradicts the earlier statement "no more input than logging in"

      The fact that something is likely to break, and you will need to troubleshoot it, should be reason enough in itself to install some (small) convenient, unobtrusive troubleshooting tools, as standard practice, and as part of the standard initial installs for those servers, to make troubleshooting faster and not require software installations or elaborate practices when things do break.

      You missed a part before the quote that you pulled out. "Most of the machines required no more input".

      My statement remains consistent and not contradictory when only 2 machines typically need direct interfacing.

      And small convenient, unobtrusive troubleshooting tools WERE installed as standard practice on the machines... I already said that there was dir /s /b, and findstr... do I have to have "find" and "grep" when I had tools with the same functionality?

      When I started off, there was a big learning curve because of the new tools, but by the time I left, it was as second nature to me as was find and grep when I joined.

      --
      WARNING! This girl exceeds the MAXIMUM SAFE standards established by the FDA for BRATTINESS
    21. Re:Large? by benjamindees · · Score: 3, Informative

      At my job at Microsoft, we were in the support end of the core os group.

      Windows doesn't really have find and grep, but it does have "dir /s /b [pattern]" and "findstr /sipc:"[pattern]""

      When I joined Microsoft, I hadn't used any version of Windows at all for any reason other than playing games.

      I did almost all of my programming at Microsoft in notepad.exe

      it took me about a month before I understood that my entire group would be replaced by a few scripts in the Open Source world.

      Dear lord, this is the most hilarious thing ever posted to /.

      --
      "I assumed blithely that there were no elves out there in the darkness"
    22. Re:Large? by Matheus · · Score: 1

      It's called Cygwin. It is one of the first things I install on any windows machine I have to develop on (and many that I don't). Off the top of my head it looks something like this:

      find ./ -name "*.c" -print | grep -v "\.svn" | xargs grep -in "the string im searching for"

      With everything but the search string conveniently stored away in a script. I first pieced that together to gain familiarity with a ~200K line project and have used it ever since on much larger ones.
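
      For anyone on a box without find, grep or xargs, roughly the same search can be sketched in a few lines of Python. This is an illustration under assumptions, not a drop-in replacement for the pipeline above: search_tree is a made-up helper name, it prunes directories named .svn rather than filtering any path that contains ".svn", and it does a plain case-insensitive substring match with no regular expressions.

```python
import os
import sys


def search_tree(root, needle, suffix=".c", skip_dirs=(".svn",)):
    """Yield (path, line_number, line) for case-insensitive matches."""
    needle = needle.lower()
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune version-control directories in place so os.walk skips them.
        dirnames[:] = [d for d in dirnames if d not in skip_dirs]
        for name in sorted(filenames):
            if not name.endswith(suffix):
                continue
            path = os.path.join(dirpath, name)
            # errors="replace" keeps odd encodings from aborting the search.
            with open(path, encoding="utf-8", errors="replace") as handle:
                for number, line in enumerate(handle, start=1):
                    if needle in line.lower():
                        yield path, number, line.rstrip("\n")


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python search_tree.py <root directory> <search string>
    for path, number, line in search_tree(sys.argv[1], sys.argv[2]):
        print(f"{path}:{number}:{line}")
```

      Because files are read one at a time, nothing here depends on how long a command line the shell will accept.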

    23. Re:Large? by Anonymous Coward · · Score: 0

      Which open source project(s) did you contribute code to?

    24. Re:Large? by Anonymous Coward · · Score: 0

      And setting policy like that is exactly the kind of thing a build engineer does, because they're usually top dog within the dev group and have lots of political pull, right? Especially the New Person On The Team who's smart but kind of an outsider. They'll totally be able to say "we should put off fixing these build issues for a day so we can streamline our processes, it'll be better in the long run," and everyone will say, "ah, yes, we totally understand. Take your time."

      Oh wait! That was a totally idiotic statement.

      GP, whoever you are, you have my sympathy at least. Build engineers are all too often the unsung heroes of getting shit done when the project is on fire.

    25. Re:Large? by timmarhy · · Score: 1

      because wingrep has a gui that makes your life easier than crappy commandline findstr? command line under windows is fail at the best of times, i couldn't imagine having to work with it (oh wait i do now and it SUCKS)

      --
      If you mod me down, I will become more powerful than you can imagine....
    26. Re:Large? by snowgirl · · Score: 1

      because wingrep has a gui that makes your life easier than crappy commandline findstr? command line under windows is fail at the best of times, i couldn't imagine having to work with it (oh wait i do now and it SUCKS)

      ... this fails the argument that I've been having, because people are saying to use grep, which is a command-line tool as well.

      GUI vs CLI is an entirely different matter on this issue. I did use WinDiff, which is a GUI front-end to diff, mostly because it was already there... but it was also a very convenient tool to use, and I liked it. I would appreciate anyone pointing me to an equivalent tool in *nix space.

      --
      WARNING! This girl exceeds the MAXIMUM SAFE standards established by the FDA for BRATTINESS
    27. Re:Large? by StuartHankins · · Score: 2, Informative

      Sysinternals has a great tool you can use to automate installs / run software on multiple machines at once, called psexec. Depends on whether you need to run them interactively, in which case you'd have to also script a login. In the future maybe that's a workable solution for you, especially if you have to use large numbers of computers running Windows. Without grep, head, tail, less, etc I'd feel a bit frustrated. Of course if you're discouraged from installing something that's another issue as well. If nothing else there's always group policy. YMMV.

    28. Re:Large? by snowgirl · · Score: 1

      Sysinternals has a great tool you can use to automate installs / run software on multiple machines at once, called psexec. Depends on whether you need to run them interactively, in which case you'd have to also script a login. In the future maybe that's a workable solution for you, especially if you have to use large numbers of computers running Windows. Without grep, head, tail, less, etc I'd feel a bit frustrated. Of course if you're discouraged from installing something that's another issue as well. If nothing else there's always group policy. YMMV.

      Well, I've already left Microsoft anyways. But the worth of installing the software on the various computers just doesn't match up when I have all the functionality already there.

      findstr for grep, head for head, tail for tail, and for less, most of the time I just redirected to a file so I could read it in notepad anyways (lots easier that way sometimes).

      --
      WARNING! This girl exceeds the MAXIMUM SAFE standards established by the FDA for BRATTINESS
    29. Re:Large? by Anonymous Coward · · Score: 1, Informative

      Explains a lot about MS products.

    30. Re:Large? by Anonymous Coward · · Score: 1, Informative

      I used to work in a similar environment in a university. Tons of windows machines, that I didn't have admin access to. I just carried a usb with me with all sorts of tools that didn't require any more access than a user would have. Seriously borland made a grep for dos that was 7 k back in the 90's. It doesn't sound like you were very creative, but your story does illustrate why the lack of decent command line tools *by default* sucks.

    31. Re:Large? by snowgirl · · Score: 2, Interesting

      I used to work in a similar environment in a university. Tons of windows machines, that I didn't have admin access to. I just carried a usb with me with all sorts of tools that didn't require any more access than a user would have. Seriously borland made a grep for dos that was 7 k back in the 90's. It doesn't sound like you were very creative, but your story does illustrate why the lack of decent command line tools *by default* sucks.

      I didn't even have physical access to the machines. We just RDPed into them, and I had to be logged into every machine at the same time.

      While I had a DFS share that had some of my own tools in it, the problem with running GVim or such off of that is just one of convenience... there were already decent command-line tools available... findstr really does cover everything that I've ever tried to do with grep...

      So, the effort of going out of my way to jury rig all this stuff together wasn't any better than just using the tools that were present.

      We don't really NEED grep... We just need a tool that works LIKE grep.

      --
      WARNING! This girl exceeds the MAXIMUM SAFE standards established by the FDA for BRATTINESS
    32. Re:Large? by Anonymous Coward · · Score: 0

      kompare, kdiff3, meld: pick one

    33. Re:Large? by snowgirl · · Score: 1

      It's called Cygwin. It is one of the first things I install on any windows machine I have to develop on (and many that I don't). Off the top of my head it looks something like this:

      find ./ -name "*.c" -print | grep -v "\.svn" | xargs grep -in "the string im searching for"

      With everything but the search string conveniently stored away in a script. I first pieced that together to gain familiarity with a ~200K line project and have used it ever since on much larger ones.

      Off the top of my head...

      findstr /sipc:"the string I'm searching for" `dir /s /b *.c`

      Mine is shorter...

      grep and find aren't the only tools out there...

      --
      WARNING! This girl exceeds the MAXIMUM SAFE standards established by the FDA for BRATTINESS
    34. Re:Large? by snowgirl · · Score: 1

      kompare, kdiff3, meld: pick one

      Do you have one that doesn't use KDE? I'm sorry, but if I wanted to run bloatware, I'd use Windows...

      --
      WARNING! This girl exceeds the MAXIMUM SAFE standards established by the FDA for BRATTINESS
    35. Re:Large? by Kalriath · · Score: 1

      Uh, no. In most sane organisations, installing random software on the BUILD MACHINES is considered a giant no-no. The build machines should be as untainted as possible by anything except for the tools required to fetch the codebase and start the compile.

      --
      For a site about things like basic rights, Slashdot users sure do like to censor "dissent".
    36. Re:Large? by Anonymous Coward · · Score: 0

      Why are you being an idiot? Where do you live?

    37. Re:Large? by Anonymous Coward · · Score: 0

      Rated Informative? Ha, you couldn't recognize sarcasm if it bit you on the face.

    38. Re:Large? by ac666 · · Score: 2, Insightful

      BTW - you've been remarkably evenhanded in responding to some pretty snarky, socially-challenged comments. Good on you.

    39. Re:Large? by Anonymous Coward · · Score: 0

      Please tell me you're an H1B, though. At least then it'd make some sense why they'd hire you. H1Bs typically aren't worth more than a batch file.

      ignorant opinionated pig.

    40. Re:Large? by pak9rabid · · Score: 1

      Where's the "-1 Dick" mod when you need one..

    41. Re:Large? by Anonymous Coward · · Score: 0

      You're missing the 'grep -v "\.svn"' part. And given the maximum command line length, it might break completely in a large code base.

    42. Re:Large? by TimSSG · · Score: 1

      I use AstroGrep on Windows http://astrogrep.sourceforge.net/.
      You only need the right to create a folder and copy the exe into it; no installation is necessary.

      Tim S.

    43. Re:Large? by pacificleo · · Score: 0

      why you are feeding the troll . that too on Valentine Eve ?

      --
      somethings are best left unsaid , I am one of those things
    44. Re:Large? by Anonymous Coward · · Score: 0

      My post wasn't sarcastic. The non-open-source tools available on Windows are pathetic. I've been in similar situations, and my boss would have had to pry cygwin from my cold, dead hands. It's hilarious to think that anyone would use Windows to maintain any type of complex code base.

    45. Re:Large? by mikestew · · Score: 1

      Are you Microsofties really so stupid and ignorant

      They're not so stupid as to install random utilities on build machines just because they don't like typing "dir /s /b". "Ignorant" is thinking you just walk into an existing build environment of that size, start downloading crap off the Internet, and everyone will be fine with having to support your pet utilities.

    46. Re:Large? by Anonymous Coward · · Score: 0

      Notepad? Jesus.

    47. Re:Large? by ScrewMaster · · Score: 1

      and you tossed back on the street where you belong? What the hell?

      Hey ... play nice.

      --
      The higher the technology, the sharper that two-edged sword.
    48. Re:Large? by plantman-the-womb-st · · Score: 1

      gtkdiff works pretty well for me.

      --
      Say bad words about my book, in cold oatmeal, or I shall sue!
    49. Re:Large? by Anonymous Coward · · Score: 0

      You don't have to run KDE to run KDE apps. You just need Qt, plus whatever dependencies the application happens to require. Also, KDE 3.x is at least as 'light' as XP (despite having more and better features as standard).

    50. Re:Large? by Anonymous Coward · · Score: 0

      meld is gnome and best from what I have used

    51. Re:Large? by snowgirl · · Score: 1

      You're missing the 'grep -v "\.svn"' part. And given the maximum command line length, it might break completely in a large code base.

      The "svn" isn't required because we're not hosting it on a subversion depot.

      Yours as well is using xargs to build a command line... so yours could break with a very large code base as well.

      --
      WARNING! This girl exceeds the MAXIMUM SAFE standards established by the FDA for BRATTINESS
    52. Re:Large? by snowgirl · · Score: 1

      You don't have to run KDE to run KDE apps. You just need Qt, plus whatever dependencies the application happens to require. Also, KDE 3.x is at least as 'light' as XP (despite having more and better features as standard).

      I reiterate, "If I wanted to use bloatware, I'd run Windows XP..."

      --
      WARNING! This girl exceeds the MAXIMUM SAFE standards established by the FDA for BRATTINESS
  28. Re:30 to 40 thousand lines isn't large by any meas by ravenspear · · Score: 1

    unless they used a God class for everything.

  29. Read the source! by Deflatamouse! · · Score: 1

    Seriously... if there is a lack of documentation, then you just have to start reading the source code, starting at main(). Then look at each object and read its constructors.

    And start documenting it. Add comments in the code, create inheritance diagrams and sequence diagrams.

    It will be tedious but you will come out of it a better programmer.

  30. *gasp* by Anonymous Coward · · Score: 0

    You mean they didn't comment all their code? *gasp*

  31. Re:30 to 40 thousand lines isn't large by any meas by istartedi · · Score: 5, Funny

    Very well, sir. Here's your 40,000 lines of Perl from the late 90s. It's mostly regex to parse revisions 30 through 451 of our in-house provisioning system. Oh, and BTW don't screw up like the last guy who had this job. He provisioned 32767 customers with tier-1 service, and it was the director's job to explain why we either had to let them have it for the remainder of the year, or else deal with the CR issues.

    --
    For all intensive purposes, "whom" is no longer a word. That begs the question, "who cares"?
  32. You don't. You find out what the software did by Colin+Smith · · Score: 4, Funny

    And then you re-implement it in the latest language.

     

    --
    Deleted
    1. Re:You don't. You find out what the software did by mikelieman · · Score: 1

      Good luck with that. There are business rules implemented by people who aren't there anymore for people who aren't there anymore. And it's all tied to whether $variable_1 is an "A" or "B" and $variable_2 being 999.

      --
      Technology -- No Place For Wimps! Grateful Dead and Jerry Garcia Chatroom -- http://www.wemissjerry.org
    2. Re:You don't. You find out what the software did by maxwells+daemon · · Score: 1

      you are nearly absolutely right. If you can easily mod it, mod it. If not, go your route. Programmers and the software they create are a complex adaptive system. The newbie will eventually understand it better than those that left, but only because of the changes the newbies have made and the failure of memory of those that have left.

    3. Re:You don't. You find out what the software did by Anonymous Coward · · Score: 0

      And make sure you have something for those werewolves. There always seems to be another werewolf lurking even in the best of legacy code.

    4. Re:You don't. You find out what the software did by timothyb89 · · Score: 1

      You joke, but sometimes complete rewrites aren't such a bad thing. I recently rediscovered a game that I had enjoyed playing with a few years back that was basically dead. As a short summer project I spent some time rewriting it in my language of choice, doing some major code refactoring as well (C to OO can be an interesting conversion!). In the end I came away with an awesome understanding of the original code (it's OSS)

    5. Re:You don't. You find out what the software did by Colin+Smith · · Score: 1

      If the existing business people don't know why the rules are there, surely that means you can get rid of them. You only need the business rules which the existing business people want or need.

       

      --
      Deleted
    6. Re:You don't. You find out what the software did by moonbender · · Score: 1

      Exactly! If they don't know the rules exist, they certainly don't know they were implemented before! You can charge those fools for re-implementing them once things start to break down!

      --
      Switch back to Slashdot's D1 system.
    7. Re:You don't. You find out what the software did by spongman · · Score: 1

      the problem is that there's some guy in another department that relies on that rule, but probably doesn't even know it exists - he's not an engineer, he's a user. but your rewrite goes into production and the company gets sued because that guy makes some mistake because you missed some behavior he depends on but didn't even know existed... rinse, repeat for every crazy 'WTF?' piece of code you removed.

      well, you get my point.

  33. Hope your management understands by syntap · · Score: 3, Insightful

    I have inherited projects and do my best to convince management that a pause is needed to document the code. Personally I try to flowchart the functionality and cover a couple of office walls with Visio printouts. Later on I can use such work to add detail and further documentation.

    I inherited some code where the developer used names of girlfriends in variable names, it was just dumb and completely unprofessional. I didn't worry so much about keeping track of those, I was more worried about a change in one spot having unintended (and perhaps unknown until too late) consequences. Rather than spend time fixing problems, I thought it best to do some up-front documenting to at least provide a path to successful maintenance.

    When I left the project, the manager had a binder of documentation and almost cried.

    1. Re:Hope your management understands by Jane+Q.+Public · · Score: 1

      I inherited a Web site that was not only done in a goofy manner, nothing was documented at all. The customer didn't know who the host was, what the passwords were, and so on and so on. Nothing.

      My philosophy was that since the customer is footing the bill, nothing should be secret. I spent a bit of time hunting down host, account info, domain name info, contact info, etc, etc... writing it all down in an organized format, and gave it to the customer, rather proud of myself for being professional when the prior programmers had not. Result? Complaint that I had spent time without actually making any changes to the site yet.

    2. Re:Hope your management understands by Ixitar · · Score: 1

      I inherited some code where the developer used names of girlfriends in variable names, it was just dumb and completely unprofessional.

      That idiot is still around? I had a brief dealing with him in the late 80's.

    3. Re:Hope your management understands by greg1104 · · Score: 2, Funny

      I inherited some code where the developer used names of girlfriends in variable names, it was just dumb and completely unprofessional.

      I once inherited a coding project where the naming conventions involved anti-depressant, anti-anxiety, and sleeping drugs. Let me tell you, that's a fun preview of how one's future working on the project might turn out.

  34. Try to learn the structure by phantomfive · · Score: 5, Insightful

    I had an English professor who always said, "Structure is the key to understanding." He was talking about literature, but I think the same is true for programs as well.

    Try to understand the structure of the program. What is the basic flow? It should have an initialization routine, a main loop, and a shutdown routine. Find out roughly where they are, then focus on the main loop. Usually there will be one piece of code that is central, and it will occasionally pass control into other large pieces of the program. Sometimes there will be more than one main loop, and control switches back and forth between the various main loops. If the program is event driven, this will make a difference in the structure.

    If you are just trying to make a small change, try to find the sequence of events that will lead up to where that change needs to be made. Follow the sequence of execution until you get to the line you need to change. If you are changing a single variable, sometimes it's helpful to do a search and find all the places that variable is used, to make sure your change won't have any side effects. This may seem time consuming, but it can save 10 times more in debugging.
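
    A minimal sketch of the "find all the places that variable is used" step, for when grep is not at hand. The function name, the default file suffixes, and the whole-word matching rule are assumptions made for illustration, not anything from the codebase in question.

```python
import re
from pathlib import Path


def find_usages(root, name, suffixes=(".c", ".h", ".py")):
    """Return (path, line_number, line_text) for each whole-word match of name."""
    # \b keeps "total" from matching inside "subtotal".
    pattern = re.compile(r"\b" + re.escape(name) + r"\b")
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if pattern.search(line):
                hits.append((str(path), number, line.strip()))
    return hits
```

    Calling find_usages("src", "total") would list every line to review before changing that variable. It is purely textual, so it also matches comments and strings and knows nothing about scope.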

    Learn to follow code execution with your eyes, without running a debugger. One thing that separates good coders from not so good coders is the ability to follow code that isn't being executed.

    --
    Qxe4
    1. Re:Try to learn the structure by Trepidity · · Score: 2, Interesting

      Depending on the language and domain, one way to speed up learning the structure can be to see if you can match it to some set of programming idioms, and then read up on those idioms if it's not a style of programming you're familiar with. For example, if it's C++, can you figure out by looking at the code's layout whether it was written by someone big into C++ design patterns? If so, it might be easier to reverse-engineer what it's doing if you read a C++ design-patterns book, and then match large segments of the code to "oh it's just implementing [pattern]". In some languages there are 3-4 main styles of programming, and figuring out which of them the author adhered to, and then reading something up on that idiom, can really speed things up.

  35. use grep by AeiwiMaster · · Score: 1

    There is a tool called grep which is very useful.

    1. Re:use grep by Anonymous Coward · · Score: 0

      Even better is ack

  36. Software archeology by geezerwhizard · · Score: 1

    Consider yourself a new explorer in the developing field of Software Archeology. And if you're a programmer, consider that the task is listed under the heading of "jobs for programmers". Try to make it so that the next programmer to deal with the code has a few more advantages than you.

  37. Re:30 to 40 thousand lines isn't large by any meas by Garridan · · Score: 4, Informative

    Oh yeah, well I just inherited a code base of 2.8 trillion lines of assembly code, and I have to read it over a 12.734 baud VAX connection! Why, back in my day...

    Anyway... I've taken on a few large-scale software projects before, and my approach has always been "read twice, hack once". I agree with the parent, and I'll add a note: for the love of everything sacred and unholy, use revision control, and don't trust it -- that is, back up incessantly. Document the hell out of your process. Once you've really learned the system, you might want to back out some of the newbie mistakes that you're making right now.

    And yes. Learning a big system takes a lot of time -- you should be reading much more than writing until you've learned it. I find it helpful to diagram dependencies / draw up finite state machines.
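
    One way to get a first dependency diagram without drawing it by hand, sketched under a big assumption: that the inherited code is Python, so the standard ast module can read its imports. The function names and the choice of Graphviz DOT as output are illustrative only; for other languages a tool like Doxygen is the more likely route.

```python
import ast
from pathlib import Path


def import_graph(root):
    """Map each module (file stem) under root to the top-level names it imports."""
    graph = {}
    for path in sorted(Path(root).rglob("*.py")):
        tree = ast.parse(path.read_text(errors="replace"), filename=str(path))
        deps = set()
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                deps.update(alias.name.split(".")[0] for alias in node.names)
            elif isinstance(node, ast.ImportFrom) and node.module:
                deps.add(node.module.split(".")[0])
        # Files with the same stem in different directories overwrite each other here.
        graph[path.stem] = deps
    return graph


def to_dot(graph):
    """Render the graph as Graphviz DOT text, keeping only edges between local modules."""
    local = set(graph)
    lines = ["digraph deps {"]
    for module in sorted(graph):
        for dep in sorted(graph[module] & local):
            lines.append(f'    "{module}" -> "{dep}";')
    lines.append("}")
    return "\n".join(lines)
```

    The DOT text can be pasted into any Graphviz viewer. Big clusters in the picture are a decent guess at where the major subsystems are.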

  38. Re:You are an idiot by binarylarry · · Score: 2, Funny

    yeah, the clown always creeped me out as well.

    --
    Mod me down, my New Earth Global Warmingist friends!
  39. Re:30 to 40 thousand lines isn't large by any meas by abigor · · Score: 1

    That is indeed a heinous scenario, but don't conflate "obfuscated" with "large".

  40. Re:30 to 40 thousand lines isn't large by any meas by McNihil · · Score: 0

    Couldn't agree more. Even 4-6 million lines is probably fairly common and still not a big issue. One is more inclined to enter the "cut the cruft mode" sooner rather than later when it's at that point.

  41. Doxygen by Anonymous Coward · · Score: 0

    Run it and step through it. Also, use doxygen (http://www.stack.nl/~dimitri/doxygen/) to highlight keywords, create hyperlinks to follow functions, and describe the data structures.

  42. Hunt down the original developer(s) by Anonymous Coward · · Score: 0

    (And then shoot them.)

    Good lord, you're not going to eat'em afterward, are you?

  43. Re:You are an idiot by Anonymous Coward · · Score: 0, Funny

    I am article submitter O.P. and not retard I am programmer with Master DEgree in Computer Science from Indian Institude of Technology and If I am retard why does IBM give me 40.000,00 lines of code? American IBM cannott do it so they give it to me because of my education in India

    IBM paies me 2 Mexican paysos for every line of code I fix that American coder screw up and I need food and room like American does. If American wants money than American should do job correct the first time and not have to send it to INdia to get all the work done correct. As AMerican teenager say DONT HATE THE PLAYER HATE THE GAME

  44. Obviously... by Anonymous Coward · · Score: 0

    You're not a kernel hacker.

  45. Done that.. by spasm · · Score: 2, Funny

    As someone who recently passed off a pile of code of about that size in poorly written and poorly documented php to someone.. All I can say is I'm very very sorry, and I had *no idea* my personal side project would work better than the original commercial offering and be declared 'mission critical' three months before I left for greener pastures..

    1. Re:Done that.. by Alan426 · · Score: 1

      That seriously happened to me. Every time I look at that (unchanged after one year) website, I remember examples of undocumented hacks in there that made a nice proof of concept but were *never* intended for the production system. I feel a little bad at first, but then I remember why I left that job -- and chuckle.

    2. Re:Done that.. by vnaughtdeltat · · Score: 1

      Jon, is that you?

    3. Re:Done that.. by spasm · · Score: 1

      No, but I love that so many people can immediately think of a likely suspect..

  46. Quit by codepunk · · Score: 1

    I just took the easy way out and quit. I had inherited about 30K lines of php code
    that was written by my boss. Shortly after inheriting this spaghetti mess I ran a grep
    across the source; the word "function" did not occur a single time in the entire source
    tree. To top it all off I was not to rework any of it only maintain it as it was going
    away. I did end up installing it on about 5 new machines so going away anytime soon
    was not going to happen. On top of all that I would run into about 20 blocks of if
    statements per file and in addition most database calls etc had the report no errors
    @ in front of them. I found it much easier to just hand it back to the boss and quit.

    --
    Got Code?
    1. Re:Quit by Anonymous Coward · · Score: 0

      Well, I can see why the word punk is in your name. Seriously, if you took a couple days to read through and document as you went, you should have been fine. I think you should look into a different career if you can't handle 2000 lines of bad code. (Unless it was 2000 lines of Perl one-liners).

    2. Re:Quit by codepunk · · Score: 1

      20 to 30k lines of crap code, not a single function in the entire source tree. That is
      not the worst of it: there were 4 other programs just as bad that interacted with
      a single database, each of these setting various states. It worked but was about
      the most fragile thing I have ever worked with. In any case it is back in the
      proper hands, the guy that wrote it.

      --
      Got Code?
  47. Divide and Conquer by Whomp-Ass · · Score: 4, Informative

    Identify each major portion of functionality. If you are working with a sales/billing system you would probably end up with : Orders, Invoices, Payments, Admin.

    Go through each of those portions and identify the major portions. Orders: Order headers, Order details, business logic, ui logic, reports, datalayer, etc. Repeat until reduced into easily consumable units.

    Pick and stick to an SDLC. Use whatever fits the situation and the resources. For a small project (under 100k lines of code) you should be good by yourself. Anything more and you'll have to involve at least 1 other person for testing. For medium (100k-500k lines) you'll need an additional dev...For large projects (500K-5M lines) you'll need a project manager, lead dev, 2 devs, 1 test, and a UAT team...For larger projects you'll have something unique and frightening to the specifics of the software project and corporation/agency making it...anyway, I digress...

    Go through each subdivision line-by-line and re-write it yourself (even if you aren't going to put your re-written version into production); the only way you're going to truly understand what is going on is if you do it yourself. Use whatever language you are most comfortable with or is most appropriate to the task (or languages), it does not need to be the same as the original.

    Verify that for a given input, your version produces exactly the same output as the original.
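
    A small sketch of that verification step: run the old and the new version on the same inputs and collect every case where they disagree. Both invoice functions are hypothetical stand-ins (amounts in integer cents, tax as a whole percentage); only find_mismatches is the point.

```python
def legacy_invoice_total(lines, tax_percent):
    # Hypothetical stand-in for the inherited routine; amounts are integer cents.
    total = 0
    for quantity, unit_price in lines:
        total = total + quantity * unit_price
    total = total + total * tax_percent // 100
    return total


def rewritten_invoice_total(lines, tax_percent):
    # Hypothetical rewrite that is supposed to behave identically.
    subtotal = sum(quantity * unit_price for quantity, unit_price in lines)
    return subtotal * (100 + tax_percent) // 100


def find_mismatches(old, new, cases):
    """Run both versions on every case and collect the inputs where they differ."""
    mismatches = []
    for args in cases:
        expected, actual = old(*args), new(*args)
        if expected != actual:
            mismatches.append((args, expected, actual))
    return mismatches


# Inputs recorded from real usage would be better than hand-picked ones.
cases = [
    ([], 8),
    ([(1, 999)], 0),
    ([(2, 1050), (3, 333)], 8),
]
mismatches = find_mismatches(legacy_invoice_total, rewritten_invoice_total, cases)
```

    An empty mismatches list only says the two versions agree on these cases, so the case list deserves as much care as the rewrite itself.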

    Take a deep breath. It's not a race. It's a one-to-one functional mapping of your software (your mindspace) and the original software (the other developer(s) mindspace(s)). The code probably will not be straightforward. It has also been battle-scarred and will be warty. Changes of initial requirements through time and feature enhancements (feature creep) will have taken their toll on what may have originally been something simple or even elegant. It's something of a niche mindset and if it is not for you, there exist many other exciting things to be programming.

    Ultimately, if you do as outlined above, you'll solve many problems, be able to make whatever changes you like, and in so doing have a way to present your design as a replacement if you want...Or not, if you don't; for 30-40k lines parallel development makes sense, in a way, for one person.

  48. Re:30 to 40 thousand lines isn't large by any meas by vsound1 · · Score: 1

    I inherited 30k lines of code when I started work "wet behind the ears". It was actionscript code (so no typing), spaghetti at its best. Probably not the best code to look at as a beginner. I also had inherited another 20k of clean java code, probably that was the only thing I felt very happy about. I agree with the AC. 30 to 40k is no big deal. As a fresh programmer, I had inherited 50kloc.

  49. Re:30 to 40 thousand lines isn't large by any meas by QRDeNameland · · Score: 2, Interesting

    Just out of curiosity, what is your opinion of a "Large" codebase then?

    My first programming job was on an enterprise system that was over 7 million lines of just C++ code by the time I left, not including SQL stored procedures, web server code for the reporting system, and surely other code stuff that I can't recall. The entire development team for the system was something like 45 programmers. So to many of us, 30-40 klocs does not seem like a large codebase at all.

    That said, I've also inherited code in the 10-50 kloc area of magnitude that was far more of a challenge/nightmare to decipher and maintain than that 7 million line system was. Code maintainability has more to do with good system architecture and coding standards than it has to do with the size of the code base; without those your system will likely collapse under its own bloat long before it can grow to millions of lines.

    --
    Momentarily, the need for the construction of new light will no longer exist.
  50. Re:30 to 40 thousand lines isn't large by any meas by GryMor · · Score: 2, Insightful

    I currently maintain several million lines of perl. It's not hard, it mostly just works, and when it doesn't, it's not that hard to figure out where it's broken IFF there is a consistent repro case for the problem.

    If you have a proper development/production divide, there shouldn't be any weird production issues unless you or your predecessor missed some test cases. If you don't have test cases, that's a problem, if you don't have a properly firewalled and complete development environment, that's a problem, the code itself? Shouldn't be a problem.

    --
    Realities just a bunch of bits.
  51. Re:30 to 40 thousand lines isn't large by any meas by Anonymous Coward · · Score: 0

    People are more likely to be awed by your programming skills if you can help with this person's problem, instead of trying to impress people with the size of the programs you've worked on.

  52. Re:30 to 40 thousand lines isn't large by any meas by home-electro.com · · Score: 0, Redundant

    30-40K is nothing. One person should be able to handle that easily. Although I can imagine for an inexperienced programmer it can be too much. I remember the first 'large' program I wrote in school -- it was 400 lines.

    10 years ago I had to port 1.5 million lines from one UNIX to another. Well that's a large project.

  53. That's small by ameline · · Score: 2, Interesting

    Medium size is 250 to 750 thousand lines of code (one person can still understand how it all works). Big is 1 to 10 million lines of code. Really big is >10 million.

    I have worked on code bases of all of those sizes, and I like the medium size the best -- it's big enough to be interesting, and small enough that you can understand it all.

    One that I've worked on (over 25 million lines) is just too big for my tastes -- over 3 hours to do a clean recompile is excessive.

    --
    Ian Ameline
    1. Re:That's small by Anonymous Coward · · Score: 1, Insightful

      Mine's 12 inches.

    2. Re:That's small by Anonymous Coward · · Score: 0

      30-40K is a large codebase? Kids these days.

    3. Re:That's small by Kagetsuki · · Score: 2, Informative

      25 Million lines compiled in 3 hours is actually pretty fast (unless you are talking about say assembling 25M lines of ASM).

      An associate of mine was working at a very high-tech electric (as in production and distribution of electricity) company. Apparently they had this very complex control system for a huge proprietary piece of hardware that was basically the core of the control rooms. It had to take in data from all sorts of different devices spread out across hundreds of kilometers over a variety of proprietary protocols, make sense of all that data, try and figure out what the most likely scenarios for failures were and automatically implement control scenarios to mitigate damage or keep parts of the system running etc. So the story is the thing was written in a combination of C and assembler, and the file count alone was in the hundreds of thousands. They had two extremely beefy boxes set up to just do compiles, with incremental compiles and relinking taking a few hours and clean compiles taking basically an entire work day (which is why they had two boxes, so they could start one compile after the other so different people could test their changes more often). The thing is, to test their changes they actually had a small control room and a collection of devices on a grid they used to test, and to push the new binaries and data files and get a test set up would take hours as well. Needless to say most of the developers would basically just live in the office during the last month or so of development, but the facility was running 24 hours a day either way so they had a full service cafeteria, lounges, etc. all in the building. Anyway, THAT is the biggest code base I have ever heard of; and I'd bet there are quite a few similar situations around the world.

    4. Re:That's small by Anonymous Coward · · Score: 0

      I can't help but agree with this annotation of medium and large. 30-40 KLOC is feasible to go through and exhaustively document the program's structure and core algorithms. While an in depth understanding can take a few months to grasp, a thumbnail sketch can be feasibly generated in a week.

      Two tips: (1) spend time running it in a debugger stepping through the code, and (2) break it. It can be hard to understand some things by watching them work correctly. Intentionally breaking things helps expose assumptions that were made by previous programmers. Figuring out how to fix what you broke forces you to understand what's really going on in the code base.

    5. Re:That's small by Anonymous Coward · · Score: 0

      I agree with you on the size definitions. I always felt the same way. But since I quit being a programmer I've been working on my own project. Because I have no one to answer to but myself I can polish the code to whatever level I want. I have found that with careful design I can pack a surprisingly large amount of functionality in less than 10K lines of code. This is small enough to be called a toy project, but it seems that the careful crafting of code is more interesting than the larger scale projects to me. When I optimize my code for readability I find that it almost invariably leads to designs with very small amounts of code. The more I work on it, the smaller (and easier to work with) it becomes. If you have spare time for a side project, I highly recommend trying it out for yourself.

    6. Re:That's small by Hurricane78 · · Score: 1

      Pff. GHC takes over 8 hours and eats up to 8 GB of RAM (per compiled file) in the process.

      I literally start compilation when I go to bed, and it’s nearly finished when I wake up.

      --
      Any sufficiently advanced intelligence is indistinguishable from stupidity.
    7. Re:That's small by Anonymous Coward · · Score: 0

      Medium size is 250 to 750 thousand lines of code (one person can still understand how it all works). Big is 1 to 10 million lines of code. Really big is >10 million.

      I have worked on code bases of all of those sizes, and I like the medium size the best -- it's big enough to be interesting, and small enough that you can understand it all.

      One that I've worked on (over 25 million lines) is just too big for my tastes -- over 3 hours to do a clean recompile is excessive.

      25 million lines of code that you inherited and worked with by yourself? Okaaaaay...

  54. Re:You are an idiot by Deflatamouse! · · Score: 1

    It floats... they all float down here...

  55. Don't be discouraged, just keep at it by rxan · · Score: 1

    Don't be discouraged. It's not like English where everyone writes in a familiar way. Everyone writes code a little differently and it is hard to go through it. Even with good commenting it can be difficult. Just persist and hope that you can contact one of the original authors.

  56. Legacy code == code without unit tests by PatMcGee · · Score: 1

    Get a copy of Michael Feathers' book "Working Effectively with Legacy Code".

    I taught a grad / undergrad course using this book. We took a real open-source program as the class project, and the teams made significant changes to it. I thought it worked well.

    Pat

  57. Another "How do I do my job?" ask slashdot. *sigh* by Anonymous Coward · · Score: 0

    And the answer is obvious. UTSL. And since it's now mine anyway, I tend to walk around and see how things work, find places where things don't work so well, and refactor them. It's quite a lot of work, often meaning touching the same code several times to come up with something more modular, more compact, more efficient. Lots of work is ``enabling'' work. Clean up something, see what that exposes or enables some larger change to be put through. After a while change requests become simpler and faster.

    If you want to see how this really works, take projects with lots of fresh graduate or even freshman code in them to poke through. It's not hard, it's just lots of work. But then, what are you being paid for, anyway?

  58. Am I weird...? by chewthreetimes · · Score: 1

    ...because I actually enjoy going through someone else's code? I roll up my sleeves and, using print statements and/or a debugger, I diagram object relationships, flow, data structures...anything I can think of. It's like figuring out a puzzle. Of course, I've had the luck of never inheriting a total pile of crap. But give me anything from not-perfect-but-serviceable on up, and I not only can deal, but I'll have a good time doing so.

  59. I found this tool excellent for code comprehension by Wainamoinen · · Score: 1

    Check it out, it's called Code Browser. It's a lightweight and powerful editor that allows you to visualize, structure, link, organize, comment and edit code.

    It's my favorite one for very large projects with hundreds of files and thousands of lines.

    From the project's description:

    "Code Browser is a folding text editor for Linux and Windows, designed to hierarchically structure any kind of text file and especially source code. It makes navigation through source code faster and easier."
    "Code Browser is especially designed to keep a good overview of the code of large projects, but is also useful for a simple css file. Ideal if you are fed up of having to scroll through thousands of lines of code. "

    Have fun!

  60. Become the developer. by Anonymous Coward · · Score: 0

    There are a couple of questions that you should ask yourself:

    First I would find out how the program was designed, that is: was it built bottom-up or top-down? Some languages offer better facilities for writing programs in one style or the other and some problems are solved better in one style or the other. Try to think like somebody who was given the task of "implementing X."

    If they chose bottom-up, the developers might have been competent enough to refactor code as they were writing it. How would somebody start implementing X from the bottom-up? Start deep down in the hierarchy of abstractions with the fundamental abstract data types that drive most of the program. If file timestamps are accurate, they should be able to tell you what the oldest module of the program is. Start there, then move on to the next layer that interfaces with that code. Wash, rinse, repeat.
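
    The timestamp idea can be sketched in a few lines. The function name and the default suffixes are made up for the example, and it inherits the caveat above: a fresh checkout from version control usually stamps every file with the checkout time, in which case the repository history is the thing to ask instead.

```python
from pathlib import Path


def files_oldest_first(root, suffixes=(".c", ".h", ".cpp")):
    """Return source files under root sorted by modification time, oldest first."""
    files = [p for p in Path(root).rglob("*") if p.is_file() and p.suffix in suffixes]
    return sorted(files, key=lambda p: p.stat().st_mtime)
```

    The first handful of entries are candidates for the bottom layer of abstractions to read first.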

    If it's top-down, find the design documents. If they're unavailable, reverse engineer them from the current code base. Is it a clean design? Ask yourself if they were competent enough to come up with it right away. How many people were working at the time and what were their levels of proficiency? Hope that the most proficient programmers were assigned the most difficult modules. At some point integration must have happened. Find the spots where it did. Those are module boundaries. Read each module's code progressing "along the boundaries."

    The more accurately X was defined in the first place, the later you're going to run into the uglies, the WTFs. That's going to be inevitable. Every code base has WTFs and OMGs.

    Ultimately, you must read all code to understand all code. That shouldn't come as a surprise.

    Chris Eineke

  61. Re:30 to 40 thousand lines isn't large by any meas by codeAlDente · · Score: 1

    I've never gotten tier 1 service for anything. But, for all intents and purposes, really, who cares?

    --
    He once inserted random mutations into his code, just so he could have the experience of debugging.
  62. 40,000?!? ARE YOU KIDDING ME? by raftpeople · · Score: 1

    When I was programming we did every project in 5 lines of code, or less, period. Anything more than that was just fancy stuff!

  63. Me by Anonymous Coward · · Score: 0

    >30-40 thousand lines ?

    You must be kidding. This is a tool. Business apps are an order of magnitude bigger: 300-400 thousand lines.
    As for your question: Hire the original developer.
    No documentation -> code is not worth a cent. In the business world the documentation should be in the code, at roughly one comment line per line of code.
    These are the apps that control your money, phone calls, insurance and your airplane tickets. Those are the real apps.

  64. Re:30 to 40 thousand lines isn't large by any meas by lgw · · Score: 1

    My first programming job was also about 7 million lines of code - all assembly code. There were 5 of us maintaining it, and some of the object code we were maintaining we didn't have matching source for (which isn't hopeless in assembly programming, fortunately, just time consuming and annoying).

    You can just read through 30 klocs in a few months, not a big deal, really. But for a larger codebase you have to learn how to do bugfixes without understanding the entire system. You can often find the source of an error by searching for an error message in the code, then working backwards (assuming you have an error message!). You won't be able to prove that your bugfix won't break something else.

    For adding new features, it really sucks if there isn't some architecture-level documentation to give you the big-picture understanding. Details around a given bug are one thing, but just finding the right APIs to use when adding some new feature can really suck without good comments or an architecture doc.

    Stepping through the code with a debugger while you do some normal tasks will really help you understand the organization of the mainline code. Lacking good docs, it's the best way to get started.
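
    Where single-stepping is too slow, a rough call log gives a similar picture of the mainline. This sketch assumes the code is Python (the language in question was never stated) and uses the standard sys.settrace hook; trace_calls is a name made up for the example.

```python
import sys


def trace_calls(func, *args, **kwargs):
    """Run func and return (result, names of Python functions in call order)."""
    calls = []

    def tracer(frame, event, arg):
        if event == "call":
            calls.append(frame.f_code.co_name)
        return None  # no line-by-line tracing inside the new frame

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = func(*args, **kwargs)
    finally:
        sys.settrace(previous)  # restore whatever tracer was active before
    return result, calls
```

    Wrapping the entry point of one "normal task" this way yields the sequence of functions it passes through, which is a reasonable reading order for the code.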

    --
    Socialism: a lie told by totalitarians and believed by fools.
  65. Unit Test Suites by shanmoon · · Score: 1

    I inherited a product with a code base of a few hundred thousand lines of code when I was a fairly new software engineer. To make it worse, it was cross platform (AIX/Windows/Linux/HUX) with something like 20 nested make files. The code was essentially a business service application. My solution was to talk to the consumers of the product and learn what each service call was supposed to do. I then wrote a set of test suites for the application. I had to continually update the suite as I went along, but it definitely exposed unexpected couplings or other strange behaviors in the code. I also ended up converting the project over to an ant based build script (ant was brand new at the time). It definitely taught me what the code was doing and how it was doing it.

  66. Re:30 to 40 thousand lines isn't large by any meas by RogerWilco · · Score: 1

    It's not just architecture and coding standards. What I find is that up-to-date documentation is very important. Not so much details about lines of code, but the general design, control flow and design decisions.

    --
    RogerWilco the Adventurous Janitor
  67. I'm afraid the time may already have passed by Chris+Newton · · Score: 2, Interesting

    If both the original developers and the knowledge they had have been lost, then it is probably already too late to perform any major maintenance on this code base. The project has already entered its “servicing” stage.

    At that point, you basically have two possible approaches that actually work: you can restrict maintenance to small-scale changes, which may be sufficient if the goal is just to keep the project ticking over for a while, or you can accept The Big Rewrite (which isn’t so big in this case) in order to get a project that can be properly maintained.

    If you want to go down the tactical changes path, there are a couple of approaches to finding your way around the code.

    If you’re familiar with the general field of the software, just not this particular code, then you can work top-down. Start with the key, high-level concepts you know the program implements, and try to find the code that represents those:

    • Look at things like file names and directory structure (often a good starting point, because these tend to reflect the original design/intent behind the code).
    • Get a tool like Doxygen to draw some graphs of the relationships between functions/classes in the code, and chances are the big clusters of related code will match some of the concepts you’re trying to find.
    • Just search the code base for key words from the problem domain. Look for functions/modules/classes named after them, or that refer to them often.

    Hopefully, if the code has a reasonable modular design and you just don’t know what it is yet, this sort of approach will identify the organisation of the code at a very coarse level, but then you can try to break down each area in more detail the same way.

    Alternatively, you can work bottom-up. Find a significant starting point, such as:

    • somewhere that generates some output you’re interested in
    • somewhere that throws an exception or trips an assertion relevant to a bug you’re trying to fix
    • a busy spot when you run the program through a profiler.

    Examine the code near that point. Look at what kinds of data it works with. Look at what functions it calls, and what functions call it. Try to figure out the wider significance of the code you started with, and the other code to which it relates. Then move up a level: what is the purpose of all of that code collectively? Repeat until you’ve explored as far as you need to.

    After some other discussions about these topics, I recently wrote up a couple of articles with some more background information than I’ve given here — link in my sig if anyone’s interested (though be warned that they are pretty long).

    1. Re:I'm afraid the time may already have passed by afabbro · · Score: 1

      If both the original developers and the knowledge they had have been lost, then it is probably already too late to perform any major maintenance on this code base. The project has already entered its “servicing” stage.

      At that point, you basically have two possible approaches that actually work: you can restrict maintenance to small-scale changes, which may be sufficient if the goal is just to keep the project ticking over for a while, or you can accept The Big Rewrite (which isn’t so big in this case) in order to get a project that can be properly maintained.

      Sorry, but I think that's absurd. Asking a team of new programmers to step in, learn, and maintain an intermediate-sized legacy codebase is hardly an unreasonable request. I have done just this.

      I'm not saying it's easy or there isn't a cost in time to it, but if I hired you as a programmer and you looked at our source code and said, "sorry, it isn't documented sufficiently, your only choices are to restrict to small fixes or to completely rewrite it," you'd be shown the door. You could certainly say "btw, this will take a few weeks to read through, and we better test the hell out of any changes until we get it down", and perhaps we'd make a choice about cost effectiveness - that is reasonable. But this dogmatic "sorry, not enough documentation = Big Rewrite" is nonsense. Oftentimes - most times? - it's more effective to master and modify than to rewrite.

      Big Rewrites are fun. Reading code is not. Programmers prefer fun to cost effective, which is hardly a surprise.

      --
      Advice: on VPS providers
    2. Re:I'm afraid the time may already have passed by Chris+Newton · · Score: 1

      But this dogmatic "sorry, not enough documentation = Big Rewrite" is nonsense.

      Well, please notice that this isn’t what I wrote. I talked about a situation where you had lost both the original developers and the knowledge they had. Documentation is one way to pass on that knowledge, but sometimes one of the least effective. If you can still recover it in other ways — for example, if the code is well-written and self-documenting — then you might still be in the maintenance stage of the project rather than servicing.

      Also, remember we’re only talking about a relatively small code base here. There probably isn’t a huge gap between making tactical changes and effectively rewriting the part of the system concerned anyway.

      Oftentimes - most times? - it's more effective to master and modify than to rewrite.

      I agree that a rewrite is very expensive, but there is an implicit assumption in your statement that it is possible to achieve a sufficient level of mastery to make the required modification effectively instead. Maybe you feel that for a 30–40 KLOC program, such a level is always attainable, and perhaps that is correct. But in general, once you’ve lost too much, it becomes very difficult to continue performing a full range of maintenance on the project.

    3. Re:I'm afraid the time may already have passed by Bat+Country · · Score: 1

      So spend about 1-3 months learning what the code does, taking notes and documenting as you go, writing down both your discoveries and what questions you're left with. Revisit that documentation regularly during the process, rewriting any information you got wrong or learned more about, including any gotchas you may have found. Start making a list of serious questions ("Why was this done this way," "What would happen if this component failed," "Why couldn't this have been done this way") and see if there are answers by the time you reach the end.

      It's really not as hard as everybody seems to make it out to be unless the original writers tried overly hard to be "clever." I've read and learned several undocumented and, worse, incorrectly documented (the documentation didn't reflect the current state of code at all) code bases of this size. It takes patience, it makes your head hurt and it's not always fun, but the payoff is excellent - you understand the code, you've become better at reading strange code (yes, it is a learned skill) and you probably understand the code nearly as well as the people who wrote it by the time you've finished.

      --
      The land shall stone them with the bread of his son.
    4. Re:I'm afraid the time may already have passed by jgrahn · · Score: 1

      Big Rewrites are fun

      ... until you're "done" and the code has to face the users. I think programmers learn this after a few years.

  68. Fix small bugs by Midnight+Thunder · · Score: 1

    I have been given projects of this nature and the best approach is to document what is obvious and then use bug fixing as a way in to the code. While it won't give you a complete picture, it should help you understand what is immediately important, and serve as guide posts for knowing more in the future. Generally I try not to spend too much time trying to understand everything, since it's a waste of time, unless that knowledge is guaranteed to serve you - sometimes the client just wants it to be tweaked once in a while, so it probably is not worth the time if you can't charge them for it.

    To sum up: give yourself a general picture and then concentrate on the details only when it matters.

    --
    Jumpstart the tartan drive.
  69. Re:30 to 40 thousand lines isn't large by any meas by hahafaha · · Score: 0

    > I find it helpful to [...] draw up finite state machines.

    Unless his entire code is written in regular expressions (which, albeit, *would* be a total bitch to maintain), I don't think finite state machines are going to be very helpful.

  70. Design patterns are your friend by PerlPunk · · Score: 2, Interesting

    "A couple of times in my career, I've inherited a fairly large (30-40 thousand lines) collection of code. The original authors knew it because they wrote it; I didn't, and I don't."

    A couple of times in your career? You must be lucky. Most jobs you can get coding will always involve taking over someone else's code.

    In my experience, design patterns are your best friend, bearing in mind that most of the code base will always remain a black box to you.

    For example, when I was doing some health insurance work, I had inherited a code base that was substantially larger than 30 or 40 thousand lines of code. The objective was to make the code that used an older, fixed-length record format work with the newer X837 EDI format, which is basically XML but almost without any tags to help you figure out where the data begins and ends. Suffice it to say that the task was to figure out how to smoothly stick a square peg in a round hole.

    The task itself determined the design patterns, of which an adapter pattern was the most used. The type of pattern in turn dictated what in the code to look for in order to implement it, and (of course) how the new code would be built. For example, since we were using an adapter pattern, the first order of business was to find out how the data was represented in the code base, and then trick the "black box" into using your own spiffy, new representation of the data.

    For the most part I didn't have to care all that much how the application handled the data as long as I got the right data into a form the application would accept in my adapter.
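
    To make the adapter idea concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the Claim fields, the 18-character record layout and legacy_total (a stand-in for the untouched "black box") are invented for illustration and have nothing to do with the real EDI format or the real code base.

```python
from dataclasses import dataclass


@dataclass
class Claim:
    """New-style representation of the data (hypothetical fields)."""
    member_id: str
    amount_cents: int


class LegacyRecordAdapter:
    """Presents a Claim as the fixed-length record the old code expects.

    Assumed layout: 10-character member id, left-justified and space-padded,
    followed by an 8-digit zero-padded amount in cents.
    """

    def __init__(self, claim):
        self._claim = claim

    def to_fixed_record(self):
        member = self._claim.member_id[:10].ljust(10)
        amount = str(self._claim.amount_cents).zfill(8)
        return member + amount


def legacy_total(records):
    """Stand-in for the black box: sums the amount field of fixed records."""
    return sum(int(record[10:18]) for record in records)
```

    The old code keeps reading columns 10 to 18 exactly as before; only the adapter knows that the data now comes from somewhere else.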

  71. Re:40,000?!? ARE YOU KIDDING ME? by Surt · · Score: 1

    Sure, but the medical policy must have been ridiculous to cover all the RSIs from the scrolling.

    --
    "Who is the Journal of Quantum Physics going to believe?" --Stephen Hawking
  72. Re:30 to 40 thousand lines isn't large by any meas by Anonymous Coward · · Score: 0

    Hint: monads are two-state machines. Learn them.

  73. Re:30 to 40 thousand lines isn't large by any meas by benjamindees · · Score: 5, Funny

    Perl is like the matrix. At a certain point, after you've stared at it long enough, it all just makes sense.

    --
    "I assumed blithely that there were no elves out there in the darkness"
  74. Use it or lose it by MpVpRb · · Score: 1

    Somehow, I suspect that the original developers don't remember most of it either.

    Unless you work with it every day, little by little, you forget.

    First you forget the tricky parts.

    About the only thing you remember after a few years is the general structure.

    If you work with it every day, soon you will know it better than the original developers.

  75. As a maintenance programmer by npsimons · · Score: 5, Informative

    As someone who has done probably 90% of his work in maintenance programming, let me give you my tips:

    • Snapshot what you get - don't change it, don't even look at it. As soon as you get it, check it in, binaries and all, to a change tracking system (eg, CVS, SVN, etc).
    • Now that you know what they gave you, and you can get back to it at any time, your options are seemingly limitless, but for the quickest way to get up to speed, I would recommend writing unit tests for the software. This will be long and tedious, but by writing unit tests you will a) learn what to expect out of the software, b) be able to tell when you break something and c) truly learn the software.
    • Automate, automate, automate! It's a close call as to whether you should start right away on your first unit test, or get the build system automated, but let me just say that it will save you a ton of time to have a "one button push" way to build, run and test the software. From there, you should be having your machine build and run the unit tests automatically, preferably nightly, from a clean checkout of the repository, just in case you forget to run a test after you change something or you forget to check something in.
    • Run the software (including unit tests) through the gauntlet - valgrind's memcheck, electric fence, fuzz, bfbtester, rats, gcc's -fstack-protector-all flag, libc's MALLOC_CHECK_=3, gcc's _FORTIFY_SOURCE=2 define, gcc's -fmudflap flag, gcc's -Wall -Wextra and -pedantic flags; any way you can think to flush out bugs, do it, and start fixing them; you will learn much, not just about the code, but about the thought process of the original coder(s) this way. Change tools as appropriate for your programming language and environment (including compiler/interpreter, libs, OS, etc). As you can tell, I do a lot of C and C++ programming.
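
    To make the unit-test tip concrete, here is a minimal sketch of a "characterization" test in Python (change to suit your language). legacy_format_name is a made-up stand-in for whatever inherited function you are probing, and run_suite is just a helper for this example; the point is that the assertions record what the code does today, odd bits included.

```python
import unittest


def legacy_format_name(first, last):
    """Stand-in for an inherited function whose behaviour is undocumented."""
    return (last.upper() + ", " + first).strip()


class CharacterizationTests(unittest.TestCase):
    """Pin down what the code does now, not what we think it should do."""

    def test_typical_input(self):
        self.assertEqual(legacy_format_name("Ada", "Lovelace"), "LOVELACE, Ada")

    def test_empty_first_name_leaves_trailing_comma(self):
        # Surprising, but it is the current behaviour, so record it as-is.
        self.assertEqual(legacy_format_name("", "Lovelace"), "LOVELACE,")


def run_suite():
    """Run the tests and return the result object (handy for a nightly job)."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(CharacterizationTests)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

    When a later change makes one of these fail, you get to decide whether you broke something or fixed a bug somebody was relying on.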

    BTW, the fact that you have a hard time understanding this code may be more a reflection on the original authors' coding skills than on your abilities; any idiot can write code that "just works"; it takes a lot of thought, time and effort to write code that is maintainable, and more often than not, the original coders were short on at least one of those (if not all three). Here's hoping you have the time to follow my above tips; they take a lot of time, but can be worth it if you really need to maintain the code. It's funny to note that apart from the first one, most of those tips apply equally well to developing software from scratch. If the code already has a change tracking system, unit tests, a build/run/test system, *and* automated testing, consider yourself lucky and just start picking apart the unit tests.

    1. Re:As a maintenance programmer by Anonymous Coward · · Score: 0

      any idiot can write code that "just works" - that would be the absolute bare minimum I would expect from a programmer, I would also expect the code to be reasonably efficient, well structured and documented since it's almost 100% certain someone will need to fix/extend/modify/understand the code at some future point.

    2. Re:As a maintenance programmer by Anonymous Coward · · Score: 0

      The unit tests are vital, but as someone who recently inherited approximately the same size code base from an outsourced project I found refactoring to be THE most useful thing. Once you start rewriting small sections of code you will understand what the classes (or modules / procedures if it's not OO) do. It took me a month to completely test, document, and refactor my code base, but I took the project from about 35k to roughly 20k LOC.

      When you do start to refactor (notice I said 'when', not 'if') this book is invaluable: http://www.amazon.com/Refactoring-Improving-Design-Existing-Code/dp/0201485672

    3. Re:As a maintenance programmer by bill_mcgonigle · · Score: 2, Informative

      truly learn the software.

      And then if your unit tests work you'll know enough to comment the code correctly for the next time you or your successor comes back to it.

      --
      My God, it's Full of Source!
      OUTSIDE_IP=$(dig +short my.ip @outsideip.net)
    4. Re:As a maintenance programmer by Anonymous Coward · · Score: 0

      On my current job I was handed an extremely poorly written bunch of programs (spanning various languages, none of which the original coder mastered very well.) The code was written by someone who had apparently left the company in anger, and not all of it was even at an operational stage. Being much more of a 'creator' personality than a 'maintainer' personality, I ended up rewriting half of it from scratch and discarding the other half. Luckily that was what the company wanted me to do.

      I can agree with every point on your list though. Just making a clean copy of the source, running git init && git add . will do wonders to your productivity, because you know you're free to tinker and dissect and mess around as much as you want, even if you want to truly pull something apart. I'd also add: Take good notes! Digging through old code is much like an archaeological expedition. And it's amazing how fast you'll forget something if you haven't written it down.

    5. Re:As a maintenance programmer by Anonymous Coward · · Score: 0

      Writing code that "just works" can be done by an idiot, but writing code that "just works" the way clients think it should work is another question, and making that maintainable is another level again.

    6. Re:As a maintenance programmer by DwySteve · · Score: 1

      Snapshot what you get - don't change it, don't even look at it. As soon as you get it, check it in, binaries and all, to a change tracking system (eg, CVS, SVN, etc).

      This I agree with but for one thing: Verify that the binaries you are given correspond to the code you are given. I've seen it too many times where they release a binary, then go back and do 'minor' bug fixes and don't recompile before handing it off to you.

      Save the binaries you've been given, then do a clean recompile and compare the two versions as well!
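
      A rough way to do that comparison, sketched in Python. The directory arguments and function names are made up for the example, and it assumes your build is reproducible enough that identical source gives identical bytes; many toolchains embed timestamps or paths, so a mismatch is a reason to look closer, not proof of anything.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Hash a file in chunks so large binaries do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def _files_under(root):
    root = Path(root)
    return {p.relative_to(root): p for p in root.rglob("*") if p.is_file()}


def differing_files(delivered_dir, rebuilt_dir):
    """Return relative paths that differ or exist on one side only."""
    delivered = _files_under(delivered_dir)
    rebuilt = _files_under(rebuilt_dir)
    diffs = []
    for rel in sorted(set(delivered) | set(rebuilt)):
        a, b = delivered.get(rel), rebuilt.get(rel)
        if a is None or b is None or sha256_of(a) != sha256_of(b):
            diffs.append(rel.as_posix())
    return diffs
```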

      --
      http://angryee.blogspot.com
    7. Re:As a maintenance programmer by pommiekiwifruit · · Score: 1

      Of course you will want to see if the bugs were producing output that people liked as well! It is particularly nasty if a low-level pointer/bounds/uninitialized variable bug happens to (usually) hide a high-level design bug, and fixing it causes the design bug to be revealed...

    8. Re:As a maintenance programmer by Anonymous Coward · · Score: 0

      This is great advice, but don't forget the basics. In order to work effectively on code, you should know:
          What the code is supposed to be doing.
          What change in behavior you are trying to achieve.
          Where in the code to make the change.
          How to change the code for the desired effect.

        I find that finding the "where" almost always accounts for >90% of the job. Especially when you find relevant
      variable names / code fragments in many places in the code. So I would put my emphasis on concentrating
      on the "where" road map.

      1. First Background. What the code base is supposed to be doing, in both general terms and more specific.
                    What Domain elements, issues and problems are being addressed.
                    What Logical (Software Architectural) elements, processes, and steps achieve the desired Solution.
                    What VISIBLE program elements are used to achieve them.

      2. As stated by many others: Where in the code is the main control loop,
            and where are the visible program elements called.

      3. What elements of the code base corresponds to each of the visible program elements. What other elements
                are only used by a particular element. (They are in the same logical box).

      4. What common utilities are used by many program elements and why.

      This is the detailed road map that the creators of the code would know that allow them to quickly find
      the right place in the code to make changes. You can not work quickly without them.
      Please, Record what you learn! You (or the next person down the road) will thank yourself later.

      Ideally, you would spend enough upfront time getting oriented
      to allow you to work effectively, but it never happens so...

      The best sort of overall advice I can give you is:

            IF POSSIBLE, DON'T DO IT ALONE.

        Individuals who are expert: coders, software architects, and domain specialists are rare, expensive, and
        probably don't have the time or inclination to have a life outside the office (or read Slash-Dot).

      Having two other specialists available: one Domain, and one Software Architecture, can drastically focus a Coder's efforts.
      Just having a buddy to bounce ideas off can also be invaluable.
      These resources would typically be used when first analyzing a task, and occasionally later, when new ideas are needed.

      Also, don't forget to mine change histories; they will provide hints about what modules do, what has been changed
      (and how often), and why. I find it a good way to find candidates for special attention.
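
      For example, if the history happens to live in git (an assumption; any version control system with a log will do), a few lines of Python will rank files by how often they have changed. The function names here are made up for the sketch.

```python
import subprocess
from collections import Counter


def git_file_log(repo_dir):
    """Return the file names touched by each commit, one per line."""
    result = subprocess.run(
        ["git", "log", "--name-only", "--pretty=format:"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    return result.stdout


def change_counts(log_text):
    """Count how many commits touched each path in the log text."""
    counts = Counter()
    for line in log_text.splitlines():
        path = line.strip()
        if path:  # blank lines separate commits
            counts[path] += 1
    return counts
```

      Something like change_counts(git_file_log(".")).most_common(10) then gives you the ten most-churned files, which is where I would start reading.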

    9. Re:As a maintenance programmer by E+Jennings · · Score: 1

      Why has no-one mentioned reverse engineering? Depending on the language used there are tools out there which will analyse the code into UML models. This gives an idea of the architecture; which is always a good starting point. Tools like Rhapsody can round-trip: that is always an interesting experience.

  76. Re:30 to 40 thousand lines isn't large by any meas by keeboo · · Score: 1

    It depends on the code quality.
    40k lines of spaghetti, undocumented code may be a nightmare.

    1M lines of good and documented code may be even easy to deal with, depending on what you're going to do.

  77. Re:30 to 40 thousand lines isn't large by any meas by scamper_22 · · Score: 1

    First off, all of us engineers or good programmers take a lot of pride in our work... This can sometimes be a problem.

    The real issue is that a company had 40K lines of code written and didn't staff it properly to maintain it.
    First, they should make sure the guy who wrote it didn't leave. Work conditions, payscale....
    Secondly, they should have had a transition plan. Either some 'slack' working on the same project.

    So that is your starting point. It is not your problem that you inherited this large codebase and have no idea how it works.
    Don't take it personally if you make a change and crap happens. Just make a change, hopefully there are testers... if you cause a bug, enjoy the CRs...
    That's how companies want to run their software department. That's how you behave.

    You will only learn the code by working with it. You will get CRs, grep files for what you're looking for, make a change and deal with the after effects
    After a few months, you'll start to get the hang of it. After a year, you'll be good...

    That's just life in poorly run software companies :P

  78. Re:Piker.... by Anonymous Coward · · Score: 0

    I know this'll get modded troll, but boy are you a douche.

  79. Fix some bugs in the defect tracking database by mstockmyer · · Score: 2, Informative

    When I joined a group that had a 2 Million SLOC program, I learned the most by fixing defects. It gave me a good reason to go traipsing through the codebase. It's painful, but it gives you purpose while reading the code. Just plain reading it gets boring.

  80. Dear Sir by nicknamenotavailable · · Score: 1

    Dear Sir,

    We have recently been placed in charge of inheritance of 40,000 loc, I have the
    privilege to request your assistance to maintain the henceforth mentioned sum.
    The above sum resulted from a contract, executed, commissioned and written five
    years (5) ago by a foreign contractor. This action was however intentional and
    since then the source has been in a suspended terminal awaiting the fg command.

    We are now ready to transfer the source overseas and that is where you come in.
    It is important to inform you that as outsourced servants, we are forbidden to
    debug foreign code; that is why we require your assistance. You will be required
    to debug and analyze the code and transfer the bug free code to our central
    repository after which we will reimburse you for your time with post it notes
    and slightly dated coffee creamer.

    We are looking forward to doing this business with you and solicit absolute
    confidentiality from you in this transaction. Please acknowledge receipt of
    this letter, using the above Telefax number for more details regarding this
    transaction. Also endeavor to send the requested information.

  81. You call *that* large? by K77 · · Score: 2, Insightful

    I call that a module. Large is anything over 1,000,000 LOC. Step up.

  82. Step 1: Find a very large wall by Deffexor · · Score: 1

    Step 2: Print out all the code (in very small font) and paste the code up on the wall
    Step 3: Identify all the classes, functions, DBs, etc.
    Step 4: Create a visual map (on a white board) of how they're all linked together.
    Step 5: PROFIT!

    That wasn't so hard, now, was it? :)

    1. Re:Step 1: Find a very large wall by nicknamenotavailable · · Score: 1

      Step 1: Find a very large wall
      Step 2: Print out all the code (in very small font) and paste the code up on the wall
      Step 3: Identify all the classes, functions, DBs, etc.
      Step 4: Create a visual map (on a white board) of how they're all linked together.
      Step 5: PROFIT!

      6: Write a program that does all these steps automatically.
      7: PROFIT! (or get sued by the people who patented the process).

      But seriously, isn't there a program that can do all this?

    2. Re:Step 1: Find a very large wall by dominious · · Score: 1

      The point is that you learn the program while figuring out the links! If you have another program do the UML for you and you end up with a huge web of links ...um you don't learn anything do you?

    3. Re:Step 1: Find a very large wall by nicknamenotavailable · · Score: 1

      Yes, you're right. If you figure out the links by hand, it will take a lot longer, but you will learn a lot more.
      I was concentrating on the 'UML' part and the 'PROFIT!' part.

  83. It's just not the same. by tjstork · · Score: 2, Informative

    the ports of GNU utilities to Windows [sourceforge.net] or Cygwin [cygwin.com] or even your own company's Interix [wikipedia.org] and Services for UNIX [wikipedia.org] products?

    I had Win7 and Vista Ent with Services for Unix I downloaded, and it just did not feel right or work right. The command line utilities work, in part, because the whole OS in Unix is basically a tree of text files. Windows isn't, and so, the utilities tend to be less effective. Plus, there are some gotchas like how Windows handles open files with applications; it's all different.

    I thought Interix would be the ultimate, but instead it taught me the opposite. If you want unix, use unix. It's that simple.

    --
    This is my sig.
  84. Static analysis tools by Anonymous Coward · · Score: 0

    Using a static analysis tool like findbugs - and fixing all the problems it finds - is a great way to get to know all sorts of corners of a big codebase.

    (and incidentally increase quality).

  85. Well... by RobertM1968 · · Score: 1

    [humor]

    You could always ask Microsoft... that sounds like almost every piece of software they currently (or even previously) sell or sold (with of course differences in the amounts of code). Maintaining it properly seems to be working fi.....

    ...ummmm, never mind. Perhaps you should simply learn how to market it really well, kill the competition via anti-competitive actions, and kludge on a thing or two so that you can claim you've "improved" it.

    [/humor]

  86. 30-40ksloc can be big! by CodeMasterBob · · Score: 1

    I've maintained code in the 30-40Kloc range that was "large" and really sucked to understand. Fix one bug, create two new ones. I maintain one such code base still, most modules have a McCabe Cyclomatic Complexity of over 100. Can't refactor/rewrite/redesign, management won't approve it. The original authors are long gone. I embed lots of debug in the code and turn on the debug output on sections that I'm working on.
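
    For what it's worth, the "turn on debug output only for the section I'm working on" trick looks roughly like this when sketched in Python with the standard logging module. The section names and compute_invoice are invented for the example; in C the same idea is usually a per-module debug flag or macro.

```python
import logging

# One handler for everything; which messages get through is decided per logger.
logging.basicConfig(format="%(name)s %(levelname)s: %(message)s")

# Hypothetical section names; use whatever matches the modules you maintain.
logging.getLogger("legacy").setLevel(logging.WARNING)
billing_log = logging.getLogger("legacy.billing")
report_log = logging.getLogger("legacy.report")


def enable_debug(section):
    """Turn on debug output for one section only."""
    logging.getLogger(section).setLevel(logging.DEBUG)


def compute_invoice(amount_cents, tax_rate):
    """Made-up routine showing entry/exit debug lines around existing logic."""
    billing_log.debug("compute_invoice(amount_cents=%r, tax_rate=%r)", amount_cents, tax_rate)
    total = round(amount_cents * (1 + tax_rate))
    billing_log.debug("compute_invoice -> %r", total)
    return total
```

    Calling enable_debug("legacy.billing") makes the billing messages appear while the report section stays quiet, so the embedded debug lines can stay in the code permanently.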

  87. lxr by Anonymous Coward · · Score: 0

    I use a program called lxr or linux cross reference. I even extended it for myself to handle embedded sql code. And because it runs on a web server it allows a whole team to browse the code.

  88. Don't get mad, get even. by porky_pig_jr · · Score: 1

    This is what I've done once upon a time.

    1. Tell management the code is completely undocumented, not maintainable, unstructured piece of Dukakis.

    2. Offer to rewrite the code completely. Chances are they would agree. Of course it depends how large is the code.

    3. Rewrite the code. Make sure it's undocumented, not maintainable, unstructured piece of Dukakis.

    4. Resign.

  89. My Dick is Bigger than Your 250,000 lines of code by BlueBoxSW.com · · Score: 5, Interesting

    Really. A guy asks a question for help and all of these people keep telling him 30-40,000 lines of code isn't much.

    That's a lot of code to get your arms around if you didn't write it. It's not the end of the world, but it is a sizeable task, and is the type of topic that few professional journals or books will ever be written about.

    Having been in similar situations, my advice would be:

    1) Try to get an understanding of the history of the code. Who wrote it? Why? How many developers? How long has it been around? Do people love it or hate it? Is there a version control system in place you can use for information?

    2) Look at it from a technical viewpoint. Is it complete? Does it compile and run? How many languages are used? Are there interfaces with other systems you need to know about? What dependencies are there? How easy is it to set up a test server? What parts are well coded? What parts stink up the joint?

    3) Dig for functional documentation. What does it do? For whom does it do it? What business needs does it support? How mission critical is it?

    4) Meet with the business owners. Seriously. This helps you do two things: #1-- Define the real business need (which may be different than what was understood by the previous developers), and #2-- Set appropriate expectations about maintenance. You'll work hard to maintain and keep it working, but you are working from a disadvantaged position. It is important they know this and support you in your efforts, rather than complain loudly when something doesn't work.

    5) Plan to remove the dead weight. There's always a lot of dead weight in these near-abandoned projects. Get an idea how to simplify things and plan your work in phases.

    6) Set up real test and development servers. Yeah, you know that wasn't already done.

    7) Use version control. But you know this. It's 2010, and no developer worth his/her salt would code a paying project without version control. Right?

    8) All fixes will take much longer than if you wrote the code, so be careful with estimating time.

  90. KLOC is a stupid metric.... by mevets · · Score: 0, Redundant

    Anecdota from the 2.6.13 source tree:

    ./drivers/usb/media 28846
    ./net/ipv6 28901
    ./fs/jfs 29103
    ./fs/reiserfs 29268
    ./mm 29446
    ./drivers/usb/gadget 29453
    ./drivers/char/drm 31944
    ./drivers/scsi/aic7xxx 32463
    ./drivers/isdn/hardware/eicon 33054
    ./drivers/atm 33462
    ./arch/alpha/kernel 34150
    ./drivers/net/sk98lin 34598
    ./drivers/ieee1394 34683
    ./arch/i386/kernel 35251
    ./arch/sparc64/kernel 35293
    ./arch/ia64/kernel 36738
    ./drivers/usb/serial 38002
    ./sound/pci 38576
    ./kernel 39278
    ./drivers/video/console 39445
    ./drivers/pci/hotplug 39969

    None of these, by any measure, are large
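
    If you want to produce this kind of per-directory tally for your own tree, here is one way, sketched in Python. I am not claiming this is how the numbers above were produced: it assumes raw line counts of .c and .h files, attributed to the directory each file sits in directly, and lines_per_directory is a name made up for the example.

```python
import os
from collections import Counter


def lines_per_directory(root, extensions=(".c", ".h")):
    """Count raw lines of matching files, keyed by directory relative to root."""
    totals = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(extensions):
                with open(os.path.join(dirpath, name), "rb") as fh:
                    totals[os.path.relpath(dirpath, root)] += sum(1 for _ in fh)
    return totals
```

    Sorting the result by value (for example sorted(totals.items(), key=lambda kv: kv[1])) gives an ascending list like the one above.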

  91. Re:30 to 40 thousand lines isn't large by any meas by hobo+sapiens · · Score: 2, Insightful

    well that depends on how many developers we are talking about. The original question seems to indicate that the author has inherited the codebase. The need for this question wouldn't exist if the person were on some large team.

    For one or two or five people, 40K lines is a sizable codebase, especially if it has been poorly maintained / designed.

    --
    blah blah blah
  92. Overtime Hours by Anonymous Coward · · Score: 0

    I get this exact situation occasionally. The only possible answer is time. The longer you work with it the more clear it becomes. You may even end up liking their style.

    If you can work a lot of overtime at the beginning that will help. That is exactly what I do but then again I have the free time and don't have kids to watch after and such.

    Bite the fekken bullet if you can and put in a few 60 hour weeks.

  93. Re:30 to 40 thousand lines isn't large by any meas by tgatliff · · Score: 1

    I could not disagree any more with your statement. As a consultant, I have designed and personally coded more than a dozen projects that were much larger than what the poster had. Also, it is simply impractical many times for the developers to stay simply because it is just not cost effective to do so. People generally will pay to have the new system in place, but rarely want to pay a lot to maintain it. My experience is that it is generally best for me to move on, and let someone else maintain the system. To be honest, some systems have turned out poorly (typically due to late exploration of the requirements), but generally the codebases are quite simple for even a novice to maintain.

    However, I have found thru experience that the key to a good codebase is how it is segmented. Abstracting complexities is extremely important to a well maintained codebase. Meaning, in my opinion, the ideal design is one where you have hundreds of simple objects (although OO principles are not critical) that make up a very complex system.

    In short, I have seen a number of systems just like what the poster is talking about. They are generally poorly designed, hard to maintain, and typically very difficult to find/fix bugs on. If there is not a business case to re-design the system, however, then it is typically best to slowly start segmenting and abstracting the codebase until you can reliably predict how it will perform in the field.

  94. Software Archaeology by tachyonflow · · Score: 1

    I recently listened to an excellent Software Engineering Radio podcast on this very subject: Episode 148: Software Archaeology with Dave Thomas

    This guy has a lot of good pointers. (No pun intended. ;)

  95. No "find" and "grep"? by VirginMary · · Score: 1

    I feel sorry for you! I quit a really nice job about 12 years ago because I was fed up with Windows. I have been much happier since! Before then, I used Cygwin which is freely available and offers the goodness of bash, find, grep and many other tools written by software developers for software developers. Recently I had the misfortune of having to write a short batch file on Windows 7 on a fairly powerful 4-processor machine with 8 GiB of RAM and noticed that a) the terminal (DOS?) window felt really unresponsive and b) that copying and pasting in it was bizarrely clunky. What's up with "mark"? Also, I am not someone who would claim that bash is an even remotely sane scripting environment, which is why I switched to Python for most of my scripting needs, but Windows batch scripts are a friggin' nightmare! It seems that rather than improving on "sh", Microsoft decided to come up with something far worse. I now live in an OS X and Linux world and am much less frustrated. And you're right, of course, it does take time and effort to become familiar with a new code base. I felt pretty intimidated when I started out at my current job with a new build system and a new programming language. Now I feel like I can fix any bug in it and I have already added several new features! :)

    --
    When 1person suffers from a delusion,it is called insanity.When many people suffer from a delusion,it is called religion
    1. Re:No "find" and "grep"? by Volguus+Zildrohar · · Score: 1

      b) that copying and pasting in it was bizarrely clunky. What's up with "mark"?

      Quick edit mode - set it in the properties for your command prompt window, and save the setting for future windows when it prompts. Select text with the mouse immediately, right-click to copy a selection. If there is no selection, right-click pastes instead.

      --
      When confronted with one problem, some think "I'll use recursion". Now they are confronted with one problem.
    2. Re:No "find" and "grep"? by snowgirl · · Score: 1

      I think I'll save you the nightmare of what has to go in front of a perl script in windows to make it execute from the command line without having to invoke perl yourself.

      Basically, the windows equivalent of #!/usr/bin/perl

      --
      WARNING! This girl exceeds the MAXIMUM SAFE standards established by the FDA for BRATTINESS
    3. Re:No "find" and "grep"? by chentiangemalc · · Score: 2, Informative

      you're using windows 7 and batch files? use powershell, more powerful than any of the unix based shells (that i've seen) and there must be something wrong with your system...or non-OS processes using up CPU & memory....because i've used windows 7 on below minimum spec machines (1 GHz CPU and 512 MB ram) and the command prompt was still very responsive.

    4. Re:No "find" and "grep"? by Kalriath · · Score: 1

      What are you talking about? The ActiveState Perl installer configures that for you. You'd just enter the name of the Perl script and it would run. That's all. Even at the command prompt.

      --
      For a site about things like basic rights, Slashdot users sure do like to censor "dissent".
    5. Re:No "find" and "grep"? by snowgirl · · Score: 1

      What are you talking about? The ActiveState Perl installer configures that for you. You'd just enter the name of the Perl script and it would run. That's all. Even at the command prompt.

      Which means exactly what, when the perl.exe binary is installed into a dev environment, and not individually to each machine...

      --
      WARNING! This girl exceeds the MAXIMUM SAFE standards established by the FDA for BRATTINESS
    6. Re:No "find" and "grep"? by VirginMary · · Score: 1

      Sorry, I wasn't very specific, but I can't use Powershell because, even though I used Windows 7 for development, the script needs to run on various older flavours of Windows. It's hard for me to describe in what way the shell lacked responsiveness when typing. It's not that it couldn't keep up with my typing. It just didn't feel very smooth, more "jerky", if that makes any sense? Also the command-line editing is atrocious. I am an Emacs user and love how bash supports quite a few Emacs commands. The windows terminal window supports what, arrow keys, BS, DEL, HOME and END?? Hmm, I have to admit that I don't really know.

      --
      When 1person suffers from a delusion,it is called insanity.When many people suffer from a delusion,it is called religion
    7. Re:No "find" and "grep"? by Lil'wombat · · Score: 1

      THIS is why I read slashdot! I learn something new that will save me tons of aggravation. Thank You.

      --

      Truth: If it's not one thing, it's another

    8. Re:No "find" and "grep"? by 1s44c · · Score: 1

      you're using windows 7 and batch files? use powershell, more powerful than any of the unix based shells (that i've seen)

      The only way that statement could be true is if you have never seen any unix shell.

  96. Re:30 to 40 thousand lines isn't large by any meas by hobo+sapiens · · Score: 4, Insightful

    I am currently working with a mission-critical codebase, which is written in PHP and has absolutely no cohesive design to it. Well, unless you consider making everything static and unnecessarily inheriting other classes and overwriting static variables willy-nilly a cohesive design. There are business rules just everywhere and API requests everywhere and all kinds of calls that overwrite static variables. If you don't methodically trace logic it's really easy to get lost. What makes it worse is that there are many many variables that are named very similarly and you don't really know which one is right and which one is just going to get overwritten in some method call you are not looking at right now. And if this software fails, the worst case scenario is that my company makes no money. It really has made my life over the last few weeks pretty horrid. Fortunately I enjoy the job and the co-workers and am well respected there. Otherwise, it wouldn't be worth the aggravation.

    My advice: communicate your difficulties to everyone who will listen (refrain from complaining or bellyaching, just communicate). If you inherit something like this, and it is mission critical, then you need to take as long as it takes to get it right. That's right, AS LONG as it takes. Take the time to document everything. Bother the crap out of anyone who can help you. You are responsible for doing your job, and part of doing your job is figuring out how to maintain this beast. And in order to do that, you need to use every resource at your disposal. If anyone wants to rush you along, you need to communicate the difficulty and the importance of the task. If you have been working at a place for a while and have done a good job to date, then they should trust you. If you're brand new, then you'd better hope someone there values your opinion and doesn't merely think you are incompetent. If you are asked to make enhancements, don't refactor until you understand the code. So make enhancements, leaving the potentially crappy code in place, even copying it if necessary. Steadfastly resist the temptation to refactor until you understand the entire piece that you are trying to refactor. Don't remove seemingly unnecessary variables, and don't reduce seemingly redundant database calls. That comes later when you actually know what you are doing in there. IOW, if you have to navigate a lion's den by touch, don't stop to groom the sleeping lion (unless of course, that is your given task.)

    The word inherit seems to imply that either the original maintainer no longer works there or has moved on to a different position. This means that it's you on the hook to figure it out. You've gotta dig in, buckle down, and get to it.

    --
    blah blah blah
  97. What NOT to do by bit9 · · Score: 1

    Ever seen that demotivational poster that says "It could be that the purpose of your life is only to serve as a warning to others." ?

    Well, that was me on my last project. I inherited a codebase of about 1.2 million lines of antiquated C code, written by a dozen or so different people over the course of a dozen or so years, for half a dozen different projects. For your benefit, here are a few dos/don'ts that I learned the hard way:

    1. DO NOT try to be a hero and learn the code inside/out all by yourself. Going in, I had a long history of doing exactly that on numerous smaller projects. Turns out 1.2 million lines was WAY beyond my ability to grasp just by poring over the source code. The whole time I was trying to decipher this massive, seemingly amorphous blob of code all by myself, there were at least 2 or 3 of the previous developers sitting a couple floors up. All I had to do was ask for help, but for a variety of reasons (they are very busy people, I don't want to come off as being incompetent, my own overconfidence, etc), I didn't use that resource nearly as much as I could (and should) have.

    2. DO NOT try to learn the code bottom-up, by diving straight in and trying to put it all together one piece at a time like a giant jigsaw puzzle. Get a good, solid big picture view in your head first. Draw it out. Data flow, logic flow, UML diagrams, whatever it takes for you to really understand it at a high level, before you start reading source code line by line, function by function, class by class.

    3. DO NOT be afraid to make a few assumptions, at least initially. Yes, this may well mean that your high-level mental picture of the code may have some errors that you will need to fix later on, but you need to use your time efficiently. If you can reasonably discern what a given module, file, or function does without having to read every line of code, go ahead and pencil it in on your high level diagram and move on. If you see a source code file named reset_xyz_board.c, you can be reasonably sure it's resetting the "xyz" board. No need to fully grasp every little detail right off the bat. There will be time for that later, if and when it becomes necessary. But keep in mind that with any sufficiently large codebase, there are going to be numerous dark corners that you never end up seeing anyway. Why waste time meticulously mapping out every single one of those dark corners when, in all likelihood, you are only ever going to modify a tenth of the code, or maybe a quarter at the most? The more time you waste obsessing about every minute detail, the less time you will have to truly understand the code from a high level.

    4. DO get help from your team! I don't mean the previous developers. If the codebase is large enough that you don't feel you can learn the code all on your own, chances are you aren't the only person assigned to the project. If you are the only person, and your bosses refuse to get you help, then good luck. Otherwise, enlist your fellow developers to help you figure the damn thing out, before you all go off trying to write new code. In my case, I was the team lead and started off with 3 other developers on my team. I was foolish enough to let my ego get in the way, thinking that it somehow wasn't "right" for a team lead to have to rely on his team to help him figure out the existing code (which I probably could have done if not for mistakes 1-3, but that's beside the point). I wanted to be the guru who had a better, clearer understanding of the code than the rest of my team. Why? Because I figured that was part of my role as a team leader, and I didn't think they would respect me as much if I didn't know more than they did. Let's face it, programmers are a meritocratic bunch. Ranks and titles don't equate to respect. Your fellow programmers will invariably treat you with a level of respect that is in direct relation to their estimation of your

  98. Only 30-40 thousand lines of code? by Anonymous Coward · · Score: 0

    That's nothing. Now if all of the files in the project are 30-40 thousand lines of copy-ghetti, well all I can say is good luck. May the refactoring be with you.

  99. Unit tests by steveha · · Score: 1

    The first thing you do is get everything under source code control. If it already is, good. You should have a clearly-marked branch that shows where you started hacking on it, so you can easily tell what pre-dates you.

    And by the way, I highly recommend the Git version control system. Among its many great features, it lets you use a version control system that is only on one computer, and get things right before you "push" your changes up to the group server. Thus you have the full power of a version control system, and the freedom to use it, without worrying about breaking things for anyone else. Best practice use of Git: on your local machine, make a new "branch", check out the branch, and do your experimenting in that. If you suddenly, urgently need to fix a bug in the main code, you switch away from your branch to the main branch, do what you must, then switch back to your new branch when convenient. If the branch doesn't work out, you can just delete it. If it works out, you can merge it. (By the way, the above is true of any "distributed" version control system, not just Git.)

    Several others have told you to start with unit tests. If the code base already has a set, start by studying them. If the code base does not have unit tests, write some.

    Presumably you inherited a working system. The unit tests will put a definition on what "working" currently means. When you change the code, if you introduce a bug, you want one or more of your unit tests to detect the bug and let you know, before you share your updated code with anyone else. Unit tests are some work to set up, but they provide huge peace of mind for you once you have a good set.

    And, whenever you are asked to fix a bug (whether you caused it or not), you add a unit test that tests for that bug. Over time the unit tests will become more and more valuable.
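
    A minimal sketch of those two kinds of test, assuming Python and its standard unittest module. The function parse_price is a hypothetical stand-in for some inherited routine, and the two "bug reports" are invented purely for illustration. The first class pins down what "working" means today; the second gains one test per fixed bug.

```python
import unittest


def parse_price(text):
    """Hypothetical stand-in for an inherited function you did not write.

    Converts a string such as "$1,234.50" to a number of cents.
    """
    cleaned = text.strip().replace("$", "").replace(",", "")
    dollars, _, cents = cleaned.partition(".")
    # Pad or truncate the fractional part to exactly two digits.
    cents = (cents + "00")[:2]
    return int(dollars or "0") * 100 + int(cents)


class ParsePriceCharacterization(unittest.TestCase):
    """Pins down what "working" currently means, before any change is made."""

    def test_plain_amount(self):
        self.assertEqual(parse_price("12.34"), 1234)

    def test_currency_symbol_and_commas(self):
        self.assertEqual(parse_price("$1,234.50"), 123450)


class ParsePriceRegressions(unittest.TestCase):
    """One test per fixed bug, so the bug cannot quietly come back."""

    def test_single_fraction_digit(self):
        # Hypothetical bug report: "12.5" was once read as 1205 cents.
        self.assertEqual(parse_price("12.5"), 1250)

    def test_missing_dollars(self):
        # Hypothetical bug report: ".99" used to raise ValueError.
        self.assertEqual(parse_price(".99"), 99)
```

    Saved in a test module, these can be run with "python -m unittest" before every change is shared. The real work in an inherited code base is choosing inputs that reflect what users actually feed it.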

    I also second this advice by npsimons. Try various automated tools that check for memory leaks and such. If they find bugs, fix the bugs (in your private branch) and then make sure that the fixed version passes the unit tests. You will learn the code base as you find and fix the bugs, and you will improve the stability of the code.

    If you find any particularly important variables or data structures, you might want to add some assert statements that check those values in the Debug build. In the Release build, the asserts don't even get compiled in, so they are "free", but if you run the debug build, the asserts can find bugs for you. For example, if you have a crucial handle to some resource, and the handle is getting clobbered, put asserts all through the code that assert that the handle hasn't been clobbered yet, then run the debug build and see where the assert fires. This may not save you time if the clobbering bug only happens once, but you never take the asserts out, so the asserts can find a bug for you if you accidentally re-introduce the bug. (Note that this implies you will want to run your Debug build under the unit tests, in addition to your Release build. The asserts can fire and show you where a bug is, but you need the code to run, and if you have good code coverage from your unit tests, that will happen.)
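
    The same idea can be sketched in Python, where assert statements play the role of Debug-build checks: they run normally and are skipped when the interpreter is started with -O. Everything here (ResourceHandle, the MAGIC sentinel, write_record) is a hypothetical illustration, not code from any real project.

```python
class ResourceHandle:
    """Hypothetical wrapper around a crucial handle (file, socket, device)."""

    MAGIC = 0x5AFE  # sentinel value; a clobbered handle will not carry it

    def __init__(self, fd):
        self.magic = self.MAGIC
        self.fd = fd


def check_handle(handle):
    """Return True while the handle still looks intact."""
    return (
        isinstance(handle, ResourceHandle)
        and handle.magic == ResourceHandle.MAGIC
        and isinstance(handle.fd, int)
        and handle.fd >= 0
    )


def write_record(handle, record, sink):
    # Checked on entry: fires at the first call made after the clobbering.
    assert check_handle(handle), "handle clobbered before write_record"
    sink.append((handle.fd, record))
    # Checked on exit: fires if this function itself did the clobbering.
    assert check_handle(handle), "handle clobbered inside write_record"
```

    Sprinkling the same check_handle assert through other functions narrows down where the damage happens: the bug lies between the last assert that passed and the first one that fired.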

    Good luck.

    steveha

    --
    lf(1): it's like ls(1) but sorts filenames by extension, tersely
  100. Re:30 to 40 thousand lines isn't large by any meas by jimrthy · · Score: 2, Insightful

    That *totally* depends on the code base and the way the OP thinks. Sometimes they're a complete waste of time. Others...not so much.

    I've worked with plenty of programmers who see pretty much every software problem in terms of FSMs. One size does not fit all.

  101. Correct the problem by carlzum · · Score: 1

    Add comments as you review and work with the code. The exercise will help you learn and provide documentation when maintaining or rewriting it. After you're familiar with the code you won't have the perspective of someone new to it, so start now.

  102. Re:My Dick is Bigger than Your 250,000 lines of co by kwerle · · Score: 1

    Really. A guy asks a question for help and all of these people keep telling him 30-40,000 lines of code isn't much.

    That's a lot of code to get your arms around if you didn't write it. It's not the end of the world, but it is a sizeable task, and is the type of topic that few professional journals or books will ever be written about.

    No kidding! 40KLoCs is a bunch of code - especially if it's poorly organized. I can only think of one project I've done that was that large, and if I were to do it again it'd probably shrink by 25-30%. I'd put a bunch of code into a library or 2 and reduce the number of moving parts.

    But that's also how I'd tackle this kind of thing: organize it, document the hell out of it, and unit test everything you can. Which to me translates into "make it yours."

  103. Spaghetti Code by Hasai · · Score: 1

    I was saddled with a ton of code at one point. It looked like it had been banged-out by the proverbial army of monkeys with typewriters, and they sure as hell didn't write Hamlet. It was pure spaghetti code, written by people who shouldn't have had access to an Etch-a-Sketch, let alone a computer.

    I couldn't read it. It was COBOL for Pete's sake, and I couldn't read it. It just didn't make sense. I had to go find several DOZEN of those old IBM flowchart pads and a template, and chart-out every single instruction. Even then it didn't make any sense.

    Finally, I took all the flowcharts and spread them out on the main computer room floor, a-la A Beautiful Mind, and went crawling around on them with a big fat red marker. My first break was when I realized roughly 60% of the code was "dead:" it would never, ever be branched to. After striking-out all the dead code, I then wrestled with the file I/O, until I realized that whoever had written it had no concept of a buffer: the code would read a record, get a field, read the same record, get another field, etc.

    In the end, I trashed roughly 78% of the code and then re-wrote what was left. One program went from 64 pages to sixteen, then on the re-write went down to four. Yup; FOUR. Run-time for that same program went from sixteen HOURS to 32 MINUTES. Then I re-wrote it again, this time in 4GL, and the four pages became a half-page. THEN I had to go to the Big Boss and tell him that whoever had written the original code had rigged the program to generate falsified fiscal information. Yup; the thing lied right through its teeth. You should have seen the reaction.

    Whole thing took about three months, untold amounts of coffee, and three bottles of Maalox. Have fun with your own code.

    --

    Regards;

    Hasai

    1. Re:Spaghetti Code by Animats · · Score: 1

      Finally, I took all the flowcharts and spread them out on the main computer room floor, a-la A Beautiful Mind, and went crawling around on them with a big fat red marker.

      Many years ago, I actually had to do that with a painful piece of FORTRAN. I took over a conference room with a very long table for the job. Got it all straightened out in about two days once I could see it all at once.

      Haven't coded a GOTO since the end of the FORTRAN era.

    2. Re:Spaghetti Code by azgard · · Score: 1

      It's an interesting story, but you are lucky that the Big Boss wasn't the original author.

    3. Re:Spaghetti Code by ErikZ · · Score: 1

      You also distilled 3 months of work into 4 paragraphs on Slashdot.

      --
      Democrats or Republicans. They are both taking us to the same place and they are not afraid of us anymore.
  104. Re:30 to 40 thousand lines isn't large by any meas by jimrthy · · Score: 1

    This *totally* deserves a mod up

  105. Re:30 to 40 thousand lines isn't large by any meas by dindi · · Score: 1

    This site I just googled: http://buytaert.net/cms-code-base-comparison has an interesting (not sure if accurate, but you can wc -l all the files in the latest if you want) comparison on CMS systems.
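
    For anyone who wants to run that "wc -l" check on their own tree, here is a rough Python sketch. The default extension list is an assumption and should be adjusted to the code base in question. Blank lines and comments are counted too, so treat the result as a size estimate rather than a precise measure.

```python
from pathlib import Path


def count_lines(root, extensions=(".php", ".inc")):
    """Return {extension: total line count} for source files under root."""
    totals = {}
    for path in Path(root).rglob("*"):
        if not path.is_file() or path.suffix not in extensions:
            continue
        # Read as bytes so that odd encodings cannot raise a decode error.
        with path.open("rb") as handle:
            lines = sum(1 for _ in handle)
        totals[path.suffix] = totals.get(path.suffix, 0) + lines
    return totals
```

    Calling count_lines(".") from the top of a checkout gives one total per extension. Unlike "wc -l", a final line without a trailing newline is still counted.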

    WordPress has around 60,000 lines - not too much according to you - and at first I somewhat agreed.

    To write a module/plugin is relatively easy because docs are OK most of the time. But to MAINTAIN for example the entire WP codebase and knowing every little detail is a different thing IMO.

    We have to maintain a similar size ASP JSCRIPT site (around 40k lines last time I checked), and who knows how much more for the native WIN components..... our decision was to rewrite the whole thing in PHP, and the rest in probably JAVA or C with perl for some data processing.
    Well, you have to imagine how happy we are with the completely undocumented code that has no comments, and updates sometimes come in the form of unexplained set of files in a cute zip package. A diff would show 10000 changed lines, and since it does not follow the MVC model, you have a lot of html/design embedded in the code (in an ugly way really)....... no explanation on what was changed, not even a list of functions......

    Well, what I am just trying to say is that I can see how a small project can span over 3-5000 lines you know by heart, but how someone else's crappy 40k code can be a nightmare at the same time.

    By the way, the language is also a factor..... 40k lines of perl can be a lot to read ( considered "write only" by many), while 2 mouse gestures can generate a few-hundred lines easily in any visual IDE.........

    just my 2c really.......

  106. One bug fix at a time by codgur · · Score: 1

    Once for over 1 year my sole job was to maintain the most lucrative product for the company (millions/year). There were numerous other products with newer technology but this was a legacy system comprised of a C++ socket based service and numerous front end scripts and middle tier C++ components (~15-20k lines of code in all those aforementioned technologies). Any wrong change could cost thousands of dollars / day if not more. There were bug fix projects and enhancement projects. I learned that you learn the code one-bug-fix-at-a-time. The first goal is to get it working. Second goal is to break it on purpose and generally play around with the system. Also become very intimate with a debugger. It will make or break you. I didn't have the luxury of having the 'original developers' around (they were fired) so there was no prior knowledge. You are looking to keep your job for a while aren't you? Those who can do maintenance work (everyone wants to work on the new and latest code and coolest projects) will be employable till time ends. It is not glory work. Having done it for over 1 year on the same project I can tell you that the maintenance coder is not in it for the glory but rather for the satisfaction of a job well done AND for a steady paycheck.

    1. Re:One bug fix at a time by __aaclcg7560 · · Score: 1

      When I did my internship as a QA tester, I found a rare crash bug on a test server that I could reproduce but my boss couldn't reproduce it and ignored it. I was after all the intern and nobody listened to the intern. The update patch was applied to the production server and the rare crash bug became a consistent crash bug. The server went down for three days until the programmers could implement a deep fix and cost the company $250,000 USD in lost revenues. Alas, the company let me go after my internship ended. One-third of the division was let go a week later to make up for those lost revenues. Go figure.

  107. Re:30 to 40 thousand lines isn't large by any meas by Enleth · · Score: 2, Funny

    Seeing software problems in terms of Flying Spaghetti Monsters? Ah, so that's where the "spaghetti code" term comes from!

    --
    This is Slashdot. Common sense is futile. You will be modded down.
  108. You *should* feel bad... by Bright+Apollo · · Score: 1

    ... because clearly, you like setting the bar impossibly high for yourself.

    You will never know the code as well as the original developer, so stop trying. For very old cases (>10 years), that developer was also the analyst who gathered the requirements, further cementing you as a 3rd-bit player in the drama. Let it go.

    You *can* maintain someone else's code, though, if you can do a few things:
          -dispense with ego
          -learn to *read* code, especially as a reviewer
          -ask lots of questions

    As a maintenance programmer, you have to be fearless about asking questions, even if they dead-end you. You asked. You were thrust into a bad spot, you do your best to figure out where you're at. Assess the situation. There's no rush to fix anything, it's not like the problem's going anywhere and no one is hiring clueless mission-critical coders.

    Start small. Start really small, like just reading the code as you might in a code review and see if you can spot trends. If you've been doing this awhile, you can start picking up on the strengths and weaknesses of the author(s). At the very least you can start to immerse yourself in the style and convention, making translation to the actual algorithms easier, i.e. what's this bit doing? I'm not embarrassed to say I've professionally reviewed code that I could never write -- it was VB and ASP -- but I know what object-oriented code should look like, should be capable of doing, and this wasn't it. It wasn't even good procedural/iterative code... but that's beside the point. The point is, I know when to use a while loop, a for loop, and when to unroll the loop. It's the kind of knowledge that comes in handy no matter what language I'm looking at. Declarative? No problem, it's set-based thinking and straight Boolean logic. Functional? Fine, let's start busting down the parentheticals. It's also about moving data into a register, eventually.

    So, you start small, you read the code, you trace some data by hand, a little, and then... run the fuckin' thing with a debugger, step by step, and watch the data move. If it takes you all day to run it once, you're entirely ready on day two to start messing with it. You've likely done what only the original developer has ever done, and that's seen data at the top run straight through to the bottom.

    --#

  109. Re:30 to 40 thousand lines isn't large by any meas by jimrthy · · Score: 1

    I currently maintain several million lines of perl. It's not hard,

    Bow to your superior wisdom. I look at ~3 lines of perl and my brain overloads.

    it mostly just works, and when it doesn't, it's not that hard to figure out where it's broken IFF there is a consistent repro case for the problem.

    Ah...you just lost a huge degree of the admiration I was feeling.

    All the interesting problems I've run across in my career did not have consistent repro cases. If they had, they'd have been easy to fix.

    If you have a proper development/production divide, there shouldn't be any weird production issues unless you or your predecessor missed some test cases.

    This sentence made me wish for troll mod points.

    Even if (and that's a big if) you (much less your predecessor) managed to convince your boss that spending time writing unit tests was worth the time/money, you missed some test cases.

    If you don't have test cases, that's a problem, if you don't have a properly firewalled and complete development environment, that's a problem, the code itself? Shouldn't be a problem.

    Automated unit tests would make the OP's life easier, to a degree. But they wouldn't make this code base any easier to learn. I feel like I'm feeding a troll here, but someone mod'd this up. So someone actually thinks you were saying something worthwhile, and I just don't see it.

  110. Call Stacks by SuperKendall · · Score: 1

    I find the most instructive way is to see a real call stack from an application.

    When I was doing Java a great tool was TogetherJ - you could point it at a method and say, show me all the possible calls this method can make. This can yield a really huge visual document (that I printed out on a plotter) but it was really useful for peering into the application.

    If you don't have a tool like that, the next best thing is picking some interesting things way down in the bowels of the application, and get a call stack at that point (either breakpoint while you are running or some kind of log). Do that in a few places, and you start to have a sense of how things flow.
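
    The "some kind of log" variant might look like the following Python sketch, which uses the standard traceback module. The three application functions (handle_request, load_account, run_query) are hypothetical and exist only to give the helper something to trace.

```python
import traceback


def log_call_stack(label, log):
    """Append a one-line summary of the current call chain to log."""
    # extract_stack() lists frames oldest first; drop the last entry,
    # which is this helper itself.
    frames = traceback.extract_stack()[:-1]
    chain = " -> ".join(frame.name for frame in frames)
    log.append(f"{label}: {chain}")


# Hypothetical three-layer application used only to demonstrate the helper.
def handle_request(log):
    load_account(log)


def load_account(log):
    run_query(log)


def run_query(log):
    # Something "way down in the bowels" worth tracing back from.
    log_call_stack("run_query", log)
```

    Calling handle_request(log) with an empty list leaves one entry whose chain ends in "handle_request -> load_account -> run_query". Dropping a call like this into a few deep functions and exercising the application gives a cheap picture of which paths reach them.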

    The thing I like to understand in any application, my own or others, is data flow - knowing how calls reach each other can help you understand better how data flows through the system.

    --
    "There is more worth loving than we have strength to love." - Brian Jay Stanley
  111. OpenGrok by lytfyre · · Score: 1

    During a co-op job I worked on a very large multi-platform app (several million lines of code).

    The team had an LXR setup to do project-wide searching, however it was aging and having problems, and is a bit difficult to work with.
    As a side project intended for a report once I was back on campus, I set up OpenGrok, which worked brilliantly, and was reasonably easy to configure, and nicer to use once we got it set up. The team liked it enough that they switched to that permanently.
    Both are open source, and were built to handle large code bases (LXR was built for the Linux kernel, OpenGrok for when Sun open sourced Solaris).

    Another one I had tried, which was very easy to setup was Gonzui. It's also open source, but didn't really handle the huge codebase as well as OpenGrok or LXR. For under 100k lines, it's probably fine, and the ease of setup may be worth it.

    All three provide a web interface, and do indexing as a separate process from search, so we would re-index the code base nightly. Works very well for larger teams, might be overkill for what you need though.

  112. Re:30 to 40 thousand lines isn't large by any meas by Gr8Apes · · Score: 1

    Heck, that's less than a month's output, if you're working on a well-designed project.

    --
    The cesspool just got a check and balance.
  113. Re:30 to 40 thousand lines isn't large by any meas by elnyka · · Score: 1

    Just out of curiosity, what is your opinion of a "Large" codebase then?

    That depends on the language, but anything starting above a quarter of a million starts to get large. Consider the Linux kernel - not a typical distro, or the dev tools, or even a minimal bare-to-the-bones distro, but the kernel. The 2.6.0 kernel is over 5 million lines. Later kernels are twice as large. 30-40K is about the lower threshold of a mid-size stand-alone system or a component in a much larger system. For example, at one job I worked on a component that was about 200K LOC. That was one piece in a distributed system containing several dozen vertical components on top of vertical layers of stuff summing up to several dozen million LOCs. This is only considering source code. Once you start considering configuration files, deployment and installation scripts, it gets more complex.

    There is now a classification for ultra large systems that in the near (and very likely) future could easily go into the billions, posing new challenges on project management, source control, and just about anything relating to the question "who the fuck knows what this gigantic shit is supposed to do."

    Now, difficulty of maintenance is not just a function of code size, but also code structure and organization and documentation.

    You can work with a monster system that is in the millions of LOCs and not have a substantial problem implementing new functionality or bug fixes, and then in another job you have to maintain poorly written JSPs that collectively are in the 50-100k (with the latter job being a mutant klingon bitch.)

  114. find / grep / glimpse by amiga500 · · Score: 1
    On Windows I install Cygwin, so I can execute grep. I use the following bash function to help me search code:

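    # Recursively grep regular files under the current directory,
    # skipping version-control metadata. Usage: ffind [grep options] PATTERN
    ffind ()
    {
    find . \( -name ".svn" -o -name "CVS" -o -name ".hg" \) -prune -o -type f -exec grep --color=auto "$@" {} +
    }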

    For larger code bases, I use the command line version of glimpse to search through the code. While there are a few open source code search engines, I find glimpse with a few formatting scripts works just fine.

  115. Read, read, read by Anonymous Coward · · Score: 0

    To understand the code, you need to read the code. You'll also need to index the code so you can bounce around it to read, since the limit of most people's stack is only a few items.
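
    When no real indexing tool is at hand, even a crude definition index helps with the bouncing around. The Python sketch below is only that: the regular expression and the extension list are assumptions that fit Python/PHP/JavaScript-style definitions, and it will also pick up false matches inside comments or strings.

```python
import re
from collections import defaultdict
from pathlib import Path

# Assumed pattern: matches Python/PHP/JavaScript-style definitions such as
# "def name", "function name" or "class Name". Adjust for the real language.
DEFINITION = re.compile(r"\b(?:def|function|class)\s+([A-Za-z_]\w*)")


def build_index(root, extensions=(".py", ".php", ".js")):
    """Map each defined name to the (file, line number) pairs defining it."""
    index = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in extensions:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            for match in DEFINITION.finditer(line):
                index[match.group(1)].append((str(path), number))
    return index
```

    Looking up a name in the returned index answers "where is this defined?" without holding the answer in your own mental stack. A name that shows up in several files is a useful warning about duplicated or shadowed code.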

    Next, figure out the dead wood. Don't remove it yet.

    Next, learn what the heck the thing is supposed to do. Find out from whatever the code interfaces with what it is supposed to do. Talk to users and/or the business owners. Talk to the authors of the code. Speak to the problem domain experts.

    Next, make sure that you know when it works. Regression tests are your friend here. You need both global tests to make sure you didn't break anything in the large, as well as unit tests, to make sure you didn't break anything in the small.
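
    One cheap form of global test is a snapshot comparison: record what the inherited system produces today and flag any later difference. The Python sketch below assumes the output can be expressed as JSON-friendly data (dicts, lists, strings, numbers); check_against_snapshot is an illustrative name, not part of any library.

```python
import json
from pathlib import Path


def check_against_snapshot(name, produce, snapshot_dir):
    """Compare produce() with a recorded result; record it on first run.

    Returns True when the output matches the snapshot (or was just recorded).
    """
    snapshot = Path(snapshot_dir) / f"{name}.json"
    current = produce()
    if not snapshot.exists():
        # First run: trust the inherited system and record what it does today.
        snapshot.write_text(json.dumps(current, indent=2, sort_keys=True))
        return True
    return json.loads(snapshot.read_text()) == current
```

    A mismatch does not say whether the old or the new behaviour is right, only that something changed. That is usually the question you need answered before shipping a change to code you do not fully understand.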

    Next, start to remove the deadwood to make sure it conforms to the spec. This can be an excellent way to learn how the code works, but also is fraught with danger. Why is that extra field always '0'? Remove it. Could be nobody notices, or it is critical for the parser for the consumer of the data to continue working. Learn what matters and why. This step may not be feasible in some environments.

    Assume everything will take 3x what you think it will. There are often hidden dependencies, no matter how clueful the original author was. Odds are he/she/it wasn't clueful (playing the numbers), which means 3x is too optimistic.

    Resist the urge to recast it in your own image. It won't help as much as you think it will. Rewriting from scratch is often a waste of time, even if it seems like a good idea at the time. I've been burned by this several times, often with only so-so results.

    Plan on spending extra time documenting and speculating what the code should be like. Chances are this won't be the only time you have to do this.

    I've also found it useful in learning to read code to read, say, the 4.3 BSD network code and then read the annotated books on the topic. It is big enough to be interesting, and small enough to keep in your head. The Linux kernel books cover something that's really too big to learn from easily.

    Nobody teaches this anymore, but that's another rant.

  116. ctags and cscope by scotch · · Score: 1

    233 comments and not one mention of ctags or cscope yet.

    --
    XML causes global warming.
  117. Another possible pitfall by laughingcoyote · · Score: 1

    I've been in a similar situation myself, though thankfully not (as sounds possible for you) by myself, and I learned one thing above anything else.

    Never, ever, trust your memory. As soon as you figure something out, write it down. Right that second, while it's still fresh in your mind exactly what you learned. It doesn't matter as much how you write it down (commenting the code, a separate text document, or for that matter keeping a notebook and pencil close to hand), just that you do. If you don't, you will run across the sinking feeling that you already figured this problem out before, and since you don't remember what the answer was, you're about to do it again. It will also help others that you work with, and even if you don't right now, it's quite possible that you will.

    --
    To fight the war on terror, stop being afraid.
  118. Source Browsing tools by Anonymous Coward · · Score: 0

    cscope or GNU Global are great for learning how code works. They are much more efficient than using find and grep.
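
    The reason is that they build an index once and then answer lookups from it, instead of rescanning every file per query. A toy Python sketch of that idea follows. It is not how cscope or GNU Global are actually implemented; build_index and the two fake source files are made up for illustration.

```python
import re
from collections import defaultdict

IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_index(sources):
    """Map each identifier to the sorted (filename, line_number) pairs where it appears.

    `sources` is a dict of filename -> file contents, so the sketch never touches disk.
    """
    index = defaultdict(set)
    for filename, text in sources.items():
        for line_number, line in enumerate(text.splitlines(), start=1):
            for name in IDENTIFIER.findall(line):
                index[name].add((filename, line_number))
    return {name: sorted(hits) for name, hits in index.items()}


# Two tiny made-up C files standing in for a real tree.
SOURCES = {
    "motor.c": "int motor_start(void) {\n    return encoder_read();\n}\n",
    "encoder.c": "int encoder_read(void) {\n    return 42;\n}\n",
}
INDEX = build_index(SOURCES)
```

    Looking up INDEX["encoder_read"] is then a single dictionary access, no matter how many files went into the index. The real tools also know the difference between a definition and a call, which this sketch does not.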

  119. Re:My Dick is Bigger than Your 250,000 lines of co by Mean+Variance · · Score: 1

    5) Plan to remove the dead weight. There's always a lot of dead weight in these near-abandoned projects. Get an idea how to simplify things and plan your work in phases.

    There's a lot of anti-IDE rhetoric going on, but I rely heavily on mine, Eclipse (for Java programming). I also rely on Vim, TextPad, less, and so on depending on the task. But for this particular question and the point about dead weight, leverage your IDE to clean house. You can play with compiler and static analysis flags to remove things like unused private methods, imports, variables, or whatever is applicable to your language. If the formatting is inconsistent, run a formatter that pleases your eye (assuming there isn't a group standard for that ... another religious programmer's topic).
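
    Just to show the idea behind an "unused" check, here is a toy version for Python source using the standard ast module. It is a sketch only: unused_functions and SAMPLE are made up, and a real analyzer also has to handle other modules, dynamic calls, reflection and so on.

```python
import ast


def unused_functions(source):
    """Return names of functions defined in `source` but never referenced in it."""
    tree = ast.parse(source)
    nodes = list(ast.walk(tree))
    defined = {node.name for node in nodes if isinstance(node, ast.FunctionDef)}
    # Any plain name or attribute access counts as a use.
    used = {node.id for node in nodes if isinstance(node, ast.Name)}
    used |= {node.attr for node in nodes if isinstance(node, ast.Attribute)}
    return sorted(defined - used)


SAMPLE = """
def helper():
    return 1

def orphan():
    return 2

def main():
    return helper()
"""

candidates = unused_functions(SAMPLE)
```

    Note that main shows up as a candidate along with orphan, because nothing inside the file calls it. Treat the output as a list of things to investigate, not a list of things to delete.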

    Other parts of Eclipse that I rely on especially when I'm in another team's code (we have about 2m real LOCs):
    * Call hierarchy [ctl-alt-H]
    * References [ctl-shift-G]
    * Class hierarchy [f4]

    Where I am, some of these conveniences are becoming more difficult to leverage as Spring and its XML configurations define object relationships.

  120. But did the originals "get" it? by MaxToTheMax · · Score: 1

    I think the chances are very, very good that the original developers had the same problem you have.

  121. Re:30 to 40 thousand lines isn't large by any meas by dhasenan · · Score: 1

    I work on a 300KLOC codebase. It's mid-size. I started working on it about six months into a complete rewrite, and it's been over two years, so I know the code. But it took three months until I knew the codebase well.

    40KLOC? You should be able to pick that up in a month, full-time. I've learned an open source project of that size in my spare time in a few weeks.

  122. Write Unit Tests for it by crispytwo · · Score: 1

    It's a great way to make sure the code works the way you expect, and when it doesn't you can learn how it actually works. Often you will find that this will expose huge flaws in the original code too.
    After that, it's a source of documentation, sort of.

    Enjoy!

    1. Re:Write Unit Tests for it by Dr.+Hok · · Score: 1

      It's a great way to make sure the code works the way you expect, and when it doesn't you can learn how it actually works. Often you will find that this will expose huge flaws in the original code too. After that, it's a source of documentation, sort of.

      I'll second that. I have been in the same situation a couple of times. Unit tests are IMHO the best way to understand the code. You express your assumptions of how the code works in the unit test, and you verify your assumptions at the same time! If your initial assumptions were wrong, you learn the truth while creating a successful test.

      Once I am reasonably sure about a piece of code, I usually write javadoc comments (or the equivalent in doxygen or pydoc etc.) to chisel my findings in stone. You can also start with the comments, putting a TODO wherever you're uncertain, then try to throw out the TODOs one by one.

      The additional benefit of unit tests is that when you have enough unit tests in place you can start refactoring the code to your taste without worrying about breaking it.
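
      A minimal sketch of what I mean, using Python's unittest. split_fields is a made-up stand-in for whatever inherited helper you are trying to understand; the second test is the kind of thing you only get right after the first run proves your guess wrong.

```python
import unittest


def split_fields(line):
    """Hypothetical inherited helper; imagine it buried in a 2,000-line module."""
    return [field.strip() for field in line.split(",")]


class SplitFieldsAssumptions(unittest.TestCase):
    def test_strips_whitespace_around_fields(self):
        # Assumption from reading the code: padding around each field is removed.
        self.assertEqual(split_fields(" a , b "), ["a", "b"])

    def test_empty_line_gives_one_empty_field(self):
        # First guess was an empty list; the failing test showed the real behaviour.
        self.assertEqual(split_fields(""), [""])
```

      Save it next to the code and run it with python -m unittest. Each test name doubles as a line of documentation about what the code really does.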

      --
      Say out loud: I'm an Aspie and I'm somewhat proud, I guess. Uh. Can I write an email in all caps instead? Hm...
  123. Not many people do this by symbolset · · Score: 1

    More should. It's a small part of the problem, but it does help.

    --
    Help stamp out iliturcy.
  124. podcast on this issue by Anonymous Coward · · Score: 0

    The Software Engineering Radio podcast at http://www.se-radio.net/ had a great show with Dave Thomas from the Pragmatic Programmers on this.

  125. Re:30 to 40 thousand lines isn't large by any meas by hclewk · · Score: 1

    Well, for my 2 cents, I've been working on a project by myself for the past 6 months, starting from scratch, and it's up to about 85,000 lines of code, and I would classify that as medium-scale. It all depends on what your perspective is I suppose.

    But, like you said, a well organized 85k lines is a lot smaller than a poorly written/organized 40k lines.

  126. report from toronto by The+Abused+Developer · · Score: 1

    ... this is the norm here in the last 7/8 years or so. doin' it differently - unit test, good design & practices, honesty, long term planning etc. - are the best strategies to get you bumped out of the project.

  127. Oh boo bloody hoo by Anonymous Coward · · Score: 0

    Oh no, there's no documentation...oh wait yes there is...it's on this single sheet of A4....in Swahili. Perfectly normal introduction to the new work environment in my experience. Grow a set, and hope the guy who wrote it wasn't actually a genius because it's a hell of a lot easier fixing the fuckups of regular developers.

  128. What a trivial problem by Anonymous Coward · · Score: 0

    30k-40k is not a large pile of code.

    If you're having problems, either the code is poorly written and documented or you have risen to a job that is at the limit of your capacity.
    I suggest refactoring the code a bit. It will tidy things up, making you able to get to grips with it, and let you have a tour of the code while you're at it.

    Under 500,000 lines of code should not provide any size issues for most good programmers. It is poorly maintained spaghetti code that will fuck you up at even 5,000 lines.

    If you get crappy code and you have to do any serious amount of work on it, your best path is to refactor.

  129. Re:30 to 40 thousand lines isn't large by any meas by Z00L00K · · Score: 2, Insightful

    It somewhat depends on the language used - some languages are easier to penetrate than others. And some languages do more in 10 lines than other languages do in 100.

    But anyway - to learn the code you may have to find a starting point (there is usually at least one logical point to start) and then make a flowchart in PowerPoint or something for the general structure. There's no point trying to get into the finer details, just a general sense of flow. You will get things wrong in the beginning, but don't worry. And you may end up finding a lot of dead code too.

    When you have a satisfactory overview of the code it's time to really swim and drink the code. Many programmers have a tendency to accept that "it works" and stop there. By throwing the code into the compiler at maximum warning level and then trying to fix all the warnings you will get even more involved. And if you aren't satisfied you can take on the code with code analysis tools like Splint (for C) or FindBugs (for Java).

    And don't forget that the commands "find" and "grep" in *NIX are your friends. Other environments usually have other tools, and IDEs have their own, so you don't have to install Cygwin or something to get a grip on things.

    And if you think that you don't understand the code well enough - try to port it to another operating system or other language.

    Of course - this takes a lot of time and consumption of your favorite hacking beverage.

    And yes - I'm involved as a single developer in a system with about 400k lines of code written in Java, and it was ported from an older system written in C, C++, Basic, Java, DCL...

    --
    If builders built buildings the way programmers wrote programs, then the first woodpecker would destroy civilization.
  130. Re:Large? Try shindex on corpnet by Anonymous Coward · · Score: 0

    A former Windows div Microsoftie says: shindex, baby, shindex! If you don't know what that is, ask the guy or gal in the next office over. And then be prepared to spend the next week or so troubleshooting permissions problems until it works across all versions you care about. But after that, you're golden and there's no faster way to search the source. And yeah, I agree with the suggestion of installing SOME *ix toolset. I'm partial to unixutils because they seem lighter weight than some of the subsystem-based solutions like Cygwin or SUA aka Interix.

  131. To learn the code by Anonymous Coward · · Score: 0

    flow chart it. Crawling through the code is the best way to learn it.

  132. I've been there by Anonymous Coward · · Score: 0

    A few years ago I was the maintainer for a couple of unix services. They ran on embedded machines and were 25-30k lines a pop. The best thing I found to cope with it was getting the code into a nice IDE (the CDT for Eclipse) and using a visualization package to understand how all the data structures were laid out. I think I used Graphviz.
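
    For anyone who hasn't tried it: Graphviz reads a plain-text DOT description, so a few lines of script are enough to turn "structure A points at structure B" notes into a picture. A small Python sketch, where to_dot and the STRUCTS map are made up for illustration and are not what I actually ran back then:

```python
def to_dot(edges, graph_name="layout"):
    """Render a {source: [targets]} mapping as Graphviz DOT text."""
    lines = [f"digraph {graph_name} {{"]
    for source in sorted(edges):
        for target in sorted(edges[source]):
            lines.append(f'    "{source}" -> "{target}";')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical notes: which structure holds a pointer to which.
STRUCTS = {
    "session": ["socket", "buffer"],
    "buffer": ["chunk"],
}

dot_text = to_dot(STRUCTS)
```

    Write dot_text to a file such as layout.dot and render it with dot -Tpng layout.dot -o layout.png.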

  133. Pay attention to quality. by seebs · · Score: 1

    Look for things like misspellings, undefined behavior, indentation screwups, and so on.

    The reason is, if there's a lot of these, that's a big clue to you that you have to be MUCH more careful with the code, because it is probably crap. Stupid comments? Probably crap. Explanations of things that are a bit surprising, with citations or justification? Maybe not so bad. Comments that are visibly out of sync with the code? Bad. Consistent naming convention? Good. Inconsistent naming convention? Bad. Tons of copy and paste? Bad.

    Knowing whether code is good or bad does you a ton of good in understanding it. If you know the code is crap, you have a better chance of guessing how some idiot will have gotten it wrong. If you know the code is good, you can often guess how someone would have tried to make it robust and/or maintainable.

    --
    My blog: http://www.seebs.net/log/ --- My iPhone/iPad app: http://www.seebs.net/seebsfrac/
  134. Re:My Dick is Bigger than Your 250,000 lines of co by Lock+Limit+Down · · Score: 1

    My advice to you is to start drinking heavily.

    A lot of good advice above, but there's a political aspect to this which is very important.

    How you do will very much depend on the expectations that management has. They very often assume maintaining code is much easier than writing it in the first place which is of course the exact opposite of the reality. Make sure you talk with them about expectations up front, about how soon they expect you to be competent in the code.

    In all honesty, I worked somewhere that had inherited about 800,000 lines of code, with 5 very sharp guys, and no one understood the code even remotely close to the original authors after 2 years of supporting it.

    Companies need to understand that unless they pair program, when they lose the programmer, and he didn't have a partner, they lose the code. You might as well just rewrite it because it will take a new person as long to understand the old stuff, longer perhaps because he is not learning it in the orderly progression that was there when it was written.

    There's a maxim, "He whose work is the most incomprehensible gets the most respect". The suits and pin heads who run software companies fall for this 100% so the worst programmers, who by luck of the draw got to write the first spaghetti mess, are glorified while the maintenance programmers are seen as little more than janitors.

    I make it a rule to go on unemployment before accepting a job as a maintenance programmer. Avoid it at almost any cost! It's a thankless job that usually ends in frustration and tears unless you have a VERY understanding manager or circumstances have granted you a very successful product that needs few fixes or enhancements.

    Often companies hire programmers when they are behind or they lose people because they overworked them. Thus you are coming into a bad situation, already behind with everyone expecting miracles.

  135. Re:30 to 40 thousand lines isn't large by any meas by rxan · · Score: 1

    The 'size' of the code really boils down to what needs to be examined/changed. If you have a billion lines of code that are rock-solid and a million other that may need to be modified -- that's a big difference. Programming is all about localized knowledge.

  136. That's nothing useless answer by Anonymous Coward · · Score: 0

    That's nothing. I work with 50K+ loc projects and because of that I won't offer any real solution to your problem. I just want you and everybody else to know it and so I write it here.

  137. Go through it and comment it by presidenteloco · · Score: 1

    Not necessarily end to end, but leave yourself a trail of breadcrumbs
    as you trace through and learn the code stories.
    If you can write about it accurately, you understand it. If you
    can't, you have to dig deeper in that area til you comprehend it enough
    to summarize it and its quirks accurately.

    I had a prof once who shall remain nameless, though he claims to
    have "invented" modules. But he did have some good advice. He said,
    even if you just hacked together some code (or someone else did), you
    can retrofit software engineering standards onto it by going through it
    and writing the design document after the fact (assuming the crap didn't
    come with one.) This not only leaves a legacy of a maintainable project,
    but allows you to understand the essence of the software and the
    important decisions that were made in the construction of the software.

    --

    Where are we going and why are we in a handbasket?
    1. Re:Go through it and comment it by ebbe11 · · Score: 1

      I had a prof once who shall remain nameless, though he claims to have "invented" modules. But he did have some good advice. He said, even if you just hacked together some code (or someone else did), you can retrofit software engineering standards onto it by going through it and writing the design document after the fact.

      Easy. David Lorge Parnas.

      --

      My opinion? See above.
  138. Learning requires a quiet environment by kobol · · Score: 1

    When I inherit such a monster I just start studying it. Too bad the environment of today's open plan office doesn't allow concentration necessary to learn code. This will doom our planet in a hundred years. If only women in the office could be required to shut up for at least 20 minutes out of every hour.

  139. Don't panic. Focus. Cscope, Wiki, and printfs by Sarusa · · Score: 1

    This has happened to me several times, and again just recently. I'm not sure how many lines of code it was this time (I don't really care), but several thousand files (I do care about the structure). 'This is your new project, we have some stuff we need done ASAP'. The big constraints are:

          - They want you to start doing stuff right away. That's usually a given.
          - Therefore you do not have time to fully understand this code. You do not have time to do a full dissection. Just give up the idea that you can even do so in the short term; that will just paralyze you.
          - Very little useful documentation. Read it if there is any, but keep in mind that it is usually out of date and therefore a filthy lie.

      What you need is a good understanding of the parts of this code that are important right now and some high level overview. If you knock off enough of the little things you will end up learning the whole thing. In this way you gain enough confidence to move forward. So, get cracking:

        - Make a safe copy. If you're lucky it's already in version control. If not, do it yourself. Check in your test stuff fairly frequently (not in the main trunk!) because you will be breaking things often at first.
        - Use cscope or any other tool you like that will let you hop around the code like hyperlinks. cscope lets you do the following very important things: find the definition for this thing (method, structure, #define, whatever). Find all places that are calling something. Find some text anywhere in the code base. Find a file anywhere in the code base. You need this integrated into your editor so you can do all this without thinking - you can be cruising along, hit a reference to an unfamiliar but important looking datatype or method and just hit a few keys and go to the definition, wherever it is. And then pop back. If you're using Visual Studio then this is already built in, as much as I hate VS otherwise. cscope is an easy addition to emacs, I imagine for vi too. As a last resort, stand alone cscope, but it is so much slower than having it in your chosen editor.
        - Add plenty of debugging printfs in areas of code you're interested in. #define a macro for it so you can turn them on or off easily. You can run it under a debugger, but I usually find that takes much longer to step through unless you know exactly what you're looking for already. And with the printfs you will soon develop a feel for what's going on and what values you expect to see. Debugging printfs are like a heartbeat for the code.
        - Take notes in a wiki or whatever you prefer on the general structure of the program - mostly which areas of the code do critical things that you're interested in, like common/engine/pp.c contains the paper path motor and encoder logic. Or anything else important you find.
        - Start solving problems. You won't learn the whole codebase at once by zeroing in on a specific issue to fix, but you will learn subsystems fairly well that way. There should be sufficient separation of logic unless the code is hopelessly broken (which is possible). That's the big thing. Don't worry and get paralyzed if you don't understand it all right away, just work on understanding the bits you need right now and eventually you'll build up a picture of the whole thing.
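
      On the debugging printfs: in C that's a #define you can compile in or out. The same on/off idea sketched in Python, since it's shorter to show. dbg, DEBUG_ENABLED and advance_paper are made-up names for illustration, not anything from the codebase I was working on.

```python
import sys

DEBUG_ENABLED = True  # flip to False to silence every trace line at once


def dbg(tag, fmt, *args, stream=None):
    """Write one tagged trace line, printf-style, if tracing is switched on."""
    if not DEBUG_ENABLED:
        return
    out = sys.stderr if stream is None else stream
    out.write("[%s] %s\n" % (tag, fmt % args))


def advance_paper(steps):
    """Hypothetical routine standing in for the code under study."""
    dbg("pp", "advance_paper(steps=%d)", steps)
    position = steps * 4
    dbg("pp", "encoder position -> %d", position)
    return position
```

      The tag lets you grep the output for one subsystem at a time, and going to stderr keeps the trace out of whatever the program normally prints.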

    I realize there are people who are going to freak out at the idea that you would go in and poke at things before you fully understand everything, but unless you have the luxury of unlimited time, that's not an option. Someone up above suggested writing unit tests for existing code, which is a good idea in general, but is probably far more time consuming than you have been given time for. Try writing unit tests for the area you are working on right now if you have the time. It's possible the codebase is so broken that the little changes you are making here are having adverse effects elsewhere, but all you can do is try. Eventually as you knock off issues you'll gain confidence and knowledge and before you know it people will be coming to you with questions about the codebase.

  140. break the code by Anonymous Coward · · Score: 0

    break the code :-)

    it's the same as dismantling your dad's radio/car/computer to see what's inside and how it works and re-assembling it, only to find out there is still one piece left

  141. Re:That's incredibly small by Anonymous Coward · · Score: 0

    Medium size is 250 to 750 million lines of code (one person can still understand how it all works). Big is 1 to 10 billion lines of code. Really big is >10 billion.

    I have worked on code bases of all of those sizes, and I like the medium size the best -- it's big enough to be interesting, and small enough that you can understand it all.

    One that I've worked on (over 25 billion lines) is just too big for my tastes -- over 3 years to do a clean recompile is excessive.

    ---

    Someone always has to be the biggest and the veriest, don't they? ...

  142. codebase I never did understand... by Anonymous Coward · · Score: 0

    Have to share painful past tale. I inherited a ~30K line app that ran on an embedded system of which we had two copies of the hardware, both in production. These were used to do PIN block translations from an acquirer network to the bank/verifier networks & associated security stuff using strange-o NCR security processing equipment plugged into some kind of Intel OEM chassis with an 80186 board running the embedded app, talking to a mainframe over a pair of 48Kbps SDLC links (and the SDLC protocol was part of the app) powered by the i82530 SCC.

    I inherited the code because the author, who was a friend of mine, went home with the flu one day and dropped dead 3 days later -- aged 29.

    Worst part...I couldn't even rebuild the current binary that was in production from the code I found on his PC. But I spent time trying to understand the code base...and it was hard, especially without being able to run it on anything.

    I had a manager who simply wouldn't listen to me until I had printed all the code out -- which was pointless, and I was too stubborn to just print the code out and say...ok, that was a waste of time, now what.

    A middle manager was appointed who came in bursting with enthusiasm. She *did* print out all the code...um, using MS Word as an editor so that it "looked nice" (i.e., appropriately girly girly choice of fonts).

    She was very keen and said...I'm sure we can go through this in a morning. Well, I was secretly thrilled when she was on the verge of tears by teatime. We never got on top of that system -- our management wouldn't consider my suggestion that we redo the thing on a normal PC & use Linux and change the comms stuff to TCP/IP. But one of the other disgruntled people from the company saw the gap, quit, started his own consultancy & after not too long, showed me the same SP stuff that was remotely manageable over X11.

  143. Learn the use cases by lena_10326 · · Score: 1

    I think the biggest trouble is with knowing why things were done. You will look at the code and see that decisions appear to have been made arbitrarily. You'll scratch your head wondering "they had 3 design options but they chose this one, why?". You need to understand the use cases to know the why. It's not always obvious because many times it's based on tribal knowledge that was obvious at the time but not now, so no one thought to document it.

    Ask around and find out if any of the higher ups from the original project still remain. Set up an interview with them to get the project history and go over the use cases. When you go back to the code, you'll better understand why things were done.

    --
    Camping on quad since 1996.
  144. Source browser program is the solution by Anonymous Coward · · Score: 0

    Use a source browser program and you can easily find things and understand code written by others in very little time.

    Here are some links to source browsers:
    http://linguistico.sf.net/wiki/doku.php?id=software_libero:programmazione#browser_di_sorgenti

  145. Re:Large? Try shindex on corpnet by snowgirl · · Score: 1

    A former Windows div Microsoftie says: shindex, baby, shindex! If you don't know what that is, ask the guy or gal in the next office over. And then be prepared to spend the next week or so troubleshooting permissions problems until it works across all versions you care about. But after that, you're golden and there's no faster way to search the source.

    And yeah, I agree with the suggestion of installing SOME *ix toolset. I'm partial to unixutils because they seem lighter weight than some of the subsystem-based solutions like Cygwin or SUA aka Interix.

    Actually, permission rights were never the problem for me. As a build engineer, I had full read and write permission to every windows code base. And, I had a keyword that would override any Product Studio that might be there.

    Basically, I could have checked in a "Hello World" dialog into explorer.exe without any approval from anyone... as a build engineer, one needs that kind of power.

    --
    WARNING! This girl exceeds the MAXIMUM SAFE standards established by the FDA for BRATTINESS
  146. first_developers.provide_rope.shoot_own_foot_with by Anonymous Coward · · Score: 0

    Thar: fixeth'd it fer ya!

    C :

  147. Hu? by angel'o'sphere · · Score: 1

    You've inherited a fairly large (30-40 thousand lines) collection of code ...

    30k to 40k lines of code is not large by any means of measurement.

    A programmer running mad will churn that out in a year or less. But perhaps that is your problem ...

    Anyway, as a hint for understanding it, I suggest debugging it. Perhaps you can find old bug reports (hopefully fixed meanwhile) and try to play them back in a debugger with some well-placed breakpoints to get an idea. OTOH I fear your program is just plain old C, so it might be hard to grasp in debug mode nevertheless.

    Good luck.

    angel'o'sphere

    --
    Cost free eBook I read (by iBook/Kobo/Amazon/ObookO/Gutenberg etc.): "The Green Odyssey" by Philip Jose Farmer.
  148. I'm doing this kind of work for the last 15 years by Anonymous Coward · · Score: 0

    Many times my job is to help companies that had this kind of problem:

    "We need to fix a bug/add a new feature to this huge code base"

    Most of the time it's a few hours max.

    The "secret" of how I do it:

    1. Don't say "Who's that as&$## that wrote this code?" (it won't help you)
    2. Don't say "Why did he code it that way? It could be done with much less code some other way" (it won't help you either)
    3. Your job is to find how to add the requested change while not changing too much code. Always remember: Every line of code that you change = tons of new problems.
    4. Tools needed: Notepad / vi / pico / nano, Windows Explorer Search of XP / 'find' in unix, and the compiler are all I need. For this kind of job I don't spend my time installing IDEs and doxygen.
    5. Last thing to remember before starting to work: Try to avoid adding additional libraries that depend on other libraries / special system features. I try to find open source / free and small code. Pure / close to pure ANSI C/C++ is the best. Few source files - best!
    6. First step: Re-compile everything if it doesn't take too much time, and run the compiled code to check that you have the whole environment needed. If a re-compile would take too much time (I had a project that took over 24 hours to re-compile...), compile only the relevant modules.
    7. First thing to do: try to break the code into modules but ignore any module not related to your task. Write a text file with only the relevant things that you find.
    8. Try to find the smallest change to make to the code. It can be a crazy change, but most important: it must be the smallest change.
    9. Pray that it'll work :) "Good luck and may the force be with you"

  149. Re:Large? Try shindex on corpnet by Kalriath · · Score: 1

    Whaaaaat? Why does the person doing the builds need write access to ANY of the code base? That makes no sense!

    --
    For a site about things like basic rights, Slashdot users sure do like to censor "dissent".
  150. Re:My Dick is Bigger than Your 250,000 lines of co by greg1104 · · Score: 1

    it is a sizeable task, and is the type of topic that few professional journals or books will ever be written about.

    Right, no one has ever written a single book on that topic.

  151. More insights, please! by Anonymous Coward · · Score: 0

    (...) What do you think about intermediate variables that are not strictly necessary?

    Obviously your example is exaggerated, but I wonder the same thing as you. I used to declare too many variables, I think. I read the code from Triplify (just 500 lines of PHP) and it was interesting to see how they did more inline - I think it was neat, but sometimes confusing. I didn't find a balance yet, though.
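
    To make the trade-off concrete, here is the same made-up calculation written both ways in Python. The function names, the 1.5 rate and the 4.0 handling fee are all invented for the comparison; they are not from Triplify or from the parent's example.

```python
def shipping_cost_inline(items):
    """Everything in one expression: short, but the reader has to unpick it."""
    return round(sum(i["weight_kg"] * i["qty"] for i in items) * 1.5 + (0 if len(items) > 3 else 4.0), 2)


def shipping_cost_named(items):
    """Same arithmetic, with each intermediate value given a name."""
    total_weight_kg = sum(item["weight_kg"] * item["qty"] for item in items)
    weight_charge = total_weight_kg * 1.5
    handling_fee = 0 if len(items) > 3 else 4.0
    return round(weight_charge + handling_fee, 2)


order = [{"weight_kg": 2.0, "qty": 3}]
```

    Both return the same number for the same order. The named version costs three extra lines, but each name is a place to set a breakpoint or hang a comment, which matters more when the next reader didn't write the code.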

  152. Re:40,000?!? ARE YOU KIDDING ME? by heson · · Score: 1

    Ooooh yeah.
    Four #includes and a line that starts "int main" is all you need.

  153. Re:Large? Try shindex on corpnet by TapeCutter · · Score: 1

    "Whaaaaat? Why does the person doing the builds need write access to ANY of the code base? That makes no sense!"

    Not sure how you do it, but I tag the source before building it.

    --
    And did you exchange a walk on part in the war for a lead role in a cage? - Pink Floyd.
  154. Re:30 to 40 thousand lines isn't large by any meas by Nursie · · Score: 1

    40K Sizeable? Hell no.

    I picked up a one million line codebase with one other engineer. Sure we don't know it inside out, but we're able to work with it. I'm never going to know it like I wrote it myself, but well enough to maintain and add functionality, sure.

  155. codelines/h from BMW by thetinytoon · · Score: 1

    Already stated, but my 2 cents:
    - use a good IDE with fast referencing possibility (e.g. right-click on a function call => "follow")
    - use a profiler to see a flowchart or UML for a high level overview
    - start commenting the classes and refactor their names if unclear. There are nice tools out there (depending on the language), which create DocBlocks for everything first and then you can use DoxyGen to generate a nice overview over everything.

    And about the question of how bad you are: one of my IT lecturers had worked at BMW and they had made a test on the efficiency of their programmers on new code and on code written by other developers. When the same developer had to extend or change code of other developers, he was a hundred times slower than when he would code on his own himself. That was around 2001, if I'm not mistaken.

  156. Code Rocket by Grimxn · · Score: 1

    Check out Code Rocket - this is what it's for.

  157. Re:30 to 40 thousand lines isn't large by any meas by sproketboy · · Score: 1

    Now I've heard everything. Mission critical and PHP in the same sentence.

  158. Think about it by Sla$hPot · · Score: 1

    You need to

    1. Understand the business rules.
    You need to know what the system / application does before you can begin making changes to the code.

    2. Get an overview of the system design / code structure. If there is any (otherwise it is going to be very difficult).
    Break down the system into use cases and try to see what part of the code each case covers.
    That should give you an idea of the business logic and the class structures (assuming it is not one big bowl of spaghetti).

    3. Create a working document with your diagrams and development plans.
    Put all your observations on a whiteboard, paper or a napkin as needed. But remember to draw it in Visio, Word or OpenOffice.Writer too.
    You don't have to do this all at once. It can be done as you move into the code to fix bugs or when making changes.

    It will probably take you between 6 and 18 months to get fully acquainted with 30-40K lines of code.
    It also depends on how hard business is pushing you. The more pressure on bug fixing and system changes, the less time you will have to learn about the system as a whole.
    Even though 30-40K lines isn't that much, it is probably more than a one-man job.
    If it is a business-critical system, it is more likely to need a headcount of 2-3.
    You should have your own exit strategy ready and get out of there in case the business won't take your challenges seriously.

    Anyway, I hope they pay you well.
    Good luck with it.

  159. Abstraction by tp_xyzzy · · Score: 1

    Reading the code is no good. Instead, you should learn what their class names and function prototypes are. You can get a pretty good picture of the code just by looking at the functions.
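
    As a rough illustration of skimming names and prototypes instead of bodies, here is a minimal sketch that assumes the inherited code is Python and uses the standard `ast` module. The `outline` helper and the `SAMPLE` source are made up for this example; definitions hidden inside `if` blocks or loops are not reported.

```python
import ast


def outline(source: str) -> list[str]:
    """Return one line per class/function definition, without any bodies."""
    entries: list[str] = []

    def visit(node: ast.AST, depth: int) -> None:
        # Walk direct children in source order and recurse into definitions.
        for child in ast.iter_child_nodes(node):
            if isinstance(child, ast.ClassDef):
                entries.append("  " * depth + f"class {child.name}")
                visit(child, depth + 1)
            elif isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef)):
                params = ", ".join(arg.arg for arg in child.args.args)
                entries.append("  " * depth + f"def {child.name}({params})")
                visit(child, depth + 1)

    visit(ast.parse(source), 0)
    return entries


# Hypothetical source text standing in for a file from the inherited codebase.
SAMPLE = '''
class Invoice:
    def total(self, tax_rate):
        return 0

def load_invoice(path):
    return Invoice()
'''
```

    Calling `print("\n".join(outline(SAMPLE)))` lists the class, its method and the free function, which is roughly the "prototype view" described above.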

  160. Stack overflow by Anonymous Coward · · Score: 0

    I am starting work on an extremely large code base with globally scattered teams. It scares the hell out of me and makes me want to retire even though I have been writing code as an EE for 35 years, mostly real time, hardware centric. This new one is a gigantic GUI based, distributed nightmare.

    I think the tools if not the applications have reached a complexity that challenge the best and brightest. To make it worse, there is a tendency for less mentoring and training. The entire prospect of multi-tasking between complex products, regularly switching between products is inefficient because you tend to lose focus. Management expectations are untenable as the bug count exponentiates. The entire profession needs to step back because the limit of human capability has been reached with this paradigm.

    I even found a bug in slashdot as I was typing this missive !!

  161. Re:30 to 40 thousand lines isn't large by any meas by Hal_Porter · · Score: 2, Informative

    Source Insight lets you browse source code - very useful for largish codebases. It's much quicker than findstr or grep because it has an index rather than having to search the whole thing. It's not free of course but I'd never go back to findstr having used it.

    --
    echo -e 'global _start\n _start:\n mov eax, 2\n int 80h\n jmp _start' > a.asm; nasm a.asm -f elf; ld a.o -o a;
  162. Re:Large? Try shindex on corpnet by snowgirl · · Score: 1

    Whaaaaat? Why does the person doing the builds need write access to ANY of the code base? That makes no sense!

    First, the build tools are in the code base... we're not just running "make" here, there's a hojillion scripts doing a hojillion things every which way...Windows goes through a crazy amount of pre and post processing...

    Next, this is Windows... it's a critical build... the build MUST be pushed out every day, and include as many checkins as possible.

    Someone pushes out a checkin that breaks the build... 100 other people made checkins and their code didn't break the build, and they need to test their code now... We can't just say, "sorry, build broke, we're scrapping, Person XY needs to fix the break, and then we'll start again." Because the build takes 14 HOURS!!!

    So, it's the builder on duty's job to revert the checkin and then restart the build, hopefully, you will have enough time to make the build finish by 9:00am tomorrow, when people start arriving at work.

    You may be happy with your 3-4 hour compiles, and builds that can sit around broken, because 100 people aren't depending upon you for that build... meanwhile the real build engineers have to deal with serious shit.

    --
    WARNING! This girl exceeds the MAXIMUM SAFE standards established by the FDA for BRATTINESS
  163. Re:Large? Try shindex on corpnet by snowgirl · · Score: 1

    "Whaaaaat? Why does the person doing the builds need write access to ANY of the code base? That makes no sense!"

    Not sure how you do it, but I tag the source before building it.

    Windows also has a bunch of metadata that each build generates along the way, and this metadata gets checked in during the build process...

    --
    WARNING! This girl exceeds the MAXIMUM SAFE standards established by the FDA for BRATTINESS
  164. A few times by Anonymous Coward · · Score: 0

    I've taken over some code projects in the 10,000 - 100,000 line range. Usually they'd gone through several hands at that point and I think that's the hardest part. I mean, if you have code written by one person you get used to their syntax and style. When the code has been passed around, you run into a lot of different approaches to problems.

    My advice would be to:
    a) backup the code when you first get it. That way if you screw up, you can go back to the original.
    b) read through the code for a while without a mind to change anything. Just get used to reading it and see if you can figure out what's going on
    c) decide to change something minor. Something completely trivial and then do it. Any really little, pointless thing. It'll teach you how to find details in the code.
    d) Do NOT change a bunch of stuff just because there is "the right way" to do something and the code is doing it "the wrong way". Often there are things in old code projects which work in a fine balance and you don't want to change them. You may regret it later.

    One thing I enjoy about taking over older code bases is learning how people used to do things. I've looked at code ported from old UNIX or DOS boxes and it's interesting to see how they got around memory or file restrictions. Definitely a great learning opportunity.

  165. Re:30 to 40 thousand lines isn't large by any meas by Anonymous Coward · · Score: 0

    'I am currently working with a mission-critical codebase, which is written in PHP..'

    Sorry, I'm sure you have valuable points to make, but I stopped reading at this point because I was laughing too much to continue.

    Mission Critical. PHP. *wipes eyes, sighs*. Good one.

    You have my deepest sympathies.

  166. Reengineering Patterns by h2o2 · · Score: 1

    The freely available book "Reengineering Patterns" (http://scg.unibe.ch/download/oorp/) contains practical advice and shows systematic ways to tackle these situations from a variety of angles. Without knowing more details about your problem it is hard to recommend concrete steps, but _do_ read the book in any case.

  167. Asked before. Answer: use tools ... by golodh · · Score: 1
    Approximately the same question was asked before and received (in my opinion very good) answers in this thread: http://ask.slashdot.org/askslashdot/08/01/18/1554257.shtml

    Specifically, limiting yourself to "reading code" and relying on the likes of "grep" is (as far as I'm concerned) the behaviour of a Code Monkey, not a Software Engineer.

  168. I just completed something similar... by mswhippingboy · · Score: 2, Interesting
    I inherited a 15+ year old application about two years ago of similar size written in C (actually Pro*C) that had a long history of crashes (invalid pointers), memory leaks and incorrect results. I was tasked to add additional functionality to the application. I was able to implement the additional functionality, but because of the requirements of the project, I did not address all the structural defects in the application.

    As a result, although the new functionality worked fine, the application still suffered from the "spaghetti" code of patches upon patches, built up over years of various developers adding capabilities without anyone ever addressing the reliability of the application. The support group for this application was clearly frustrated with years of late night calls and hours and hours spent trying to correct errors.

    About 6 months ago I was tasked with essentially "cloning" the application for new business purposes. I proposed porting the application to a newer, more modern language (java). It took a lot of selling (i.e. convincing management and other developers that the end result would run just as fast, be easier to maintain and have more reliability), but I was able to get them to buy off on it.

    The rewrite was completed about 3 months ago and the results were better than I had hoped for. I was able to complete the rewrite in the same amount of time allocated for the original "enhancement" project. The application actually runs faster than the old one, has yet to crash (it runs 24x7), and the code is well structured and easy to maintain. We're now in the position that if/when another "enhancement" is requested to the old application, we can simply clone the new java version and completely replace the old app. Given the results of the last project, it won't be a hard sell (especially to the support group) to go the java route.

    I know this is a long post, but the bottom line is that sometimes (more often than many realize), recoding an old application in a modern language and bringing it into the 21st century rather than patching old code can pay off dividends beyond the basic added functionality.

    --
    Sometimes the light at the end of the tunnel is the headlight of an oncoming train.
  169. A suggestion by Uzik2 · · Score: 1

    This will kill two birds with one stone. Write unit tests for the codebase. You will learn the code and learn what it's supposed to do well while you're doing it. Further you'll be in a better position to make changes without breaking the existing functionality.
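
    A minimal sketch of that idea, assuming Python and the standard `unittest` module: a "characterization" test that records what the code does today, whether or not that is what it should do. The function `legacy_shipping_cost` is a hypothetical stand-in for something inherited, and the expected values are simply what it currently returns.

```python
import unittest


def legacy_shipping_cost(weight_kg: float, express: bool) -> float:
    """Hypothetical stand-in for an inherited function nobody fully understands."""
    cost = 4.0 + 1.5 * weight_kg
    if express:
        cost *= 2
    if weight_kg > 20:
        cost += 10  # surcharge of unknown origin; the test pins it anyway
    return cost


class ShippingCostCharacterization(unittest.TestCase):
    def test_pins_current_behaviour(self):
        # Values observed by running the code once, not taken from a spec.
        observed = {
            (1, False): 5.5,
            (1, True): 11.0,
            (20, False): 34.0,
            (21, False): 45.5,
        }
        for (weight, express), expected in observed.items():
            with self.subTest(weight=weight, express=express):
                self.assertEqual(legacy_shipping_cost(weight, express), expected)
```

    Saved as a test module, this runs with `python -m unittest`. Writing the table of observed values is where the learning happens: every surprising entry is a question to ask about the code.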

    --
    -- Programming with boost is like building a house with lego. It's a cool but I wouldn't want to live in it
  170. Have you tried Krugle? by ClosedLoop · · Score: 2, Interesting
    About a year ago, I took advantage of my employer's "Innovation" program to promote our internal use of a code-search tool called Krugle. I took point in contacting Krugle, arranged for a free demo period, and administered the demo on a machine in our network. Of course, I fell afoul of the "Innovation" program, because my version of "Innovation" was something to help us develop a better product. In fact, the program was intended to find a better color for the box, so my Krugle effort was lost on them, but hey I'm not bitter....

    Ok, on to the point. I got a dozen developers to participate in the evaluation of a Krugle copy running inside our firewall. It indexed millions of lines of legacy code, organized across a dozen different projects. In my opinion, and I believe the majority of other evaluators as well, being able to search our code exhaustively was a major benefit in getting "arms around" the code base. It changes your outlook. You start asking questions like
    • Where are all the places that a different component calls this API?
    • What the heck does error code 4872339 mean, and who generates it?
    • How many derived classes override this virtual function?

    If you surf on over to Krugle.com, you will see that they now offer a free evaluation copy as a standard product. If you want to get a feeling for what can be done with the tool, just check out Krugle.org, where lots of open-source projects are indexed online. I would definitely recommend using the free evaluation tool as a way of speeding your high-level understanding of any new-to-you code base.

  171. Re:Large? Try shindex on corpnet by Drakino · · Score: 1

    As a fellow build engineer, I always find it interesting to hear about the processes at Microsoft. One of the books I read prior to taking my first build position was "The Build Master: Microsoft's Software Configuration Management Best Practices" by Vincent Maraia. It was interesting to read about the type of processes that come out of a build that does take 14 hours and has hundreds of people working on the codebase.

    One of the concepts I liked quite a bit was "The Gauntlet". I can't remember if this was used on Windows, or if it was specific to the Visual Studio team, but it was pretty slick in detecting what change actually broke the build. Though I heard the system would get backed up from time to time, causing lots of delays.

    With the number of large code bases Microsoft or other companies maintain, it still surprises me how primitive most build systems are. Only recently have companies started to release build-specific products, most only suitable for small codebases, or built for java/web development environments. I guess the problem is that large products are pretty unique in their build requirements. I work in the games industry, and most of our code build times are measured in minutes these days when the proper hardware is thrown at the problem along with distcc/incredibuild. The time-consuming processes tend to be more related to game content now, things like lighting levels, or generating AI pathing information.

  172. All you need is tags by loufoque · · Score: 1

    I just started a few months ago a job where I'm maintaining an old embedded system (an isdn gateway, old technology) that is supposedly written in C++, but is actually bad C.
    It has no comments and no documentation of any kind. Indentation is broken beyond repair. A lot of functions are several thousand lines long, while most files are in tens of thousands of lines.

    All I needed to deal with it was generate tags. Once you've got the tags, you can jump to a declaration or definition easily anywhere inside the code base. That, combined with grepping all the files of the project for the right strings or regular expressions (the system does a lot of logging, so I can just grep for the log message to find the relevant piece of code), makes the job doable.
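
    The "grep for the log message" half of that workflow can be sketched in a few lines. This is a minimal Python version, not a replacement for grep or a tags tool; the helper name `find_string` and the default list of file suffixes are assumptions made for the example.

```python
from pathlib import Path


def find_string(root: str, needle: str,
                suffixes: tuple[str, ...] = (".c", ".cpp", ".h")) -> list[tuple[str, int, str]]:
    """Return (file, line number, stripped line) for every line containing needle."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        # errors="replace" keeps the scan going on files with odd encodings.
        text = path.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                hits.append((str(path), number, line.strip()))
    return hits
```

    Given a message copied from the system's log output, `find_string("src", "the message text")` points at the lines that emit it, which is usually a good place to start reading.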

    But then, it's still a boring job with little opportunity to shine. I'm personally leaving whenever I can afford to move again.

  173. tests by bytesex · · Score: 1

    You find out what it's supposed to do according to functional spec, and you write a test-suite against it. Two birds with one stone.

    --
    Religion is what happens when nature strikes and groupthink goes wrong.
    1. Re:tests by JustNiz · · Score: 1

      Well that just tells you what the black box does and what it is meant to do. It does not tell you anything about how it does it, which is what the OTA needs to know.

  174. no reason to be discouraged by alonsoac · · Score: 1

    You shouldn't feel bad about not easily understanding all parts of a large code base. I've been programming for over 10 years, and there are some systems that have been in production for more than 7 years that I am still in charge of maintaining. When I have to go back and change something it is very difficult; it is almost like someone else programmed it, and it is tough to remember how things work. The problem is not that I am dumb now, the problem is I wasn't as good a programmer then and there was no budget/time for decent documentation.

  175. Re:Large? Try shindex on corpnet by Nutria · · Score: 1

    meanwhile the real build engineers have to deal with serious shit.

    Real Engineers use conditional compilation so they don't have to recompile every single stinking row of code every night.

    --
    "I don't know, therefore Aliens" Wafflebox1
  176. Start at the Beginning by frankj2k10 · · Score: 1

    I haven't read through all the posts and there are some great suggestions and strategies that have been outlined.
    I've been through the same situation quite a few times in my career.

    Have you been able to track down any of the project artifacts developed as the software was being created:
    business requirements, functional requirements, use cases, design docs, database designs, user guides, etc.?

    I know these documents, if they exist, can be out of date, incomplete, or puzzle pieces for how the software has evolved over time.
    However, what may exist might be able to provide a high level picture of the software from different perspectives and shed some light on little nuances.

  177. Step cautiously! by ZeLonewolf · · Score: 1

    Just last summer I took over a project with over 250,000 lines of code. It was a complete disaster of a codebase, a total Rube Goldberg machine... but somehow, after years of poking and prodding and band-aids and what-not, it WORKED...however, even the tiniest code change took weeks to happen because the code was so badly written. The project had a ton of turnover through the years, and from the looks of it many of the coders used conventions from different languages they were familiar with, copy/paste all over the place, bad structure, fragile inheritance schemes, etc., etc.

    So, I did the only thing that made sense. Started completely from scratch, picking out the parts that were usable as we went. We haven't finished yet, but I haven't looked back...

    --
    "If at first you don't succeed, lower your standards."
    1. Re:Step cautiously! by Ritchie70 · · Score: 1

      I have to quarrel with the notion that starting from scratch is the only thing that makes sense.

      Those "tangles" are also known as "bug fixes" and "enhancements."

      Suppose it's a program for scheduling employees. The weird line of code that says "add 1 hour to the schedule if it's a Tuesday and the guy in station 3 is rated below 17" is called a "business rule." If it was put in 10 years ago, the guys who put it in had a reason, but nobody is going to express that reason to you in requirements gathering, and your test group isn't going to find it. But your end users are going to open a defect that says "Tuesday schedules don't seem right any more."
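
      To make the Tuesday example concrete, here is a minimal Python sketch of refactoring with a safety net: keep the old routine, write the cleaned-up one next to it, and compare the two over a grid of inputs. Both functions, the weekday numbering (Monday is 0) and the input ranges are invented for illustration.

```python
from itertools import product

TUESDAY = 1  # same convention as datetime.weekday(): Monday is 0


def legacy_hours(base_hours, weekday, station, rating):
    """Hypothetical legacy routine with the unexplained rule buried in nesting."""
    hours = base_hours
    if weekday == 1:
        if station == 3:
            if rating < 17:
                hours = hours + 1
    return hours


def refactored_hours(base_hours, weekday, station, rating):
    """Same rule, named and flattened, with behaviour intended to be identical."""
    needs_extra_hour = weekday == TUESDAY and station == 3 and rating < 17
    return base_hours + (1 if needs_extra_hour else 0)


def behaviour_differences():
    """Return every input tuple on which the two versions disagree."""
    grid = product([8], range(7), range(1, 6), range(10, 25))
    return [args for args in grid if legacy_hours(*args) != refactored_hours(*args)]
```

      An empty list from `behaviour_differences()` does not prove the two are equivalent everywhere, only on the inputs tried, but it catches the "Tuesday schedules don't seem right any more" class of regression before the end users do.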

      I'm a big fan of refactoring not rewriting.

      The system I work on at the moment has something like 3 - 5 million lines of code. Its development started in the mid-1980's. In C. Mostly K&R C. In an environment that had very limited memory by modern standards, so it does weird things like use shared memory (that it isn't sharing with anyone) if it needs a big block of contiguous memory. And the stack must have been limited too, because there are globals everywhere.

      In the code base are specialized compilers for building other parts of the system that describe the user interface in specialized "screen files." It's a nightmare sometimes.

      There was an attempt to port it from its current version of Unix to another Unix variant. It failed.

      There have been multiple attempts to replace it with other systems. They all failed.

      Because this application is the codification of the operations manual for our business in C. Literally. With changes over the last 25 years as business processes changed.

      Since you will never, ever have a full set of requirements that describes what it really does, no replacement will ever live up to it unless you analyze everything it's doing and build based on that.

      If you're going to do the work to analyze everything it's doing, you might as well just incrementally refactor what you've got.

      --
      The preferred solution is to not have a problem.
  178. Re:30 to 40 thousand lines isn't large by any meas by pclminion · · Score: 1

    use revision control, and don't trust it -- that is, back up incessantly.

    Dude... Find a better revision control system.

  179. ++i can generate better code than i++ by Anonymous Coward · · Score: 0

    It's true. You'd think the compiler could just do the same, but it can't, not always. i++ can have consequences. ++i never does. But frankly, if you aren't doing billions of these a second...

  180. Source Insight Code Browser by Anonymous Coward · · Score: 0

    I work at a major software company with millions of lines of code in our software repository. A lot of the developers here favor Source Insight www.sourceinsight.com/ It is an excellent code browser for complex code bases.

  181. Re:30 to 40 thousand lines isn't large by any meas by digitalunity · · Score: 1

    Just FYI, you wouldn't need cygwin anyway. There is the minimal GNU system for windows, which is a native port of some basic GNU tools to windows.

    I use it quite a lot with mostly satisfactory results.

    --
    You can't legislate goodness. Let each to his own destiny, by will of his freely made choices.
  182. Re:My Dick is Bigger than Your 250,000 lines of co by BlueBoxSW.com · · Score: 1

    few (fy)

    adj. fewer, fewest

    Amounting to or consisting of a small number: one of my few bad habits.

    Being more than one but indefinitely small in number: bowled a few strings.

    n. (used with a pl. verb)
    An indefinitely small number of persons or things: A few of the books have torn jackets.

    An exclusive or limited number: the discerning few; the fortunate few.

  183. Re:30 to 40 thousand lines isn't large by any meas by Garridan · · Score: 1

    There isn't one. All software has bugs. Even your revision control system. Don't trust it. Make backups. Redundancy is the only way to be safe.

  184. Cybertao by tgrigsby · · Score: 1

    Okay, first off, 40k lines isn't big, and unless they did a really horrible job of naming and organizing the parts, it shouldn't be hard to tackle. I'm dealing with a 1.5 million line assortment of legacy code, and while it's taken a while to suss out, I have a pretty decent grasp of where everything is.

    It's a Zen thing. You look it over until you identify the top level units, then work your way down. Most applications have a framework. If you can't find a starting point, figure out which code is the most outward facing, read through the high level functions, and dig downward. Make notes. Look for comments along the way. If you don't see comments, write some. Absorb. No one understands any system immediately; if you become one with the code, you too can be a master.

    Nowadays, I'm the architect, and while there's still more code written before I got there 5 years ago than since I arrived, I understand just about all of it. I've been programming for 20 years, and this makes the fourth time I've started with a million+ line system and ended up being one of the experts.

    Patience, grasshopper.

    --
    *** *** You're just jealous 'cause the voices talk to me... ***
  185. Re:My Dick is Bigger than Your 250,000 lines of co by greg1104 · · Score: 1

    Results 1 - 10 of about 1,070,000 for "legacy code"

    Let me preempt your next comment: "but how many of those are 'professional journals or books'?". Well, 2,640 of those are from the journal of the ACM. That's just a bit more than a few now, isn't it? Looks like you have some reading besides dictionary.com to do.

  186. Re:30 to 40 thousand lines isn't large by any meas by GryMor · · Score: 1

    If you don't have a repro case for a problem, you are getting way ahead of yourself trying to fix it, as even if you fix it, you won't KNOW that you've fixed it.

    Without tests (and note, I did not specify automated unit tests, those are handy and speed things up, but I personally prefer end to end integration tests when dealing with a system I didn't write) you can't figure out how a system is intended to work, at which point understanding how it does work usually isn't helpful, and can actually be harmful as you internalize a model of how it does work as how it should work. It hides bugs from you, and often leads to your internal model being horribly flawed (from the perspective of what the program should do).

    Does your boss have expectations about what the system does? If so, and if they tell you those expectations, you have tests. Sure, they are the manual integration kind and probably underspecified, but it's a starting point.

    And yes, tests make a code base easier to learn as they give you a something to trace through and a basis for reasoning about how the code base should work. Fleshing out and automating those tests refines that understanding.

    --
    Realities just a bunch of bits.
  187. CXT C Exploration Tools by northerner · · Score: 1
    When learning someone else's code, I have found Juergen Mueller's C Exploration Tools to be very useful.

    The CFT utility (C Function Tree Generator) provides a summary of the functions and calling hierarchy.

    The CST utility (C Structure Tree Generator) gives a summary of the data structures and how they are nested.

    I don't think these utilities have been updated for quite a while. Can anyone suggest more modern versions of tools that do similar code analysis and reporting?

  188. Re:30 to 40 thousand lines isn't large by any meas by Rene+S.+Hollan · · Score: 1

    Yeah: that's a use of the word large with which I wasn't familiar.

    Break it up into what appear to be the logical sub-components, test by making libraries, and seeing how things link together and headers are included, until you have sufficiently manageable pieces.

    But, even in the aggregate, one or two read throughs should get you a "feel" for the code.

    --
    In Liberty, Rene
  189. Re:30 to 40 thousand lines isn't large by any meas by paylett · · Score: 1

    Size is relative. A well-organized, commented, documented, 200k-300k program is by no means large. Even for a single newstart developer. But 10k-20k of, say, badly written Perl might require a few sessions of therapy afterwards.

    --

    Believing something doesn't make it true. Not believing something doesn't make it false.

  190. Re:My Dick is Bigger than Your 250,000 lines of co by BlueBoxSW.com · · Score: 1

    No, my next question would be:

    "Is hanging out on Slashdot looking to cherry-pick a phrase out of context for the sole purpose of telling someone, anyone, that they are wrong, a lonely life?"

    I'll stick with my opinion that the submitter's question was A) A good question, B) Worthy of honest response and discussion, C) Germane to an area that gets less coverage than it deserves.

    And that your response added nothing worthy to the discussion.

  191. Read And Love by Anonymous Coward · · Score: 0

    Check out the excellent article Code Spelunking Redux: Is it getting any easier to understand other people’s code?, and learn to love Doxygen and DTrace (if your language is supported).

  192. May be of help: FAMOOS Reengineering Handbook by Anonymous Coward · · Score: 0

    May I suggest reviewing the FAMOOS Object Oriented Reengineering Handbook. Ignore the age (1999) and consider the approaches.
    FAMOOS Handbook: http://scg.unibe.ch/download/projectreports/FamoosHandbook.pdf

  193. Re:30 to 40 thousand lines isn't large by any meas by ScrewMaster · · Score: 1

    I feel like I'm feeding a troll here, but someone mod'd this up. So someone actually thinks you were saying something worthwhile, and I just don't see it.

    On the other hand, some people just are that lucky, and never have to wade through three or four feet of someone else's leftover muck.

    --
    The higher the technology, the sharper that two-edged sword.
  194. Re:30 to 40 thousand lines isn't large by any meas by jimrthy · · Score: 1

    I was thinking more of the perennial "It works fine on my machine" problem.

    Then a tester (or worse, a client) installs it, and there's some Terrible Thing that happens pretty much at random that you don't have any way to get enough information about to reproduce.

    We had a case a few years back where, every 2 or 3 months, all our machines at one client would quit responding to input. They'd have to shut down production, hard reset the server, and then all the clients.

    We spent months trying to repro this in-house (the client was in France, so flying someone out to their site just wasn't in the budget...although wasting those months probably cost more in the long run).

    We finally narrowed the problem down to e-m interference between some machine they only used about once a month (so it still didn't happen every time) and our wireless network.

    The solution, which took one guy a weekend, was to switch our communication protocol from TCP to UDP.

    It's kind of hard to predict test cases for that sort of thing.

  195. Re:30 to 40 thousand lines isn't large by any meas by ScrewMaster · · Score: 1

    I could not disagree anymore with your statement.

    If you're not disagreeing anymore, presumably that means you're agreeing. Or something.

    --
    The higher the technology, the sharper that two-edged sword.
  196. Re:You are an idiot by ScrewMaster · · Score: 1

    I am article submitter O.P. and not retard I am programmer with Master DEgree in Computer Science from Indian Institude of Technology and If I am retard why does IBM give me 40.000,00 lines of code? American IBM cannott do it so they give it to me because of my education in India IBM paies me 2 Mexican paysos for every line of code I fix that American coder screw up and I need food and room like American does. If American wants money than American should do job correct the first time and not have to send it to INdia to get all the work done correct. As AMerican teenager say DONT HATE THE PLAYER HATE THE GAME

    The good news is that he's not a technical writer.

    --
    The higher the technology, the sharper that two-edged sword.
  197. Re:++i can generate better code than i++ by ciggieposeur · · Score: 1

    When on separate lines (or as separate expressions like 'for (... ; ... ; i++)') they should compile to the same code for C and Java.

    They compile to different things in C++ for non-primitive types when operator++/operator-- are defined, such that pre-increment has a (slight) performance gain.

  198. Re:30 to 40 thousand lines isn't large by any meas by pclminion · · Score: 1

    Of course you should make backups, but sitting around gnawing your fingernails in terror isn't really necessary.

  199. Re:30 to 40 thousand lines isn't large by any meas by Z00L00K · · Score: 1

    Assuming that you only use the basic tools, but when you are a *NIX nerd you will soon get accustomed to a lot of the other tools too.

    --
    If builders built buildings the way programmers wrote programs, then the first woodpecker would destroy civilization.
  200. Re:30 to 40 thousand lines isn't large by any meas by Garridan · · Score: 1

    Ahah, you see my point! I don't gnaw my fingernails in terror because I back up regularly!

  201. Learn only what you need by softegg · · Score: 1

    I used to port Japanese RPG games into English for Working Designs, which were similar if not larger code bases. All the comments were in Japanese, and frequently many of the tools used to build the product and assets were missing.

    The way I dealt with it was to only focus on the problem I was trying to solve, and not worry about the rest of the code. The poster who said to backup the code in a VCS was right on... once you know you have a stable base to go back to, you can try all the changes you want.

    If you approach the code with a goal, you can then think about likely places where that code would be. Grep is your friend. If the code has embedded strings, you can search for those strings. Otherwise, you can find the handles for those strings, and search for those. If it is some sort of I/O or database access, search on those call names. Frequently there are naming conventions that you can learn and use to find stuff.

    The earlier poster's idea of putting breakpoints in and/or stepping through the code was right; this is also a very useful practice. It is much easier to follow the flow if you can step through it, especially with C++, where inheritance can often leave one baffled as to which code will actually run.

    Following main (or your language equivalent) and then drilling down is sometimes useful, but it is often easier to find the bottom and work your way back out.

    Watchpoints can be really useful. Find a variable with a value that interests you, and put a watchpoint on it so that it will break when that memory is accessed. A great way to see which routines are involved.

    If all else fails, pepper the code with print (or logging) statements and see what shows up. Try to narrow down what you are looking for.
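
    One low-effort way to "pepper" code with logging without editing every function body is a decorator. This is a minimal sketch assuming a Python codebase and the standard `logging` module; the names `traced` and `apply_discount` are made up for the example.

```python
import functools
import logging

logger = logging.getLogger("trace")


def traced(func):
    """Log every call to func with its arguments and its return value."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        logger.info("enter %s args=%r kwargs=%r", func.__name__, args, kwargs)
        result = func(*args, **kwargs)
        logger.info("leave %s -> %r", func.__name__, result)
        return result
    return wrapper


@traced
def apply_discount(price, percent):
    # Hypothetical function under investigation.
    return price * (100 - percent) / 100
```

    With `logging.basicConfig(level=logging.INFO)` set at startup, each decorated function reports when it is entered and what it returned. Remove the decorators once the area of interest has been narrowed down.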

    Another useful technique is to comment out a section of code and see where the compile breaks to find dependencies.

    As you figure stuff out, add comments. Perhaps also keep a file of notes when you find stuff or figure out how things work.

    As long as you are focused on solving a particular problem, the code base isn't so unreasonable, because you don't care about most of it. As you knock down each problem, you learn a little more about the structure of the code.

    Remember, programming is the art of breaking problems down into smaller problems until they disappear.

  202. Re:Large? Try shindex on corpnet by Kalriath · · Score: 1

    We actually have the developers tag the build after checkin of any completed defects.

    Granted, we don't have a dedicated build person, but if we did they wouldn't have checkin rights to the codebase.

    --
    For a site about things like basic rights, Slashdot users sure do like to censor "dissent".
  203. Re:Large? Try shindex on corpnet by TapeCutter · · Score: 1

    Heh, I'm a developer, the cvs "gatekeeper", and the primary builder in a shop of 20 odd programmers.

    --
    And did you exchange a walk on part in the war for a lead role in a cage? - Pink Floyd.
  204. Re:Large? Try shindex on corpnet by TapeCutter · · Score: 1

    PS: Developers also use change tags but I was talking about the build tag.

    --
    And did you exchange a walk on part in the war for a lead role in a cage? - Pink Floyd.
  205. Re:Large? Try shindex on corpnet by TapeCutter · · Score: 1

    Why would you want to archive stuff that can be reproduced by a build?

    --
    And did you exchange a walk on part in the war for a lead role in a cage? - Pink Floyd.
  206. 40k isn't huge, but it's not small by Anonymous Coward · · Score: 0

    Sorry, but all the people giggling saying 40k lines of code is "small" are just hothead braggers. Aside from maybe the final graduation project or program for your undergraduate thesis, how many of you with CS or CIS 4-year degrees ever had to write even close to 40k while in college? None of ya. Only to then graduate, work for a while, and then act like anything under a million lines is small? Whatever.

  207. A decent IDE and/or Doxygen by Anonymous Coward · · Score: 0

    If your particular language is supported, an IDE can make navigating the code very easy. You can browse through the code.

    And doxygen can generate call trees and stuff which can help you.

    Of course, the nature of your target language should support these (i.e. statically and strongly typed languages).

  208. Re:My Dick is Bigger than Your 250,000 lines of co by greg1104 · · Score: 1

    So your opinion is that my suggesting two books to read, followed by notes on how to find the extensive academic library on this subject, added nothing worthy to the discussion? Interesting.

    I corrected one point in your otherwise useful commentary, admittedly in a somewhat snarky fashion--that was meant to be a bit of a joke by the way, which you didn't take well. The cherry picking out of context started when you decided to pick on one word I used, rather than considering that perhaps my quick spoof suggesting literature in this area was just alluding to a larger issue in how you described it. You don't quite seem to have gotten that still; "less coverage than it deserves" is just not a defensible position, given that there are in fact two major books and thousands of research papers on this very specific topic.

  209. That depends... by treczoks · · Score: 1

    One of my first inheritances at work was about 40k lines of code. One big ass file. Assembly, so I had no chance with Doxygen. Lots of macros to build totally different things depending on definitions and phase of the moon. Half of the comments were in French, which is, erm, French to me. And each and every bit of RAM in the target was in use. "You only need to add this little feature." And be quick, because the system is already sold. And it is overdue, too, because someone in sales forgot to place a development job for the "small" change.

    And now tell me that 40k lines is small and easy...

  210. Re:My Dick is Bigger than Your 250,000 lines of co by BlueBoxSW.com · · Score: 0, Troll

    Two links to the same book doesn't equal two actual book links.

    Perhaps rather than using the word "few" I should have calculated whatever small percentage of books are available on the topic.

    But then again, I have a life.

  211. Re:30 to 40 thousand lines isn't large by any meas by GryMor · · Score: 1

    In my experience, the client (software, hardware and wetware) must be considered part of the repro case until demonstrated otherwise. I don't know how many bugs we've tracked down to interesting browser behaviors when certain windows accessibility features are turned on.

    I will admit that as I do web and infrastructure development, I probably have a leg up on those doing traditional software deployment.

    --
    Realities just a bunch of bits.
  212. A few thoughts by jgrahn · · Score: 1
    A few thoughts (in no particular order), without having read many of the other comments.
    • Accept that it will be painful at first, and that you'll feel stupid a lot of the time.
    • The original developers probably didn't 'get' all of this code at any one time, either. I'm the "original developer" in such a scenario right now, and I cannot always answer detailed questions in any other way than "I wrote it like that, saw that I could trust it, knew that I could go back to it when needed, and then chose to forget the details".
    • Find a personal channel to the requirements and/or users. If you don't know what the code is supposed to do, you're fscked. And I don't mean what some paper says it should do, but what the *users* expect it to do. (I'm assuming there is a user base, and your job is to push changes to them.)
    • Don't piss off those users. Be their friend, listen to them, understand their needs. In return you can get things like better bug reports, better help testing new features, and a big motivational factor.
    • Basic hygiene, if the original developers missed it: turn on all compiler warnings, use tools like valgrind on the code. Fix the build system if it sucks.
    • Learn tools to navigate the code *efficiently*, if you don't already. For me they are the Unix tools: emacs, find, grep, perl, nm and make. Probably others work on other platforms and for other languages. I've never tried any new-fangled tools for that, and doubt they work well, but your mileage may vary.
    • After working with the code for a while, you'll see patterns in it. So that when you fix a certain feature, you'll see which subsystems probably aren't involved and which ones can probably be trusted to do their job (even if they suck).
  213. some steps I'd take by multicsfan · · Score: 1

    There are many things I'd do and some are dependent on the language as some things make more sense in some languages and less in others.

    First thing I'd do is get all the existing documentation I can find including the end user documentation of how to use the software.

    I'd next try to break the software down by modules, subroutines, functions, library routines, etc. to get an idea of what does what. I'd also try to determine variable usage, such as local vs global variables and where things are defined.
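
    As a rough sketch of that inventory step, assuming for illustration that the code is Python (the approach depends on the language, as noted above), the standard ast module can list each function with the functions it calls directly, plus the module-level variables. The SAMPLE source is invented for the example.

```python
import ast


def inventory(source):
    """Map each function name to the sorted plain names it calls directly."""
    tree = ast.parse(source)
    result = {}
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            calls = set()
            for child in ast.walk(node):
                # Only plain calls like foo(); method calls like x.foo() are skipped.
                if isinstance(child, ast.Call) and isinstance(child.func, ast.Name):
                    calls.add(child.func.id)
            result[node.name] = sorted(calls)
    return result


def module_globals(source):
    """List names assigned at module level (candidate global variables)."""
    names = []
    for node in ast.parse(source).body:
        if isinstance(node, ast.Assign):
            for target in node.targets:
                if isinstance(target, ast.Name):
                    names.append(target.id)
    return names


# Invented sample standing in for an inherited module.
SAMPLE = '''
CACHE = {}

def load(path):
    return parse(read(path))

def read(path):
    return open(path).read()

def parse(text):
    return text.split()
'''
```

    Dumping the result of inventory() into your notes file gives a first, crude map of what does what, to be corrected as you learn more.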

    If the above is not already documented I'd work on creating the documentation so I don't have to refigure things out each time I dig into the code for something.

    The code style of the previous people who worked with the code can be very important. Some languages are easier to write obscured code in than others. If the code is NOT documented or the documentation is obsolete I'd start working on the inline documentation. Anyplace that the code is very obscured or poorly written I might look into rewriting so the code is easier to document and easier to read.

    Don't trust any of the documentation until you've made sure it is up to date.

    At one of my jobs the package I was hired to maintain, support, and enhance had been modified on a per-customer basis, so some variables had different meanings in different versions. Some features were implemented differently in different systems to meet different customers' differing and conflicting needs. In some cases the mainline module code would look the same but the differences would be hidden in the subroutines. This was made even more complicated by being a multiuser application that did its own file locking. The original application had been single user, so there was more than one method of doing file locking, some of which was based on which files were in which 'partition'. The system only allowed locking entire 'partitions' at one time. As customers grew to need multiple disks with multiple partitions, the multiuser locking would erratically fail, corrupt data or deadlock, etc.

    Look for the tools people mentioned that can help you more easily figure out how things work. There were no tools for the system I worked on so I had to create my own (proprietary non-standard OS and interpreted language). My boss complained about some of the time I spent working on the tools until he saw how they were saving time and helping make it easier to make changes.

    Don't be afraid to look for tools to make your life easier. Don't be afraid to write your own if there is a good reason to do so.

    The system I worked with was about 200 programs / customer with about 200 subroutines (sometimes unique for a customer) in each system.

    1. Re:some steps I'd take by Ritchie70 · · Score: 1

      Only thing I'd say about building your own tools is that the tools for understanding the code don't have to work on the platform where the code is used.

      There are generic tools for standard languages; just put a copy of the code on a Windows system and run the tool there.

      Unless the code has done clever things like have files named "source.c" and "Source.c" in the same directory, it should work out OK.
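
      A quick check for that before copying, sketched in Python. Run it on the original (case-sensitive) system; the function names are my own for illustration, not from any existing tool.

```python
import os
from collections import defaultdict


def case_collisions(paths):
    """Group paths that differ only by letter case."""
    groups = defaultdict(list)
    for path in paths:
        groups[path.lower()].append(path)
    return sorted(sorted(group) for group in groups.values() if len(group) > 1)


def list_files(root):
    """Return the paths of all files under root, relative to root."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            found.append(os.path.relpath(os.path.join(dirpath, name), root))
    return found
```

      Calling case_collisions(list_files(root)) on the source tree returns an empty list when the copy to a case-insensitive file system should be safe on that score.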

      --
      The preferred solution is to not have a problem.
    2. Re:some steps I'd take by multicsfan · · Score: 1

      In the proprietary system I used the BASIC language was burned in ROM as the OS, no other languages. I've worked on other systems that did not have a C compiler or were sufficiently unique that the 'generic' tools would not work. I've even seen some generic tools blow up due to some of the bad programming used and/or not be able to figure out what was going on.

      I'm not saying that the person should always write their own tools. If the generic ones aren't available or don't work for whatever reason, writing some of your own to help can be a bonus in the long run.

      Sometimes having tools that know more about your specific environment can be more helpful. Sometimes the information you want isn't something available from a generic tool.

    3. Re:some steps I'd take by Ritchie70 · · Score: 1

      I understand what you're saying - sometimes it's a unique environment. Embedded BASIC is one of them.

      But most people's work is a lot less unique than they want to think it is.

      In my opinion, if a good, quality C-language analysis tool blows up on your code, you might want to have a look at what's making it blow up and make sure your code is correct.

      It may be a necessary weirdness in your code due to compiler bugs on the platform or other quirks. In that case there's probably something you can do to make the tool not blow up (like remove the weirdness in the analysis copy.) Or it might be the bug you were looking for a year ago.

      --
      The preferred solution is to not have a problem.
  214. It just takes time by wesw02 · · Score: 1

    I don't really have a good answer to the problem. I just graduated college 4 months ago, got hired to work on a code base that is around a million lines of code and it's not easy. If you're lucky the code maintains a certain amount of consistency and you find a lot of similarity in various objects/modules. I found that spending some one on one time with my debugger (gdb) has really helped me to get a handle on the structure (request/response socket classes, model/view controllers, cache dbs, etc). Patience is your best tool.

  215. Spaghetti Code! by Anonymous Coward · · Score: 0

    http://sourcemaking.com/antipatterns/spaghetti-code

  216. Re:30 to 40 thousand lines isn't large by any meas by digitalunity · · Score: 1

    True, it does have awk, sed and a few other handy utilities but it doesn't include other things you might like, such as python.

    --
    You can't legislate goodness. Let each to his own destiny, by will of his freely made choices.
  217. Golden rule by LostMyBeaver · · Score: 1

    A good measure of a programmer's competence is based on measuring two temporal differences.

    First, measure the time it takes from when he starts the job until he makes the comment "I really think we need to rewrite this".

    Second, measure the time it takes from the first point until he realizes that it's better to maintain what you have, since it's too big of a job and, even if it were "done properly", deadlines would screw up the new codebase as well.

  218. Re:Large? Try shindex on corpnet by snowgirl · · Score: 1

    meanwhile the real build engineers have to deal with serious shit.

    Real Engineers use conditional compilation so they don't have to recompile every single stinking row of code every night.

    Real codebases actually have this dependency information written out so that one can do incremental builds. The Windows codebase however does not have such information declared. Any change could potentially affect anything else in the build.

    Again, I already stated: the Windows codebase and build process is not a "product" and thus does not receive the attention that it should get for shine and polish.

    We had one guy working on building an accurate dependency graph when I left... he was trapping the syscalls to report which files each compile was using, and dumping it into a large database (we're talking over 4GiB).

    Again... spaghetti code is enormously difficult to maintain, and when that spaghetti code is Windows, one cannot simply dump months or years of work into sidetracking to make things work the way they're supposed to.

    I mean, that's why Windows Vista was so late. x86-64 came out and Microsoft basically said, "porting WinXP to x86-64 will cost us just as much money as porting Server 2003 cost... let's just dump the work we had already, and start from Server 2003".

    I totally agree with you, and as I already stated the Windows codebase is a horrible mess... but we had to make it work... that's what they paid us for.

    --
    WARNING! This girl exceeds the MAXIMUM SAFE standards established by the FDA for BRATTINESS
  219. Re:Large? Try shindex on corpnet by snowgirl · · Score: 1

    Why would you want to archive stuff that can be reproduced by a build?

    The things like the publics need to be available to the other build machines in the distributed system. The easiest way to do that was to check it in, and then push the data back out.

    Recall, we're talking about just maintaining code... I didn't write it, and so I don't really know how everything that was going on worked. I had a good overview of what was going on, and some deep introspection into some very limited areas (namely those that broke).

    Honestly, there was a ton of retarded shit that went on in the code base, and I would have done it entirely different... but I wasn't the original designer, I was just a maintainer.

    --
    WARNING! This girl exceeds the MAXIMUM SAFE standards established by the FDA for BRATTINESS
  220. A problem for large code bases by Xest · · Score: 1

    That's certainly a decent option, but obviously for large codebases with many different combinations of actions that can be performed it may become unwieldy.

    Personally, I'd argue one of the best things someone can do to help themselves in this situation is to learn design patterns, and learn to recognise them.

    Even if people don't specifically follow design patterns, they often do so unintentionally, because this is really the beauty of design patterns: they are common solutions to common problems. If you can start to recognise design patterns, then you find you are no longer looking at lines and lines of code, but you are looking at the bigger picture, beginning to see what sections of code do in general, and can then get to grips with the role of these more abstract components in the larger system and understand how it works.

    You will still have to figure out how the algorithms in each component work, but you should at least be able to understand how those components fit in the bigger picture and their effect on the system as a whole.

  221. Re:Large? Try shindex on corpnet by TapeCutter · · Score: 1

    "The things like the publics need to be available to the other build machines in the distributed system."

    Ahhh, I missed the distributed part. I've worked on some very large systems that took hours to build but I've never actually come across a distributed build during my 20yrs in the industry. The system I'm looking after at the moment builds win32, x64 and ia64 all on the same box, it uses a single python script with a cvs tag as a parameter to do the lot. The *nix builds are also kicked off from a single makefile/tag but run on separate boxes for the half a dozen flavours we produce.

    "Honestly, there was a ton of retarded shit that went on in the code base, and I would have done it entirely different"

    A wise man once told me that source code is like shit, everybody else's stinks. ;)

    --
    And did you exchange a walk on part in the war for a lead role in a cage? - Pink Floyd.
  222. Reverse Engineer to UML class diagrams by Anonymous Coward · · Score: 0

    And once you're done with that, create UML sequence diagrams of various use cases. Make sure to use a debugger to create the sequence diagrams. After you're done, you'll know parts of the code better than the original authors.

    Have fun and good luck!!

  223. 30k to 40k is not big by Anonymous Coward · · Score: 0

    I worked at Microsoft and several other large corporations.

    Let me tell you: Anything less than 1 million lines of code is small.

    You just need the right tools in order to handle large code bases.

    First things to do:

    1. Use source code version control (CVS, Svn, Git, Hg). Perform code reviews before check-in.
    2. Use automatic compilation (make, ant, maven).
    3. Use a build machine that pulls the source code from the Svn machine and builds it automatically through ant or maven, sending email in case of failure, to detect compilation problems early.
    4. Use an issue tracker to remember things to do and to keep track of time.
    5. Use automatic unit tests (Junit and the like). Unit test every single method for all border cases. A test failure is a build failure and should be looked at immediately.
    6. Refactor mercilessly once everything has been unit tested. Avoid repeated code like the plague. Repeated code makes maintenance difficult, if not impossible.
    7. Use AOP or proxies for logging and security.
    8. Make sure no package has more than 10 classes, no class has more than 10 methods, no method has more than 10 lines. Refactor, refactor, refactor.
    9. Make sure no class has more than 3 instance variables, no method has more than 3 parameters and no method has a loop AND a conditional sentence.
    10. Make sure everything is specified only once (the DRY principle).
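
    Item 5 in the list above might look like the following sketch. It uses Python's standard unittest module in place of JUnit, and clamp_percent is a hypothetical stand-in for an inherited method; the point is only the shape of a border-case test.

```python
import unittest


def clamp_percent(value):  # hypothetical stand-in for an inherited routine
    """Clamp value into the range 0..100."""
    return max(0, min(100, value))


class ClampPercentBorders(unittest.TestCase):
    def test_borders(self):
        # Values just below, on, and just above each border of the range.
        cases = [(-1, 0), (0, 0), (1, 1), (99, 99), (100, 100), (101, 100)]
        for given, expected in cases:
            with self.subTest(given=given):
                self.assertEqual(clamp_percent(given), expected)


def run_suite():
    """Run the border tests and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ClampPercentBorders)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

    On inherited code, the expected values are usually whatever the code does today; once they are pinned down this way, the refactoring in item 6 has something to be checked against.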

  224. Use Code Rocket! by InspirationalThingy · · Score: 1

    This is a problem I have encountered several times in the past, inheriting reasonably large, poorly documented code bases. It can be an interesting personal challenge, deciphering someone else's code, but not when you are working to a timescale.

    I became so frustrated that I decided it was time to try and do something about it.

    As a result, we (myself and a couple of other developers) have developed a new software tool which aims to cut through legacy code, to visualise it in an abstract way, and to allow you to build a picture of what it's doing quickly and efficiently.

    In simple terms our tool (named 'Code Rocket') is a detailed design documentation tool - kind of like doxygen, but taking documentation a step further.

    We use it to prevent the code from becoming a legacy nightmare in the first place (by ensuring it is structured and documented to a high standard but with limited overheads for software developers during development) and to reverse engineer the meaning of any existing legacy code to guide us through an understanding of it. There are many other side benefits as it turns out relating to project management, review, communication, but the main thing is that I now feel a little more comfortable when presented with a batch of legacy code to investigate. I also agree with the recommendations of building in unit tests.

    If anyone is interested in checking out our tool, you'll find it at the following web site: http://www.rapidqualitysystems.com/

  225. Re:Large? Try shindex on corpnet by snowgirl · · Score: 1

    "The things like the publics need to be available to the other build machines in the distributed system."

    Ahhh, I missed the distributed part. I've worked on some very large systems that took hours to build but I've never actually come across a distributed build during my 20yrs in the industry. The system I'm looking after at the moment builds win32, x64 and ia64 all on the same box, it uses a single python script with a cvs tag as a parameter to do the lot. The *nix builds are also kicked off from a single makefile/tag but run on separate boxes for the half a dozen flavours we produce.

    Yeah, Windows Server 2003 takes over 5,000 tasks to get everything done for x86, ia64, and x86-64, just for the English, Japanese and German localizations only.

    This was across, say... 12-ish machines, I believe...

    "Honestly, there was a ton of retarded shit that went on in the code base, and I would have done it entirely different"

    A wise man once told me that source code is like shit, everybody else's stinks. ;)

    Oh, I wholly agree. I look back at my crap from earlier days and I go "holy crap, did I write this?" The main project that I've been working on the most for getting close to 10 years now has been rewritten at least like 3 times... I'm at major version number 3, and no one else even uses it!

    But each time, I refine the techniques, and the models, until now I would say it's pretty nice.

    --
    WARNING! This girl exceeds the MAXIMUM SAFE standards established by the FDA for BRATTINESS
  226. That's easy by Anonymous Coward · · Score: 0

    1 - Convince your customer that the platform the project is based on is deprecated, and they need a complete rewrite in a fancy completely new technology, so they can have all the functionality previous programmers denied them, and more. For example, you could migrate from Cobol to Java, from Java to C#, from C# to Go...
    2 - Be sure to do it with fancy shiny graphics, so the customer can see there are advantages to the new stinky pile of crap with half of the previous functionality you have sold them for a lot of money.
    3 - When they complain about bugs and lack of functionality, sell them a maintenance contract. They will sign it, or lose all the huge pile of money they have already wasted in the rewrite.
    4 - Profit!

  227. Re:30 to 40 thousand lines isn't large by any meas by Anonymous Coward · · Score: 0

    I am currently working with a mission-critical codebase, which is written in PHP and has absolutely no cohesive design to it. ... There are business rules just everywhere and API requests everywhere and all kinds of calls that overwrite static variables. ... If you inherit something like this, and it is mission critical, then you need to take as long as it takes to get it right. ... Don't remove seemingly unnecessary variables, and don't reduce seemingly redundant database calls...

    What a lame and sorry state of affairs. It makes me wonder if you work for Honda or Toyota: "Sorry sir, we knew the pedal wasn't working right, we knew you could die, but we couldn't afford to recognize we screwed it up".

  228. Re:30 to 40 thousand lines isn't large by any meas by hobo+sapiens · · Score: 1

    nope, that sir is life at a startup. You have to deal with crap like that, but as a trade off you get to work in a fairly unregulated environment and with cool new tech. You just gotta quit whining, get to work, and hopefully enjoy the challenge.

    --
    blah blah blah
  229. That's legacy code by Anonymous Coward · · Score: 0

    You need to grab a copy of "Working Effectively with Legacy Code" by Michael Feathers and read it carefully. That will help.

    And yes, 40kloc is not a lot at all (unless these are huge perl regular expressions *evil grin*).

  230. Use a tool by Ritchie70 · · Score: 1

    I was unemployed for the first quarter of 2002 and found some by-the-hour contract work maintaining an old Win16 application. Hideous tangle of C code making up a very vertical application.

    I wound up using SciTools' Understand to figure out unwieldy code bases. Honestly I never paid for it, as I said it was a short-term, per-hour contractor job and they weren't paying for tools, so I used the demo until the demo period ran out. (And this wasn't a $200 an hour kind of contract.)

    I've also had Indian consulting firms, as part of their claim that "they can analyze and understand our code base" hand me a report that I'm pretty sure was the output of that product.

    In any case, something like that is a good starting point.

    I guess Visual Studio now has some of that sort of thing built in, but a proper just-for-that tool may suit you better depending on language and style.

    http://www.scitools.com/

    --
    The preferred solution is to not have a problem.