Slashdot Mirror


Organizing Source Code, Regardless of Language?

og_sh0x queries: "I'm looking for a source of information dedicated to organizing source code. I see a lot of books and other resources covering syntax and various syntax-related philosophies, but I can never seem to find a good resource for organizing source code in general. For instance, at what point do you split that massive source file into multiple files? At what point do two functions approaching similar functionality need to be merged, despite the cost of digging through the source and making changes to call the new function? These are problems that plague many programming languages. Are there such resources that cover these issues?"

28 of 59 comments

  1. This by psavo · · Score: 4, Insightful

    is called the 'Experience' part of your CV.

    I've yet to find a simple way to determine any of those. It's just that feeling you get while looking at the code: 'damn, not again..'.

    --
    fucktard is a tenderhearted description
  2. Read Stroustrup by ObviousGuy · · Score: 3, Informative

    The first couple of chapters of The C++ Programming Language discuss this very topic.

    --
    I have been pwned because my /. password was too easy to guess.
  3. Be consistent by heikkile · · Score: 2

    There are no simple answers to these questions. The best you can do is to formulate your own policy and stick to it. In real-life projects there will always be exceptions and special cases, but it helps a lot if all people working on the project at least know of the existence of common guidelines, and preferably understand and agree with the reasoning behind them.

    --

    In Murphy We Turst

  4. Refactoring by fingal · · Score: 5, Informative
    I would strongly recommend that one reads "Refactoring" (Martin Fowler - Addison-Wesley - 0-201-48567-2) (after reading Design Patterns) for the solid techniques that it introduces for clearly defined manipulations that change the shape of code without changing the functionality. However, even with this it doesn't completely resolve the issue of "when" that you raised, as summed up in the introduction to chapter 3 "Bad Smells in Code":-
    By now you have a good idea of how refactoring works. But just because you know how doesn't mean you know when. Deciding when to start refactoring, and when to stop, is just as important to refactoring as knowing how to operate the mechanics of a refactoring.

    Now comes the dilemma. It is easy to explain to you how to delete an instance variable or create a hierarchy. These are simple matters. Trying to explain when you should do these things is not so cut-and-dried. Rather than appealing to some vague notion of programming aesthetics (which frankly is what we consultants usually do), I wanted something a bit more solid.
    I was mulling over this tricky issue when I visited Kent Beck in Zurich. Perhaps he was under the influence of the odors of his newborn daughter at the time, but he had come up with the notion describing the "when" of refactoring in terms of smells. "Smells," you say, "and that is supposed to be better than vague aesthetics?" Well, yes. We look at lots of code, written for projects that span the gamut from wildly successful to nearly dead. In doing so, we have learned to look for certain structures in the code that suggest (sometimes they scream for) the possibility of refactoring.

    One thing we won't try to do here is give you precise criteria for when a refactoring is overdue. In our experience no set of metrics rivals informed human intuition. What we will do is give you indications that there is trouble that can be solved by a refactoring. You will have to develop your own sense of how many instance variables are too many instance variables and how many lines of code in a method are too many lines.

    The book then goes on to describe the various types of "abstract smells" and which refactorings can be used to correct them, for example:-

    Inappropriate Intimacy
    Sometimes classes become far too intimate and spend too much time delving in each others' private parts. We may not be prudes when it comes to people, but we think our classes should follow strict, puritan rules.

    Overintimate classes need to be broken up as lovers were in ancient days. Use Move Method and Move Field to separate the pieces to reduce the intimacy. See whether you can arrange a Change Bidirectional Association to Unidirectional. If the classes do have common interests, use Extract Class to put the commonality in a safe place and make honest classes of them. Or use Hide Delegate to let another class act as a go-between.

    Inheritance often can lead to overintimacy. Subclasses are always going to know more about their parents than their parents would like them to know. If it's time to leave home, apply Replace Inheritance with Delegation.
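
    As a rough illustration of the last refactoring named in that excerpt, here is a minimal Python sketch (not code from the book; the Stack class and its method names are made up for this example). Instead of subclassing list and inheriting every list operation, the class holds a private list and forwards only the calls it wants to expose:

```python
class Stack:
    """A stack that delegates to a list instead of inheriting from it."""

    def __init__(self):
        # The delegate: a private list that callers never see directly.
        self._items = []

    def push(self, item):
        self._items.append(item)

    def pop(self):
        return self._items.pop()

    def is_empty(self):
        return not self._items


stack = Stack()
stack.push(1)
stack.push(2)
top = stack.pop()
```

    Had Stack been written as a subclass of list, callers could also call insert() or sort() on it, which is the kind of overintimacy the excerpt warns about.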

    I have frequently found that just reading through this short (~15) collection of abstracted "smells" gives a very good way of supplementing the "experience" that you speak of and helping you to make decisions with the benefit of a) a bit of third party support in making these decisions and b) a clearly defined set of rules as to how to apply each of the refactorings including test cases to prove that the functionality has not been changed in the process and, more importantly, a clean roll-back procedure for those times when the olfactory senses get a little bit confused...

    --

    The only Good System is a Sound System

    1. Re:Refactoring by gaj · · Score: 5, Informative
      I'll second that recommendation.

      When I first saw "Refactoring", I said to myself: "Self, now I've seen everything". I thought it was yet another book enshrining process and procedure over good working code.

      I was wrong.

      This book is really good for those who haven't yet learned what Stroustrup refers to as "taste". Hell, I've been coding for many, many years and I certainly thought it was worth a read!

      My only caution is the same as the one I give about GoF, UML, OOA/OOD/OOP, or any other codified programming "methods": Don't blindly follow them w/o taking your own experience into account.

      Basically, the less time you've been coding, the more seriously you should take these concepts. Over time, and with many KLOC, you'll develop your own "taste"; your own sense of what works. This is not to say that learning new methods is useless to someone who's been coding for a long time; far from it. Just that, in most cases, a hacker will develop a pretty good idea of what works for them and what doesn't. The dirty little secret is, whether you like the language he "accreted" or not, Wall is right: TMTOWTDI. And knowing which way to use in any given circumstance can only come with experience. Reading books like "Refactoring" can help a lot until you get that experience, though!

    2. Re:Refactoring by sohp · · Score: 2

      The good news is that there are great tools for automating refactorings for some languages (Java and Smalltalk come to mind immediately). The bad news is that C/C++ is not one of the languages, but C# is.

      That said, some of what refactoring browsers do can be done with search and replace. Be careful though, you don't want to change all occurrences of "i" to "index" and end up with code that won't compile because there's no type "indexnt".
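
      The "indexnt" pitfall is easy to reproduce. A minimal sketch in Python, using a made-up one-line source string: a plain substring replace mangles the type name, while a regular expression anchored on word boundaries only touches the standalone identifier.

```python
import re

# Hypothetical source line to be refactored.
source = "int i = 0; i += 1;"

# Naive search-and-replace also rewrites the "i" inside "int".
naive = source.replace("i", "index")

# Matching on word boundaries only renames the standalone identifier "i".
careful = re.sub(r"\bi\b", "index", source)

print(naive)
print(careful)
```

      Even the word-boundary version is only text matching: it cannot tell two different variables named "i" apart, which is where a real refactoring tool earns its keep.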

      The ability to rollback refactorings is essential, too. Industrial-strength source control tools are pretty much a necessity, allowing you to re-get your CVS tree if a refactoring attempt gets out of hand.

      Since I bought Martin Fowler's book and started studying refactorings, my code has gotten much easier to live with. I no longer fear making a significant change to add functionality or fix bugs because I know I can refactor and still have code that continues to work as before. The addition of unit tests has helped to ensure that it not only keeps working, but it keeps behaving as expected.

  5. Book recommendation by Read+The+Fine+Manual · · Score: 5, Informative
    Have a look at this book by Steve McConnell: Code Complete: A Practical Handbook of Software Construction.

    Yes, McConnell is a Microsoft guy, but this book is completely operating-system and programming-language agnostic (even though examples are in C, Fortran, and Pascal, IIRC). It is an excellent guide to software construction, covering every aspect from design, over coding practice, style issues, to project management. I highly recommend it.

  6. The wrong questions by p3d0 · · Score: 4, Insightful

    These sound like the wrong questions to me. It reminds me of someone's (perhaps Dijkstra's?) story of the response he received when he recommended abolishing gotos. Someone said "ok, I'll buy that; so what do I do if I'm at this point in the program, and I want to get to that point?"

    The trouble with such a question is that it has no answer. Dijkstra's argument was not that one should take existing programs and remove the gotos; rather, that programs written using only structured elements (sequencing, conditionals, loops) are more comprehensible, and don't require any gotos because there is a more elegant way to achieve the same effect. Thus, as you can see, there really is no answer to the question; the questioner's approach was fundamentally flawed.

    Likewise, software organization is not done in terms of functions; rather, it is done in terms of information-hiding modules. To ask when one huge function should be split into two, or when two similar functions should be merged, indicates to me that the design might be flawed. Sometimes that's unavoidable; for instance, if you are involved in a project written by someone else. In that case, you do indeed need to make this kind of decision.

    However, true modular programming does not mean taking huge lumbering hunks of code and splitting them into modules. It means writing modules using the principles of information hiding to avoid making huge lumbering hunks of code in the first place.

    This, of course, is easier said than done. It's not that hard to avoid gotos, because the use of Dijkstra's structured programming techniques makes them unnecessary. In contrast, writing good modules is hard, and without superhuman foresight, some modules are bound to be pretty crummy. These will need to be rewritten in order to achieve good information hiding properties.

    So, there's your answer: don't put the cart before the horse. Don't expect that someone will tell you that you need to split a function when it gets beyond X number of lines. Rather, look at the integrity of the system's modules. If I can leave you with one piece of advice, I hope it is this: design module interfaces not according to what services they provide, but what information they hide. Modules for which you can't find a succinct statement (12 words or less, with no ifs, ands, or ors) of what information they hide are poorly designed, and need an overhaul. A symptom of this may be that your functions are redundant, or too long, but the core problem is one of poor module design.
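
    To make the "what does it hide" test concrete, here is a small Python sketch. The SymbolTable class is a hypothetical example, not something from the discussion. Its secret fits in well under 12 words: "how name-to-value bindings are stored".

```python
class SymbolTable:
    """Hides one decision: how name-to-value bindings are stored."""

    def __init__(self):
        # Private representation. Callers never touch it directly, so it
        # could become a sorted list or a database table without breaking them.
        self._bindings = {}

    def define(self, name, value):
        self._bindings[name] = value

    def lookup(self, name):
        return self._bindings[name]

    def is_defined(self, name):
        return name in self._bindings


table = SymbolTable()
table.define("answer", 42)
```

    If callers started reaching into _bindings directly, the module would stop hiding anything, and that (rather than the length of any one function) would be the signal to refactor.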

    --
    Patrick Doyle
    I mod down every jackass who puts his moderation policy in his sig. Oh, wait a sec....
    1. Re:The wrong questions by elflord · · Score: 3, Insightful
      So, there's your answer: don't put the cart before the horse. Don't expect that someone will tell you that you need to split a function when it gets beyond X number of lines. Rather, look at the integrity of the system's modules. If I can leave you with one piece of advice, I hope it is this: design module interfaces not according to what services they provide, but what information they hide.

      Actually, the questions he is asking are indeed very important. It's all well to say that code "should be well designed", and indeed, most books spend a lot of time talking about design principles for people with clean slates. Unfortunately, very few people have a clean slate to work with. Using a good design up front is not an option if you're not the one who did the upfront design. We are either stuck with maintaining poorly designed code, or even code that was designed well up-front, but needs a change in design to meet changing requirements. What a book like Refactoring brings to the table is the process of incremental redesign. Redesigning code without rewriting it is a fine art, and Refactoring basically explains how to do it.

    2. Re:The wrong questions by p3d0 · · Score: 2

      You're right, of course, that it is too easy to say "you should have designed it right in the first place", and I tried not to say just that, though I may have failed. :-)

      I tried to give some advice on how to tell whether a module system is good (that is, by information hiding); and further, to answer his question, my advice would be to refactor whenever he sees that information is not being hidden properly by the system's modules.

      --
      Patrick Doyle
      I mod down every jackass who puts his moderation policy in his sig. Oh, wait a sec....
  7. If you have good tools... by splattertrousers · · Score: 2, Interesting
    For instance, at what point do you split that massive source file into multiple files?

    You do it as soon as you notice the problem. If you have good tools, it will be simple and fun (yes, fun).

    A refactoring browser like IDEA from IntelliJ makes it simple. Highlight a few lines of code, choose "Extract Method" from a menu, and the code is extracted into a new method with all the necessary parameters created and passed in and the necessary return type and assignment created. For example:

    1: int a = 12, b = 9;
    2: a += 43 * b + 12 / 4;
    Highlight the expression after the "+=" on line 2 and extract method, calling it "foo":
    1: int a = 12, b = 9;
    2: a += foo( b );

    3: private int foo( int c ) {
    4: return 43 * c + 12 / 4;
    5: }

    At what point do two functions approaching similar functionality need to be merged, despite the cost of digging through the source and making changes to call the new function?

    It also has a rename feature which will rename a method or variable and change all references to it, but doesn't change references to different variables or methods that happen to have the same name.

    It has lots more features, but you can read about them for yourself and download the program and play with it.

    There are other refactoring browsers out there too, like the free Eclipse from IBM. With the right tools, you can easily make your code less messy.

  8. This book answers in detail by dant · · Score: 2, Informative
    This question is the subject of Large Scale C++ Software Design by John Lakos.

    Don't let the title fool you--although he uses C++ for his examples, the concepts he talks about (splitting code into components, why each component should be in its own file, levelization of components, etc.) make sense in any OO language.

    I consider this book a must-read for anybody working on large programs.

  9. Re:Always! by sohp · · Score: 3, Insightful

    Having a thousand 10 line files does nothing to improve maintainability.

    My obligatory plug for The Mozilla Project. Not quite one function per source file, but definitely lots of very small source files, each implementing a very narrow slice of functionality. Mozilla is pretty well factored code, and maintainability is enhanced by the separation of responsibilities. It makes it possible to enhance or fix problems in one area, say in nsFTPChannel, and know that all the thousands of other lines in the program will be largely insulated from those changes.

    Yes, it does take a while to get familiar with the entire Mozilla codebase. The flip side is that you only have to look at and understand a small fraction of it to start becoming productive.

    If you are using C++, Large Scale C++ Software Design is definitely a recommendation I can second.

  10. Re:Le plus ca change... by sohp · · Score: 2

    You refactor when you find you have bad coupling. This is the same criteria that has been used since back in the dark ages of structured, procedural programming.

    It's always good to see the grey-hairs confirming that what seems new and different and untested is in fact obvious and essential for junior programmers to know. Repackaging it as Refactoring may not add anything new, but it does place it in a context that's more accessible to those not raised on FORTRAN and COBOL. Plus, when the old classics are out of print and hard to find, it's good that the new refactorings of the information are still on the shelves at Amazon.

  11. Java? by FortKnox · · Score: 2

    Java forces you to make each file a different object. Then comes organizing all your files into packages. For this, we use patterns (like model-view-controller pattern). The higher level after patterns is application specific.

    Ahh, the joys of OOP...

    --
    Good quote, too many chars. Seriously, the slashdot 120 char limit sucks!
  12. Large-Scale C++ Software Design by cpeterso · · Score: 2

    I also like John Lakos' Large-Scale C++ Software Design. Yes, it is quite C++ specific, but this book has a unique focus on the physical design of your software. Lakos describes how to organize your project files to minimize dependencies, reduce compile-time, and improve developer productivity.

  13. Them er fightin' words by Tablizer · · Score: 2

    (* design module interfaces not according to what services they provide, but what information they hide. *)

    Sounds like a hidden ad for OO thinking.

    oop.ismad.com

    OOP has never been proven to be objectively superior, neither WRT code size, nor reuse, nor less change under change-impact analyses. (Except possibly in a few narrow domains.)

    The trick to procedural is good table schema design IMO. In the '70s they didn't know about this when they started bashing procedural designs and promoted OO as a solution.

    1. Re:Them er fightin' words by Tablizer · · Score: 2

      (* Information hiding is orthogonal to OO. *)

      Perhaps a realistic example is in order. Shape, animal, and device driver toy examples don't scale to real things that I actually encounter.

      (* Name a discipline that has been proven in such a way. *)

      One can show that 3rd-generation languages can code the same thing with less code and be more transportable to other platforms than assembler.

      (* Most of your system shouldn't have a clue that there even are tables. *)

      Relational tables are a protocol and organizational philosophy. They allow, for example, one to get GOF-like patterns with mere formulas instead of painstaking hand-referencing needed in OOP.

      (* Plus, this falls flat for systems that are not based on tables. *)

      Well, I consider tables a paradigm. It is true that paradigm X will match better with another interface that is also in paradigm X, and vice versa. However, OO faces the same tradeoff. This is one of the reasons for the "impedance mismatch" between OO and RDBMS's.

    2. Re:Them er fightin' words by Tablizer · · Score: 2

      (* I'll have to refer you to Parnas's original article *)

      I interpret Parnas as pointing toward a need for a standardized way to access collections. IOW, a database interface.

      Besides, it is not very clear exactly what the system is supposed to do, so it is hard to estimate future change patterns and frequencies.

      (* If you want a bigger example, there's my Master's thesis work *)

      Speaking of modular, it is tough to figure out exactly what this contraption does. It seems like systems-software, kinda outside my domain of custom biz software.

      Also, students don't really have enough real-world experience to have a feel for how and where requirements change IMO. I probably would have gone along with OO out of school because of its appeal to (over) idealistic change patterns. I wouldn't know any better back then.

      (* Do you have a reference for such a study? *)

      No. But I never met an assembler fan who challenged it. You are not questioning the cross-platform claim, are you?

      (* The 3rd-generation-versus-assembly is the most clear-cut case of programming language expressive power there is, and yet it's still quite hard to "prove" in any meaningful way. *)

      I don't think it would take that much. Take a medium-complexity problem and challenge an assembler fan to do it with less code. Then toss them some typical change scenarios and see whose code is affected the most. (They can counter with their own scenarios, BTW.)

      Besides, if I am wrong, perhaps there are assembler fans who can out-program and out-maintain C, Python, LISP, etc. programmers.

      That would suggest that paradigms are subjective. People favor the paradigm that best maps to the way that they think.

      I don't think this is really the case with assembler, but is with other paradigms.

      (* That's interesting. Do you have any references for this? *)

      I didn't apply any metrics, but examples of GOF and GOF-like patterns using tables can be found at:

      http://www.geocities.com/tablizer/prpats.htm

    3. Re:Them er fightin' words by Tablizer · · Score: 2

      (* It is my feeling that skillfully-applied OO wins over some other paradigms (equally skillfully-applied) at the high end of complexity *)

      This is often, but not always, stated by OO fans. If this is the case, then how come it is being touted for everything (all sizes), and pushing alternatives and research in alternatives away?

      IMO, the procedural/relational approach scales well because you consider mostly *one task* at a time, and communicate mostly through the database.

      Detractors will say that relying on tables like this causes ripple effects if the schema needs to change. I would point out that this is very similar to the effect of an *interface* changing in an OO app. Tables *are* an interface.

      (Hiding changes via database views and triggers varies per vendor. The products could probably improve here, but there is no in-born limit of the paradigm which prevents them.)
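
      For what it is worth, here is a minimal sketch of the view idea using Python's built-in sqlite3 module. The table and column names (customer, customer_v2, first, last) are invented for illustration, and other database products may behave differently. A view with the old table's name and shape lets existing queries keep working after the underlying schema changes:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# New schema: the customer name is now stored in two columns.
conn.execute(
    "CREATE TABLE customer_v2 (id INTEGER PRIMARY KEY, first TEXT, last TEXT)"
)
conn.execute("INSERT INTO customer_v2 (first, last) VALUES ('Ada', 'Lovelace')")

# A view with the old table's name and shape hides the change from readers.
conn.execute(
    "CREATE VIEW customer AS "
    "SELECT id, first || ' ' || last AS name FROM customer_v2"
)

# A query written against the old single-column schema still runs unchanged.
rows = conn.execute("SELECT id, name FROM customer").fetchall()
print(rows)
```

      This only covers reads. Writing through the view would need triggers, which is the part that differs most between vendors.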

      (* I have never seen relational programming advanced as a general-purpose paradigm for software construction, so I'll find it interesting to investigate. *)

      I don't know if it is general purpose, it just seems to work well for custom biz apps. One-size-fits-all is probably not the case.

      Regarding Design-by-Contract, it is hard to implement such for many types of business rules. It takes more code to state the contract than it does to implement it in many cases. You end up having to change 2 things instead of one when new requirements come: the implementation *and* the contract verification code. Thus, you increase the chance of errors. It often violates the once-and-only-once rule of factoring.

      The stack DBC examples in the books don't seem to extrapolate to real-world requirements very smoothly. (I stopped using stacks when decent databases came along. A "stack" is simply one of many possible views of any collection. IOW, "Has-a" stack view instead of "is-a" stack.) Good abstraction is all about managing relativism IMO.

    4. Re:Them er fightin' words by Tablizer · · Score: 2

      (* In a system designed using DbC, the contracts are far simpler than the implementation. *)

      Maybe in scientific computing where the interface is simple, but the computations are complex. However, biz apps tend to be the other way around. (Biz apps tend to be complex in the way that multiple things interact and the biz rules can reference.)

      (* Preconditions and postconditions are nothing more than a precise way to specify what something does. *)

      Try comments. Some machine-readable notation is not going to beat the usefulness of well-worded comments, precisely because it is tuned for the machine instead of for people.

      (* The same argument regarding redundancy could be used against type annotations.... *)

      I can live without those. I tend more toward scriptish langs anyhow these days.

    5. Re:Them er fightin' words by Tablizer · · Score: 2

      Besides,

      Validation checks can be made with simple IF statements.

      if Not inRange(...) then
        panic_or_something
      end if

    6. Re:Them er fightin' words by Tablizer · · Score: 2

      (* In contrast, once certain bugs are rare enough, assertion checks can be disabled, and no longer add any performance overhead. *)

      So it is slow the first 2 years, before Moore's law makes it not matter? That is not a very good selling point.

      (* Furthermore, even if you never disable assertion checks, DbC makes it clear exactly where they are necessary, so you don't end up with duplicate redundant checks. *)

      And IF statements are not because they are not weird and funky enough to stand out? That is a silly argument. Besides, you can call the same function each time:

      if Not inRange(...) then
        SameFamiliarName("Foo out of range")
      end if

      (* DbC is not an implementation technique to check for errors; it's a design methodology to delineate precisely the responsibilities of each class/module/function in a system. *)

      Yeah yeah. I have had this argument before, and how DBC is so *subtly* different that it does not really matter.

      Use what is already available and stop adding goofy little syntax to a language to make it funkier and funkier. Reinvent something really different, not a glorified IF statement. That is a waste of complexity.

    7. Re:Them er fightin' words by Tablizer · · Score: 2

      (* Sure, you can grep for "if", but you need to know the difference between error checks that trap bugs in the program, versus those that catch valid error conditions like user errors. *)

      I already described how to do that.

      Another way is with a comment. The advantage of a comment is that you can create more complex "removal schemes". For example, you may not want to remove *all* the checks, but just the most costly ones (CPU-wise).

      if Not inRange(....) then // DBC: level_3
        DBCraise("x is out of range")
      end if
      ....
      if Not inRange(....) then // DBC: level_2
        DBCraise("y is out of range")
      end if

      If you rely on built-in stuff, then you cannot add features like that if you want to: you are stuck with whatever is out-of-the-box. In this case, all-or-nothing removal/disable of the checks.
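
      A runnable take on the leveled-check idea, sketched in Python. Everything here is an assumption made for illustration: the names dbc_check, DBC_MAX_LEVEL and ContractViolation, the 0 to 100 ranges, and the convention that a higher level means a costlier check. Conditions are passed as callables so that a disabled check costs almost nothing.

```python
# Checks at or below this level run; higher (assumed costlier) levels are skipped.
DBC_MAX_LEVEL = 2


class ContractViolation(Exception):
    """Raised when an enabled check fails."""


def dbc_check(level, condition, message):
    # condition is a zero-argument callable, evaluated only if the level is enabled.
    if level <= DBC_MAX_LEVEL and not condition():
        raise ContractViolation(message)


def in_range(value, low, high):
    return low <= value <= high


def scale(x, y):
    dbc_check(3, lambda: in_range(x, 0, 100), "x is out of range")  # skipped at level 2
    dbc_check(2, lambda: in_range(y, 0, 100), "y is out of range")  # enforced
    return x * y
```

      With DBC_MAX_LEVEL set to 2, scale(500, 2) returns 1000 because the level-3 check on x is skipped, while scale(1, 500) raises ContractViolation.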

      (* You dismiss my statement that Design by Contract is more than just an IF statement, and then you claim that because it's just an IF statement, it's worthless. *)

      I did not say "worthless". I am saying that you have not justified dedicated syntax.

      DBC is just a round-about, consultant buzzword wallet-draining way of saying:

      "Testing assumptions is good"

  14. Database normalisation rules. by oliverthered · · Score: 3, Interesting

    Databases and code should be designed in a similar way, for more or less the same reasons. If all the refactoring books people have been recommending seem a bit extreme (even the word refactoring sounds extreme to me, a bit like downsizing grrrr....), try getting a simple DB design book that goes through a normalisation process; it should make for a lighter read.

    Then think about how to apply the process to software (a bit of light thinking).

    The first couple of steps are something like

    separate everything out into discrete chunks

    look at 'keys' and 'indexes' (in source code they are design patterns, data structures the things that tie the chicks together).

    You don't need a 1000-page bible; you need ten pages of guidelines and good practices and a bit of brain power.

    --
    thank God the internet isn't a human right.
    1. Re:Database normalisation rules. by alienmole · · Score: 2
      data structures the things that tie the chicks [sic] together

      When you find yourself subconsciously writing about "tying chicks together" in a discussion of source code organization, it's time to take a break from the keyboard and go get laid, if you can...

      I agree with you about the correlation between database normalization and code factoring [which is the correct and long-established term, no matter how much you might dislike the term "refactoring"]. However, to get a database into Nth normal form can be done by following some fairly simple rules. Code isn't quite so easy. Books like Fowler's refactoring book cover details, subtleties, and rationales that even above-average developers may miss.

      Also, refactoring is a name for something that programmers have always done anyway. An agreed-on name is better than no name at all, or many non-standard names.

  15. Re:Le plus ca change... by alienmole · · Score: 2
    But hey, guess what OOP is? It's a way of organizing your procedural code, with some assistance from the language/compiler to help enforce access policies.

    Although that characterization does describe a valid benefit of OOP, it completely misses possibly the most important aspect of OOP, which is the introduction of type-based polymorphism.

    In fact, the "organizing procedural code" benefit of OOP is simply a side effect of designing systems based on interacting types, something which procedural systems didn't directly support. Saying that OOP is a way of organizing your procedural code completely misses the point.
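
    A small Python sketch of what type-based polymorphism buys you. The HourlyWork and FixedFee classes are invented for this example. The function that totals an invoice never inspects what kind of item it has; each object's type selects the right cost() implementation.

```python
class HourlyWork:
    def __init__(self, hours, rate):
        self.hours = hours
        self.rate = rate

    def cost(self):
        return self.hours * self.rate


class FixedFee:
    def __init__(self, amount):
        self.amount = amount

    def cost(self):
        return self.amount


def invoice_total(items):
    # No if/elif on a "kind" field: each item's type selects its own cost().
    return sum(item.cost() for item in items)


total = invoice_total([HourlyWork(hours=3, rate=50), FixedFee(amount=200)])
print(total)
```

    Adding a third kind of line item means adding one class with a cost() method; invoice_total itself does not change.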

    Modern texts on refactoring focus on factoring issues in these systems of interacting types, and as such are relevant to current systems in a way that it's difficult for e.g. Plauger to be. Certainly, normalizing/factoring/compressing systems has been and always will be a basic goal of software development, but just because the concept is old doesn't mean that there aren't new insights into it. Suggesting otherwise is a little like saying that using Jupiter's gravity to give a space probe an energy boost is nothing new, since Newton discovered gravity. I suspect NASA scientists get most of their information somewhere other than Principia Mathematica.

  16. Other recommendations... by PinglePongle · · Score: 2, Informative

    Programming Pearls by Jon Bentley - old as the hills by now (he talks about the location of data on tape....), but full of very good insights into writing "good code"(TM).

    You might also like "The pragmatic programmer" - Hunt and Thomas - which is another "meta-programming" book with a lot of ideas and insights you could actually sell to your pointy-headed boss.

    The section on "zero-tolerance" coding is a great "why and when to refactor" argument. There's also a good section on how to design the units of which your software is composed, how to reduce the coupling between those units, and how to test 'em when (you think...) they're done.

    Nev

    --
    It's all very well in practice, but it will never work in theory.