Slashdot Mirror


Organizing Source Code, Regardless of Language?

og_sh0x queries: "I'm looking for a source of information dedicated to organizing source code. I see a lot of books and other resources covering syntax and various syntax-related philosophies, but I can never seem to find a good resource for organizing source code in general. For instance, at what point do you split that massive source file into multiple files? At what point do two functions approaching similar functionality need to be merged, despite the cost of digging through the source and making changes to call the new function? These are problems that plague many programming languages. Are there such resources that cover these issues?"

59 comments

  1. This by psavo · · Score: 4, Insightful

    is called 'Experience' part of your CV.

    I've yet to find a simple way to determine any of those. It's just that feeling you get while looking at the code: 'damn, not again..'.

    --
    fucktard is a tenderhearted description
  2. Always! by 'The+'.$L3mm1ng · · Score: 1
    For instance, at what point do you split that massive source file into multiple files?
    Right from the start, I'd say. Each function/class/whatever should have its own file.
    At what point do two functions approaching similar functionality need to be merged, despite the cost of digging through the source and making changes to call the new function?
    As early as possible, while as few other functions or parts of the code as possible already use it.
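
    A rough sketch of such a merge, in Python only for brevity. The function names and the report format are made up for illustration; the point is just that the old names can survive as thin wrappers, so existing callers keep working while they are migrated.

```python
# Before: two functions that drifted toward the same job (hypothetical names).
def format_user_line_old(name, score):
    return name.ljust(10) + str(score).rjust(5)

def format_team_line_old(name, score):
    return name.upper().ljust(10) + str(score).rjust(5)

# After: one merged function; the only real difference becomes a parameter.
def format_line(name, score, shout=False):
    label = name.upper() if shout else name
    return label.ljust(10) + str(score).rjust(5)

# Thin wrappers keep the old call sites working until they are updated.
def format_user_line(name, score):
    return format_line(name, score)

def format_team_line(name, score):
    return format_line(name, score, shout=True)
```

    The earlier the merge happens, the fewer wrappers like these are needed.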
    1. Re:Always! by Anonymous Coward · · Score: 0

      Each function should have its own file? That would quickly become ridiculous.

    2. Re:Always! by 'The+'.$L3mm1ng · · Score: 1

      Not if these functions can be used standalone. Then you can simply #include them in another project instead of #including all the other (probably useless) functions in the file as well and instead of having to copy it. If you copy it, you might want to synch changes afterwards which would be painful if you had to edit the file instead of just overwriting it.

      If the functions belong to a library, you can still have a library file that includes all functions, so that the projects that use this library only have to include one single file.

      Putting each function in its own file could also ease version management if you are not the only one working on the project. At least if you do not use CVS for whatever reason.

      Does it still sound ridiculous? Is there any real argument AGAINST using separate files for each function, class, etc.? Then please tell.

    3. Re:Always! by Anonymous Coward · · Score: 0

      Does it still sound ridiculous? Is there any real argument AGAINST using separate files for each function, class, etc.? Then please tell.

      Well, yes it does still sound ridiculous. Not even close to all code will/should be reused.

      I'd agree about putting classes in their own files, but putting functions in their own file makes no sense. Where do you draw the line? Do member functions get their own files? What about friendly classes? What about constants?

      It's completely absurd to say everything should get its own file. Separate components should have separate files, if there is a chance of them being used separately. Having a thousand 10 line files does nothing to improve maintainability.

    4. Re:Always! by sohp · · Score: 3, Insightful

      Having a thousand 10 line files does nothing to improve maintainability.

      My obligatory plug for The Mozilla Project. Not quite one function per source file, but definitely lots of very small source files, each implementing a very narrow slice of functionality. Mozilla is pretty well factored code, and maintainability is enhanced by the separation of responsibilities. It makes it possible to enhance or fix problems in one area, say in nsFTPChannel, and know that all the thousands of other lines in the program will be largely insulated from those changes.

      Yes, it does take a while to get familiar with the entire Mozilla codebase. The flip side is that you only have to look at and understand a small fraction of it to start becoming productive.

      If you are using C++, Large Scale C++ Software Design is definitely a recommendation I can second.

    5. Re:Always! by Anonymous Coward · · Score: 0

      I take it you don't do work on things like nsDocShell? Yes, definitely "not quite one function per source file".

    6. Re:Always! by sohp · · Score: 1

      In my defense, I only said Mozilla is pretty well factored code. There are around 4000 .cpp files in the 1.0 source, though, and nsDocShell isn't even in the top 10 in size! Take a look at nsCSSFrameConstructor.cpp, for example -- definitely a candidate for serious reorganization.

  3. Read Stroustrup by ObviousGuy · · Score: 3, Informative

    The first couple of chapters of Stroustrup's The C++ Programming Language discuss this very topic.

    --
    I have been pwned because my /. password was too easy to guess.
  4. refactoring is what you're after by Anonymous Coward · · Score: 1, Informative

    www.refactoring.com, or any good refactoring book, should help loads to get you started. But there's nothing like experience :)

  5. Be consistent by heikkile · · Score: 2

    There are no simple answers to these questions. Best you can do is to formulate your own policy, and stick to it. In real life projects there will always be exceptions and special cases, but it helps a lot if all people working on the project at least know of the existence of common guidelines, and preferably understand and agree with the reasoning behind them.

    --

    In Murphy We Turst

  6. Refactoring by fingal · · Score: 5, Informative
    I would strongly recommend that one reads "Refactoring" (Martin Fowler - Addison-Wesley - 0-201-48567-2) (after reading Design Patterns) for the solid techniques that it introduces for clearly defined manipulations that change the shape of code without changing the functionality. However, even with this it doesn't completely resolve the issue of "when" that you raised, as summed up in the introduction to chapter 3 "Bad Smells in Code":-
    By now you have a good idea of how refactoring works. But just because you know how doesn't mean you know when. Deciding when to start refactoring, and when to stop, is just as important to refactoring as knowing how to operate the mechanics of a refactoring.

    Now comes the dilemma. It is easy to explain to you how to delete an instance variable or create a hierarchy. These are simple matters. Trying to explain when you should do these things is not so cut-and-dried. Rather than appealing to some vague notion of programming aesthetics (which frankly is what we consultants usually do), I wanted something a bit more solid.
    I was mulling over this tricky issue when I visited Kent Beck in Zurich. Perhaps he was under the influence of the odors of his newborn daughter at the time, but he had come up with the notion describing the "when" of refactoring in terms of smells. "Smells" you say, "and that is supposed to be better than vague aesthetics?" Well, yes. We look at lots of code, written for projects that span the gamut from wildly successful to nearly dead. In doing so, we have learned to look for certain structures in the code that suggest (sometimes they scream for) the possibility of refactoring.

    One thing we won't try to do here is give you precise criteria for when a refactoring is overdue. In our experience no set of metrics rivals informed human intuition. What we will do is give you indications that there is trouble that can be solved by a refactoring. You will have to develop your own sense of how many instance variables are too many instance variables and how many lines of code in a method are too many lines.

    The book then goes on to describe the various types of "abstract smells" and what sort of correctional techniques can be considered to correct them, for example:-

    Inappropriate Intimacy
    Sometimes classes become far too intimate and spend too much time delving in each others' private parts. We may not be prudes when it comes to people, but we think our classes should follow strict, puritan rules.

    Overintimate classes need to be broken up as lovers were in ancient days. Use Move Method and Move Field to separate the pieces to reduce the intimacy. See whether you can arrange a Change Bidirectional Association to Unidirectional. If the classes do have common interests, use Extract Class to put the commonality in a safe place and make honest classes of them. Or use Hide Delegate to let another class act as a go-between.

    Inheritance often can lead to overintimacy. Subclasses are always going to know more about their parents than their parents would like them to know. If it's time to leave home, apply Replace Inheritance with Delegation.

    I have frequently found that just reading through this short (~15) collection of abstracted "smells" gives a very good way of supplementing the "experience" that you speak of and helping you to make decisions with the benefit of a) a bit of third party support in making these decisions and b) a clearly defined set of rules as to how to apply each of the refactorings including test cases to prove that the functionality has not been changed in the process and, more importantly, a clean roll-back procedure for those times when the olfactory senses get a little bit confused...

    --

    The only Good System is a Sound System

    1. Re:Refactoring by gaj · · Score: 5, Informative
      I'll second that recommendation.

      When I first saw "Refactoring", I said to myself: "Self, now I've seen everything". I thought it was yet another book enshrining process and procedure over good working code.

      I was wrong.

      This book is really good for those who haven't yet learned what Stroustrup refers to as "taste". Hell, I've been coding for many, many years and I certainly thought it was worth a read!

      My only caution is the same as the one I give about GoF, UML, OOA/OOD/OOP, or any other codified programming "methods": Don't blindly follow them w/o taking your own experience into account.

      Basically, the less time you've been coding, the more seriously you should take these concepts. Over time, and with many KLOC, you'll develop your own "taste"; your own sense of what works. This is not to say that learning new methods is useless to someone who's been coding for a long time; far from it. Just that, in most cases, a hacker will develop a pretty good idea of what works for them and what doesn't. The dirty little secret is, whether you like the language he "accreted" or not, Wall is right: TMTOWTDI. And knowing which way to use in any given circumstance can only come with experience. Reading books like "Refactoring" can help a lot until you get that experience, though!

    2. Re:Refactoring by sohp · · Score: 2

      The good news is that there are great tools for automating refactorings for some languages (Java and Smalltalk come to mind immediately). The bad news is that C/C++ is not one of the languages, but C# is.

      That said, some of what refactoring browsers do can be done with search and replace. Be careful though, you don't want to change all occurrences of "i" to "index" and end up with code that won't compile because there's no type "indexnt".

      The ability to rollback refactorings is essential, too. Industrial-strength source control tools are pretty much a necessity, allowing you to re-get your CVS tree if a refactoring attempt gets out of hand.

      Since I bought Martin Fowler's book and started studying refactorings, my code has gotten much easier to live with. I no longer fear making a significant change to add functionality or fix bugs because I know I can refactor and still have code that continues to work as before. The addition of unit tests has helped to ensure that it not only keeps working, but it keeps behaving as expected.
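
      Roughly what that safety net looks like, sketched in Python with the standard unittest module. The function and the numbers are invented for illustration; the tests are written against the old code first and then left untouched while the insides get rearranged.

```python
import unittest

def line_total(qty, unit_price):
    # Extracted during refactoring; used to be inline in total_price().
    return qty * unit_price

def total_price(items):
    # items is a list of (quantity, unit price) pairs (hypothetical format).
    return sum(line_total(qty, unit_price) for qty, unit_price in items)

class TotalPriceTest(unittest.TestCase):
    # These tests predate the refactoring and pin down the expected behavior.
    def test_empty_order(self):
        self.assertEqual(total_price([]), 0)

    def test_mixed_order(self):
        self.assertEqual(total_price([(2, 5), (1, 3)]), 13)
```

      Run the tests before and after every step (for example with python -m unittest); if they go red, roll back that one step rather than debugging forward.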

    3. Re:Refactoring by Mad+Marlin · · Score: 1
      That said, some of what refactoring browsers do can be done with search and replace. Be careful though, you don't want to change all occurrences of "i" to "index" and end up with code that won't compile because there's no type "indexnt".

      This is why one should separate every independent token. The vi command :%s/i/index/g may break a lot of things, but :%s/ i / index /g will not.

    4. Re:Refactoring by bcaulf · · Score: 1
      Make that:
      :%s/\<i\>/index/gc
      That's escaped angle brackets to make i match only when it's not part of another word, and the c flag to require confirmation of each substitution. Confirmation might not be necessary, but then again it might.
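
      The three substitutions from this subthread can be compared side by side. This is a small sketch in Python, whose \b plays the role of Vim's \< and \>; the sample line of code is made up.

```python
import re

source = "int i = 0; while (i < limit) { i++; }"

# Naive replacement, like :%s/i/index/g. Hits every "i", even inside words.
naive = source.replace("i", "index")

# Space-delimited, like :%s/ i / index /g. Misses "(i" and "i++".
spaced = source.replace(" i ", " index ")

# Word-boundary match, like :%s/\<i\>/index/g. Only the standalone token.
bounded = re.sub(r"\bi\b", "index", source)

print(naive)    # indexnt index = 0; whindexle (index < lindexmindext) { index++; }
print(spaced)   # int index = 0; while (i < limit) { i++; }
print(bounded)  # int index = 0; while (index < limit) { index++; }
```

      Even the word-boundary version knows nothing about scopes, strings, or comments, which is where a real refactoring browser still wins.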
  7. Book recommendation by Read+The+Fine+Manual · · Score: 5, Informative
    Have a look at this book by Steve McConnell: Code Complete: A Practical Handbook of Software Construction.

    Yes, McConnell is a Microsoft guy, but this book is completely operating-system and programming-language agnostic (even though examples are in C, Fortran, and Pascal, IIRC). It is an excellent guide to software construction, covering everything from design through coding practice and style issues to project management. I highly recommend it.

    1. Re:Book recommendation by Sobhan · · Score: 1

      Yes, I agree that Code Complete is the book you need for analysing those aspects. It is a very good book.

  8. The wrong questions by p3d0 · · Score: 4, Insightful

    These sound like the wrong questions to me. It reminds me of someone's (perhaps Dijkstra's?) story of the response he received when he recommended abolishing gotos. Someone said "ok, I'll buy that; so what do I do if I'm at this point in the program, and I want to get to that point?"

    The trouble with such a question is that it has no answer. Dijkstra's argument was not that one should take existing programs and remove the gotos; rather, that programs written using only structured elements (sequencing, conditionals, loops) are more comprehensible, and don't require any gotos because there is a more elegant way to achieve the same effect. Thus, as you can see, there really is no answer to the question; the questioner's approach was fundamentally flawed.

    Likewise, software organization is not done in terms of functions; rather, it is done in terms of information-hiding modules. To ask when one huge function should be split into two, or when two similar functions should be merged, indicates to me that the design might be flawed. Sometimes that's unavoidable; for instance, if you are involved in a project written by someone else. In that case, you do indeed need to make this kind of decision.

    However, true modular programming does not mean taking huge lumbering hunks of code and splitting them into modules. It means writing modules using the principles of information hiding to avoid making huge lumbering hunks of code in the first place.

    This, of course, is easier said than done. It's not that hard to avoid gotos, because the use of Dijkstra's structured programming techniques makes them unnecessary. In contrast, writing good modules is hard, and without superhuman foresight, some modules are bound to be pretty crummy. These will need to be rewritten in order to achieve good information hiding properties.

    So, there's your answer: don't put the cart before the horse. Don't expect that someone will tell you that you need to split a function when it gets beyond X number of lines. Rather, look at the integrity of the system's modules. If I can leave you with one piece of advice, I hope it is this: design module interfaces not according to what services they provide, but what information they hide. Modules for which you can't find a succinct statement (12 words or less, with no ifs, ands, or ors) of what information they hide are poorly designed, and need an overhaul. A symptom of this may be that your functions are redundant, or too long, but the core problem is one of poor module design.
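
    A toy illustration of a module defined by what it hides, sketched in Python. All names here are hypothetical. The hidden information fits in a handful of words: "how the lines of text are stored".

```python
class LineStore:
    """Hides one decision: how the lines of text are represented in memory."""

    def __init__(self):
        # Private representation; could later become a file or a packed buffer.
        self._lines = []

    def add(self, text):
        self._lines.append(text.split())

    def line_count(self):
        return len(self._lines)

    def word_count(self, line_no):
        return len(self._lines[line_no])

    def word(self, line_no, word_no):
        return self._lines[line_no][word_no]


def longest_word(store):
    # Client code: relies only on the interface, never on store._lines.
    best = ""
    for ln in range(store.line_count()):
        for wn in range(store.word_count(ln)):
            candidate = store.word(ln, wn)
            if len(candidate) > len(best):
                best = candidate
    return best
```

    If the storage format changes, longest_word and every other client stay untouched, and that is the test of whether the module boundary is in the right place.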

    --
    Patrick Doyle
    I mod down every jackass who puts his moderation policy in his sig. Oh, wait a sec....
    1. Re:The wrong questions by elflord · · Score: 3, Insightful
      So, there's your answer: don't put the cart before the horse. Don't expect that someone will tell you that you need to split a function when it gets beyond X number of lines. Rather, look at the integrity of the system's modules. If I can leave you with one piece of advice, I hope it is this: design module interfaces not according to what services they provide, but what information they hide.

      Actually, the questions he is asking are indeed very important. It's all well to say that code "should be well designed", and indeed, most books spend a lot of time talking about design principles for people with clean slates. Unfortunately, very few people have a clean slate to work with. Using a good design up front is not an option if you're not the one who did the upfront design. We are either stuck with maintaining poorly designed code, or even code that was designed well up-front, but needs a change in design to meet changing requirements. What a book like Refactoring brings to the table is the process of incremental redesign. Redesigning code without rewriting it is a fine art, and Refactoring basically explains how to do it.

    2. Re:The wrong questions by p3d0 · · Score: 2

      You're right, of course, that it is too easy to say "you should have designed it right in the first place", and I tried not to say just that, though I may have failed. :-)

      I tried to give some advice on how to tell whether a module system is good (that is, by information hiding); and further, to answer his question, my advice would be to refactor whenever he sees that information is not being hidden properly by the system's modules.

    3. Re:The wrong questions by King+of+the+World · · Score: 1
      To ask when one huge function should be split into two, or when two similar functions should be merged, indicates to me that the design might be flawed. Sometimes that's unavoidable; for instance, if you are involved in a project written by someone else.
      Dude, that's the funniest quote I've read all year. Thanks.
  9. If you have good tools... by splattertrousers · · Score: 2, Interesting
    For instance, at what point do you split that massive source file into multiple files?

    You do it as soon as you notice the problem. If you have good tools, it will be simple and fun (yes, fun).

    A refactoring browser like IDEA from IntelliJ makes it simple. Highlight a few lines of code, choose "Extract Method" from a menu, and the code is extracted into a new method with all the necessary parameters created and passed in and the necessary return type and assignment created. For example:

    1: int a = 12, b = 9;
    2: a += 43 * b + 12 / 4;
    Highlight the expression after the "+=" on line 2 and extract method, calling it "foo":
    1: int a = 12, b = 9;
    2: a += foo( b );

    3: private int foo( int c ) {
    4:     return 43 * c + 12 / 4;
    5: }
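
    The same extraction done by hand, as a Python translation of the snippet above (the wrapper functions before and after are added only so the two versions can be compared). Checking that both return the same value is the whole contract of Extract Method: the shape changes, the behavior does not.

```python
# Before: the expression is inlined.
def before():
    a, b = 12, 9
    a += 43 * b + 12 // 4   # // is integer division, like Java's int / int
    return a

# After Extract Method: the same arithmetic, now behind a name.
def foo(c):
    return 43 * c + 12 // 4

def after():
    a, b = 12, 9
    a += foo(b)
    return a

print(before(), after())  # 402 402
```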

    At what point do two functions approaching similar functionality need to be merged, despite the cost of digging through the source and making changes to call the new function?

    It also has a rename feature which will rename a method or variable and change all references to it, but doesn't change references to different variables or methods that happen to have the same name.

    It has lots more features, but you can read about them for yourself and download the program and play with it.

    There are other refactoring browsers out there too, like the free Eclipse from IBM. With the right tools, you can easily make your code less messy.

  10. This book answers in detail by dant · · Score: 2, Informative
    This question is the subject of Large Scale C++ Software Design by John Lakos.

    Don't let the title fool you--although he uses C++ for his examples, the concepts he talks about (splitting code into components, why each component should be in its own file, levelization of components, etc.) make sense in any OO language.

    I consider this book a must-read for anybody working on large programs.

  11. Le plus ca change... by Anonymous Coward · · Score: 0
    Into the wayback machine!


    You refactor when you find you have bad coupling. This is the same criterion that has been used since back in the dark ages of structured, procedural programming. But hey, guess what OOP is? It's a way of organizing your procedural code, with some assistance from the language/compiler to help enforce access policies.
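
    For anyone who hasn't met the term, here is a tiny procedural sketch of bad coupling and one way to loosen it, in Python. The payroll record layout and all the names are invented for illustration.

```python
# Records are tuples of (name, department, hours, rate): a hypothetical layout.

# Tightly coupled: the report depends on the exact position of every field,
# so any change to the record layout breaks it (and everything like it).
def report_coupled(records):
    return [r[0] + ": " + str(r[2] * r[3]) for r in records]

# Looser coupling: knowledge of the layout lives in two small accessors.
def name_of(record):
    return record[0]

def pay_of(record):
    return record[2] * record[3]

def report(records):
    return [name_of(r) + ": " + str(pay_of(r)) for r in records]
```

    Both versions produce the same report today; the difference only shows up on the day the record layout changes.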


    This Refactoring book everyone is mentioning is a good one, but I suspect that its authors understand the old-fashioned concept of coupling so deeply that they don't realize that it's a mystery to many readers. That's old-fashioned in the sense that the law of gravity is old-fashioned, or hemoglobin is old-fashioned, or DNA is. Being old-fashioned is a prerequisite of being an essential underpinning of everything.


    I would like to recommend an article by Stevens, Myers, and Constantine with the unassuming title of Structured Design - sounds almost Victorian in a quaint, old-fashioned way, doesn't it? But it's almost certain that none of you youngsters will have that 1974 issue of IBM Systems Journal on your shelves, and last time I checked Yourdon's Classics in Software Engineering was long since out of print. Perhaps you can find some of the books that came in later years as the authors developed their ideas. But the core is all here, in this early work.


    Another fine title for avoiding the programmerly version of Santayana's Curse, which I fear may be little easier to find than Classics, is P. J. Plauger's Programming on Purpose. It touches on these same issues as well as many other design techniques which, I fear, all too many of you do find yourselves doomed to reinvent - badly, as the wag said about those who have reinvented Unix over the years.


    --

    Mama, don't let your children grow up to be slashdotties

    1. Re:Le plus ca change... by sohp · · Score: 2

      You refactor when you find you have bad coupling. This is the same criterion that has been used since back in the dark ages of structured, procedural programming.

      It's always good to see the grey-hairs confirming that what seems new and different and untested is in fact obvious and essential for junior programmers to know. Repackaging it as Refactoring may not add anything new, but it does place it in a context that's more accessible to those not raised on FORTRAN and COBOL. Plus, when the old classics are out of print and hard to find, it's good that the new refactorings of the information are still on the shelves at Amazon.

    2. Re:Le plus ca change... by alienmole · · Score: 2
      But hey, guess what OOP is? It's a way of organizing your procedural code, with some assistance from the language/compiler to help enforce access policies.

      Although that characterization does describe a valid benefit of OOP, it completely misses possibly the most important aspect of OOP, which is the introduction of type-based polymorphism.

      In fact, the "organizing procedural code" benefit of OOP is simply a side effect of designing systems based on interacting types, something which procedural systems didn't directly support. Saying that OOP is a way of organizing your procedural code completely misses the point.
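
      A small sketch of the difference, in Python with made-up order types. In the procedural version every function that cares about the kind of order repeats the same switch; with type-based polymorphism the type itself selects the code to run.

```python
# Procedural style: the dispatch on "kind" is written out by hand.
def shipping_cost_procedural(order):
    if order["kind"] == "standard":
        return 5 + order["kg"]
    elif order["kind"] == "express":
        return 15 + 2 * order["kg"]
    raise ValueError(order["kind"])

# Polymorphic style: each type carries its own rule (hypothetical classes).
class Standard:
    def __init__(self, kg):
        self.kg = kg

    def shipping_cost(self):
        return 5 + self.kg

class Express:
    def __init__(self, kg):
        self.kg = kg

    def shipping_cost(self):
        return 15 + 2 * self.kg

def total_shipping(orders):
    # No switch here: adding a new order type does not touch this function.
    return sum(o.shipping_cost() for o in orders)
```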

      Modern texts on refactoring focus on factoring issues in these systems of interacting types, and as such are relevant to current systems in a way that it's difficult for e.g. Plauger to be. Certainly, normalizing/factoring/compressing systems has been and always will be a basic goal of software development, but just because the concept is old doesn't mean that there aren't new insights into it. Suggesting otherwise is a little like saying that using Jupiter's gravity to give a space probe an energy boost is nothing new, since Newton discovered gravity. I suspect NASA scientists get most of their information somewhere other than Principia Mathematica.

    3. Re:Le plus ca change... by bcaulf · · Score: 1

      I would like to recommend an article by Stevens, Myers, and Constantine with the unassuming title of Structured Design... Perhaps you can find some of the books that came in later years...

      Yourdon & Constantine's 1975 book Structured Design, originally published by Prentice-Hall, is still in print in a photocopy/perfect-bound edition from Yourdon Press. The covers suck but the text is reproduced perfectly, except for the halftone boxes at the heads of the chapters.

  12. Congratulations by MrBoring · · Score: 1

    Before recommendations are made, I'd say congratulations that you're thinking of this. I'd further congratulate you if you work for someone who'll let you make such improvements. In my company, there's always some "business case", policies, politics, and other reasons for not making improvements.

    Nevertheless, I'd recommend a good book on structured analysis and design, then go to a design patterns book, perhaps. The SA&D is probably more micro and the patterns is probably more macro in nature today. I'd do this if you needed "justification" for what you were doing.

    Although not directly related, learning basic database concepts wouldn't be bad either. I think normalization concepts might affect your thinking in a positive way. Not only for organizing data structures, but also where you place files in your build structure. Maybe it'll help you avoid needless redundancy. At least these are based on more sound mathematics, and the knowledge thereof will last.

    I think the dirty little secret towards these books is that much of it is written opinion, with little scientific evidence for its reasoning. In computers today, much credibility comes more from writing about something than proving something. Not entirely, perhaps, but if it were really gospel, then we wouldn't have displaced SA&D with OOA&D and patterns. Develop your own style, be consistent, and stick to your guns, because your opinion is probably no worse than the others.

  13. Java? by FortKnox · · Score: 2

    Java forces you to make each file a different object. Then comes organizing all your files into packages. For this, we use patterns (like model-view-controller pattern). The higher level after patterns is application specific.

    Ahh, the joys of OOP...

    --
    Good quote, too many chars. Seriously, the slashdot 120 char limit sucks!
    1. Re:Java? by davidmccabe · · Score: 1

      Um...what *are* you talking about?

      In Java, every *class* needs to be in a separate file, except inner classes, which are only visible to the public class (the one with the name of the file).

      Classes are organized into packages, and should be put into a directory tree that matches the package hierarchy.

      What this has to do with MVC I'd rather not guess.

    2. Re:Java? by FortKnox · · Score: 1

      bah, shoulda said "each object into a different file". It's been one of "those" kinda days... sorry.

    3. Re:Java? by davidmccabe · · Score: 1

      This is getting rather off-topic, but every class goes in a different file. You never code objects at all, only classes. Then, some class magically gets its public static void main(String[] args) method called, from which you can instantiate objects based on the classes that you have written.

    4. Re:Java? by Anonymous Coward · · Score: 0

      Now you are just getting nitpicky. Isn't a class just a definition of an object? I think my point came across fine, you're just being nitpicky.

    5. Re:Java? by user2048 · · Score: 1

      Every public class goes in a different file, IIRC. That file can also define other, non-public classes, both inner and non-inner.

    6. Re:Java? by davidmccabe · · Score: 1

      I'm not trying to be overly particular, it's just that I like everything to be correct :-). I'm sorry if I've offended you in any way.

      Yes, a class is a blueprint for a type of object.

      The thing about classes and objects is, they are not the same in that they don't have a one-to-one relationship all of the time, just as "this kind of foo" is not the same as "this particular foo".

    7. Re:Java? by bcaulf · · Score: 1

      Java most certainly does not force you to make each file a different object, inasmuch as objects are created at runtime and do not exist in Java source files. Nor does Java force you to define each top level class in a different file, as davidmccabe stated. user2048 was correct in stating that "Every public class goes in a different file".

      The complete rule is that only one public top level class or interface can be defined per file, and the name of the file must be the same as the name of the public class or interface (plus .java). Additional non-public classes or interfaces can be defined in that same file, or indeed in some other file with an arbitrary .java name. For completeness, it is not necessary to define any classes or interfaces at all in a java source file, although a file with no class or interface definitions will be useless. And, yes, none of this file organization has much of anything to do with patterns.

      Anyone want to hire a Java language lawyer?

  14. Large-Scale C++ Software Design by cpeterso · · Score: 2

    I also like John Lakos' Large-Scale C++ Software Design. Yes, it is quite C++ specific, but this book has a unique focus on the physical design of your software. Lakos describes how to organize your project files to minimize dependencies, reduce compile-time, and improve developer productivity.

  15. Them er fightin' words by Tablizer · · Score: 2

    (* design module interfaces not according to what services they provide, but what information they hide. *)

    Sounds like a hidden ad for OO thinking.

    oop.ismad.com

    OOP has never been proven to be objectively superior, neither WRT code size, nor reuse, nor less change under change-impact analyses. (Except possibly in a few narrow domains.)

    The trick to procedural is good table schema design IMO. In the '70s they didn't know about this when they started bashing procedural designs and promoted OO as a solution.

    1. Re:Them er fightin' words by p3d0 · · Score: 1
      Sounds like a hidden ad for OO thinking.
      Information hiding is orthogonal to OO.
      OOP has never been proven to be objectively superior, neither WRT code size, nor reuse, nor less change under change-impact analyses.
      Name a discipline that has been proven in such a way.
      The trick to procedural is good table schema design IMO.
      No offence intended, but IMHO that's retarded. Most of your system shouldn't have a clue that there even are tables. Plus, this falls flat for systems that are not based on tables. If you don't mind my saying so, it sounds like you have written software in a fairly narrow application domain.
    2. Re:Them er fightin' words by Tablizer · · Score: 2

      (* Information hiding is orthogonal to OO. *)

      Perhaps a realistic example is in order. Shape, animal, and device driver toy examples don't scale to real things that I actually encounter.

      (* Name a discipline that has been proven in such a way. *)

      One can show that 3rd-generation languages can code the same thing with less code and be more transportable to other platforms than assembler.

      (* Most of your system shouldn't have a clue that there even are tables. *)

      Relational tables are a protocol and organizational philosophy. They allow, for example, one to get GOF-like patterns with mere formulas instead of painstaking hand-referencing needed in OOP.

      (* Plus, this falls flat for systems that are not based on tables. *)

      Well, I consider tables a paradigm. It is true that paradigm X will match better with another interface that is also in paradigm X, and vice versa. However, OO faces the same tradeoff. This is one of the reasons for the "impedance mismatch" between OO and RDBMS's.

    3. Re:Them er fightin' words by p3d0 · · Score: 1
      This is getting interesting.
      Perhaps a realistic example is in order. Shape, animal, and device driver toy examples don't scale to real things that I actually encounter.
      Those are toy examples of OO, so of course I won't use those to demonstrate how information hiding is orthogonal to OO. :-)

      A nontrivial example won't fit here, so I'll have to refer you to Parnas's original article on the topic. Its example could still be considered trivial by today's standards, but it's far better than anything I could fit in this space. It makes no reference to OO whatsoever. In fact, it's decidedly non-OO, with modules like "circular shifter" and "alphabetizer" that are most certainly procedural abstractions.
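
      As a rough illustration of the decomposition that article describes, here is a minimal Python sketch of a keyword-in-context index. The function names and the word-list representation are assumptions made for this example, not taken from the article. The circular shifter and alphabetizer reach line storage only through its functions, so the representation could change without touching them, and no classes or inheritance are involved.

```python
# Hypothetical sketch of a KWIC (keyword-in-context) index.
# All names below are illustrative, not Parnas's own.

# --- line storage: hides how lines are represented --------------------
_lines = []  # private representation; other modules never touch this


def store_line(text):
    """Add one line, stored internally as a list of words."""
    _lines.append(text.split())


def line_count():
    return len(_lines)


def words_of(index):
    """Return a copy of the words of line `index`."""
    return list(_lines[index])


# --- circular shifter: uses only the line-storage functions -----------
def circular_shifts():
    """Return every rotation of every stored line, as strings."""
    shifts = []
    for i in range(line_count()):
        words = words_of(i)
        for k in range(len(words)):
            shifts.append(" ".join(words[k:] + words[:k]))
    return shifts


# --- alphabetizer: uses only the circular-shifter function ------------
def alphabetized_shifts():
    """Return all circular shifts sorted case-insensitively."""
    return sorted(circular_shifts(), key=str.lower)
```

      Calling store_line("the quick fox") and then alphabetized_shifts() gives the three rotations of that line in alphabetical order.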

      If you want a bigger example, there's my Master's thesis work, especially my defence presentation (PowerPoint slides). I consider it a good example of a successful application of information hiding principles, and it's about 23,000 LOC, so it's big enough to be considered nontrivial. It's also OO, so it doesn't prove that OO is orthogonal to information hiding, but I feel that its success arises from information hiding more than OO (especially since it's written in C and so makes no use of inheritance).

      One can show that 3rd-generation languages can code the same thing with less code and be more transportable to other platforms than assembler.
      How does one show that? Do you have a reference for such a study?

      The 3rd-generation-versus-assembly is the most clear-cut case of programming language expressive power there is, and yet it's still quite hard to "prove" in any meaningful way.

      Relational tables are a protocol and organizational philosophy. They allow, for example, one to get GOF-like patterns with mere formulas instead of painstaking hand-referencing needed in OOP.
      That's interesting. Do you have any references for this?
      --
      Patrick Doyle
      I mod down every jackass who puts his moderation policy in his sig. Oh, wait a sec....
    4. Re:Them er fightin' words by Tablizer · · Score: 2

      (* I'll have to refer you to Parnas's original article *)

      I interpret Parnas as pointing toward a need for a standardized way to access collections. IOW, a database interface.

      Besides, it is not very clear exactly what the system is supposed to do, so it is hard to estimate future change patterns and frequencies.

      (* If you want a bigger example, there's my Master's thesis work *)

      Speaking of modular, it is tough to figure out exactly what this contraption does. It seems like systems-software, kinda outside my domain of custom biz software.

      Also, students don't really have enough real-world experience to have a feel for how and where requirements change IMO. I probably would have gone along with OO out of school because of its appeal to (over) idealistic change patterns. I wouldn't know any better back then.

      (* Do you have a reference for such a study? *)

      No. But I never met an assembler fan who challenged it. You are not questioning the cross-platform claim, are you?

      (* The 3rd-generation-versus-assembly is the most clear-cut case of programming language expressive power there is, and yet it's still quite hard to "prove" in any meaningful way. *)

      I don't think it would take that much. Take a medium-complexity problem and challenge an assembler fan to do it with less code. Then toss them some typical change scenarios and see whose code is affected the most. (They can counter with their own scenarios, BTW.)

      Besides, if I am wrong, perhaps there are assembler fans who can out-program and out-maintain C, Python, LISP, etc. programmers.

      That would suggest that paradigms are subjective. People favor the paradigm that best maps to the way that they think.

      I don't think this is really the case with assembler, but is with other paradigms.

      (* That's interesting. Do you have any references for this? *)

      I didn't apply any metrics, but examples of GOF and GOF-like patterns using tables can be found at:

      http://www.geocities.com/tablizer/prpats.htm

    5. Re:Them er fightin' words by p3d0 · · Score: 1
      Thanks for the reference. I have never seen relational programming advanced as a general-purpose paradigm for software construction, so I'll find it interesting to investigate.

      My personal opinion regarding OO is that people are disappointed in it for a number of reasons:
      • It has been oversold as a panacea, so people become disappointed when they discover that they still need to think.
      • It has been represented very poorly by at least one language, C++, which has convinced many that OO is unworkable on large, complex projects.
      • Popular OO languages and approaches miss out on Design by Contract, making them far less effective.
      • The majority of programmers are simply not skilled enough to architect large enough projects to evaluate a paradigm's scalability. (This same assertion, in a different form, is what led Fred Brooks to promote the surgeon team in The Mythical Man Month twenty years ago.) It is my feeling that skillfully-applied OO wins over some other paradigms (equally skillfully-applied) at the high end of complexity, though I could just be another of those unskilled programmers relying on blind faith in OO. :-)
      Thanks again for the discussion.
      --
      Patrick Doyle
      I mod down every jackass who puts his moderation policy in his sig. Oh, wait a sec....
    6. Re:Them er fightin' words by Tablizer · · Score: 2

      (* It is my feeling that skillfully-applied OO wins over some other paradigms (equally skillfully-applied) at the high end of complexity *)

      This is often, but not always, stated by OO fans. If this is the case, then how come it is being touted for everything (all sizes), and pushing alternatives and research in alternatives away?

      IMO, the procedural/relational approach scales well because you consider mostly *one task* at a time, and communicate mostly through the database.

      Detractors will say that relying on tables like this causes ripple effects if the schema needs to change. I would point out that this is very similar to the effect of an *interface* changing in an OO app. Tables *are* an interface.

      (Hiding changes via database views and triggers varies per vendor. The products could probably improve here, but there is no in-born limit of the paradigm which prevents them.)

      (* I have never seen relational programming advanced as a general-purpose paradigm for software construction, so I'll find it interesting to investigate. *)

      I don't know if it is general purpose, it just seems to work well for custom biz apps. One-size-fits-all is probably not the case.

      Regarding Design-by-Contract, it is hard to implement such for many types of business rules. It takes more code to state the contract than it does to implement it in many cases. You end up having to change 2 things instead of one when new requirements come: the implementation *and* the contract verification code. Thus, you increase the chance of errors. It often violates the once-and-only-once rule of factoring.

      The stack DBC examples in the books don't seem to extrapolate to real-world requirements very smoothly. (I stopped using stacks when decent databases came along. A "stack" is simply one of many possible views of any collection. IOW, "Has-a" stack view instead of "is-a" stack.) Good abstraction is all about managing relativism IMO.
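
      A minimal sketch of the "has-a stack view" idea, using Python's built-in sqlite3 module. The table name, column names, and helper functions are assumptions made for illustration. The same rows can be read as a stack (newest first) or alphabetically, and neither view owns the data.

```python
import sqlite3

# In-memory database; the `tasks` table and its columns are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tasks (seq INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT)"
)


def push(name):
    """Insert a row; seq records insertion order."""
    conn.execute("INSERT INTO tasks (name) VALUES (?)", (name,))


def pop():
    """Stack view: remove and return the most recently inserted name."""
    row = conn.execute(
        "SELECT seq, name FROM tasks ORDER BY seq DESC LIMIT 1"
    ).fetchone()
    if row is None:
        raise IndexError("pop from empty table")
    conn.execute("DELETE FROM tasks WHERE seq = ?", (row[0],))
    return row[1]


def alphabetical():
    """A different view of the very same rows."""
    return [r[0] for r in conn.execute("SELECT name FROM tasks ORDER BY name")]
```

      Here the stack behaviour comes only from the ORDER BY clause in pop(); a queue view would differ by a single keyword.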

    7. Re:Them er fightin' words by p3d0 · · Score: 1
      Regarding Design-by-Contract, it is hard to implement such for many types of business rules. It takes more code to state the contract than it does to implement it in many cases.
      This is the same objection as "how do I get from here to there without goto". If you design your application with contracts in mind, your contracts are never so complicated as to become a burden. Likewise, if you design your app with relational tables in mind, those probably tend to stay relatively simple too.
      You end up having to change 2 things instead of one when new requirements come: the implementation *and* the contract verification code. Thus, you increase the chance of errors. It often violates the once-and-only-once rule of factoring.
      Yes, this is often stated as a flaw in Design by Contract. I disagree with it. Firstly, there is no contract verification code; only the contract. When a modification affects a contract, you're in for a lot of trouble (even more so when you're not using DbC, and therefore may not realize that the contract actually has changed), and the effort of modifying the contract itself reminds a programmer of that. Contracts only need to be changed for exactly those situations in which the work of changing the contract itself vanishes in comparison with the labour required to propagate that modification to the rest of the system.

      There are a number of other reasons I disagree with your assessment:

      1. The contract and the code do not say the same thing, except for trivial cases. Unfortunately, the kinds of cases shown to beginners must be trivial, so that is all they see.
      2. In a system designed using DbC, the contracts are far simpler than the implementation. Let's say they're 5 times simpler. Then, even assuming they are entirely redundant (which they aren't), that's only a 20% growth in code size. It's well worth it for DbC's benefits.
      3. Finding a solution to a problem is generally harder than demonstrating the solution to be correct. This is the basis for the conjecture that P != NP. It is also the basis for DbC: namely, contracts are easier to write than programs, and can demonstrate that a program is working correctly.
      4. Whatever redundancy there is is a good thing in this case. Stating certain things twice, in two different ways, and having the computer check them against each other, helps to locate errors. The same argument regarding redundancy could be used against type annotations, or variable declarations, or even multi-letter variable names, but most would argue that these kinds of redundancy help program correctness, rather than hindering it.
      The most convincing argument in favour of DbC, I think, is Bertrand Meyer's "law of the excluded miracle": if the author of a class/procedure/module doesn't know what it's supposed to do, then the odds that it will do it properly are vanishingly small. Preconditions and postconditions are nothing more than a precise way to specify what something does.
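
      To make the points above concrete, here is a small Python sketch, with a hypothetical function name, in which the contract is written as a precondition and a postcondition. The contract says what the result must satisfy, the body says how to find it, and the two are visibly different and of different sizes.

```python
def isqrt_floor(n):
    """Integer square root (illustrative example, not a library function).

    Precondition:  n >= 0
    Postcondition: r*r <= n < (r+1)*(r+1)
    """
    assert n >= 0, "precondition violated: caller passed a negative n"

    # Implementation: binary search, keeping lo*lo <= n < hi*hi throughout.
    lo, hi = 0, n + 1
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if mid * mid <= n:
            lo = mid
        else:
            hi = mid
    r = lo

    # Checking the answer takes one line; finding it took a loop.
    assert r * r <= n < (r + 1) * (r + 1), "postcondition violated"
    return r
```

      If the search loop were later replaced by a different algorithm, the two assertions would stay exactly as they are.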
      --
      Patrick Doyle
      I mod down every jackass who puts his moderation policy in his sig. Oh, wait a sec....
    8. Re:Them er fightin' words by Tablizer · · Score: 2

      (* In a system designed using DbC, the contracts are far simpler than the implementation. *)

      Maybe in scientific computing where the interface is simple, but the computations are complex. However, biz apps tend to be the other way around. (Biz apps tend to be complex in the way that multiple things interact and the biz rules can reference.)

      (* Preconditions and postconditions are nothing more than a precise way to specify what something does. *)

      Try comments. Well-worded comments are not going to beat the usefulness of some machine-readable notation precisely because it is tuned for the machine instead for people.

      (* The same argument regarding redundancy could be used against type annotations.... *)

      I can live without those. I tend more toward scriptish langs anyhow these days.

    9. Re:Them er fightin' words by p3d0 · · Score: 1
      Maybe in scientific computing where the interface is simple, but the computations are complex. However, biz apps tend to be the other way around.
      That does not match my experience, but let's assume you're right. Then, the fact that an interface is complex makes it that much more important to document it in a rigorous way; and the fact that this is difficult to do certainly doesn't mean that it shouldn't be done.

      I have applied DbC successfully to business apps and system software. I have never written any scientific software, so I can't comment on that.

      Preconditions and postconditions are nothing more than a precise way to specify what something does.
      Try comments. Well-worded comments are not going to beat the usefulness of some machine-readable notation precisely because it is tuned for the machine instead for people.
      I'll ignore the freudian slip, and assume you meant that well-worded comments are going to beat the usefulness of assertions. In that case, I disagree with that too:
      • Nobody ever said assertions in DbC need to be executable. DbC is a design methodology, based on the principle that you should know what the parts of your system are supposed to do. I use DbC when I write my C code, and my contracts take the form of comments.
      • Forcing comments to be executable makes them less ambiguous. In this way, Eiffel-style assertions are often preferable to the comments that usually pass for interface documentation.
      • Executable contracts are continually double-checked against the implementation code to make sure they agree. Comments can drift and become inaccurate over time.
      The same argument regarding redundancy could be used against type annotations...
      I can live without those. I tend more toward scriptish langs anyhow these days.
      Good for you. The point of my comparison with type annotations (plus, more importantly, variable declarations and multi-character variable names, which you have elided) was that redundancy is not always bad. The redundancy argument could also be used against comments; I hope you won't argue that comments are contrary to the "say it once" principle?
      --
      Patrick Doyle
      I mod down every jackass who puts his moderation policy in his sig. Oh, wait a sec....
    10. Re:Them er fightin' words by Tablizer · · Score: 2

      Besides,

      Validation checks can be made with simple IF statements.

      If not inRange(...) then
        panic_or_something
      end if

    11. Re:Them er fightin' words by p3d0 · · Score: 1
      No, for two reasons:
      • To add this kind of checking code everywhere throughout the system would be prohibitively slow, even if the errors you are checking for never happen. In contrast, once certain bugs are rare enough, assertion checks can be disabled, and no longer add any performance overhead. Furthermore, even if you never disable assertion checks, DbC makes it clear exactly where they are necessary, so you don't end up with duplicate redundant checks.
      • What you have shown is not Design by Contract. DbC is not an implementation technique to check for errors; it's a design methodology to delineate precisely the responsibilities of each class/module/function in a system. Yours is an example of defensive programming, which is basically the opposite of DbC.
      Have a look at some of the Design by Contract literature on the web. I promise, it will be time well spent, even if you don't end up using it.
      --
      Patrick Doyle
      I mod down every jackass who puts his moderation policy in his sig. Oh, wait a sec....
    12. Re:Them er fightin' words by Tablizer · · Score: 2

      (* In contrast, once certain bugs are rare enough, assertion checks can be disabled, and no longer add any performance overhead. *)

      So it is slow the first 2 years, before Moore's law makes it not matter? That is not a very good selling point.

      (* Furthermore, even if you never disable assertion checks, DbC makes it clear exactly where they are necessary, so you don't end up with duplicate redundant checks. *)

      And IF statements are not because they are not weird and funky enough to stand out? That is a silly argument. Besides, you can call the same function each time:

      if Not inRange(...)
        SameFamiliarName("Foo out of range")
      end if

      (* DbC is not an implementation technique to check for errors; it's a design methodology to delineate precisely the responsibilities of each class/module/function in a system. *)

      Yeah yeah. I have had this argument before, and how DBC is so *subtly* different that it does not really matter.

      Use what is already available and stop adding goofy little syntax to a language to make it funkier and funkier. Reinvent something really different, not a glorified IF statement. That is a waste of complexity.

    13. Re:Them er fightin' words by p3d0 · · Score: 1
      Ok, I think I may be wasting my time. I thought you were just unfamiliar with DbC, but it seems to me you have made up your mind to believe that DbC is something it's not, and to argue against it based on faults it doesn't possess. You have taken a gratuitously pessimistic view of everything I say, to the point that some of your comments actually contradict what I have said.

      I hope I'm wrong, and that we can continue to have a rational discussion about this.

      So it is slow the first 2 years, before Moore's law makes it not matter? That is not a very good selling point.
      Yes, it sure isn't a good selling point if you pull this two-year time frame out of your ass. What if I told you the time frame is more like six weeks? That would be more in line with my experience.
      Furthermore, even if you never disable assertion checks, DbC makes it clear exactly where they are necessary, so you don't end up with duplicate redundant checks.
      And IF statements are not because they are not weird and funky enough to stand out?
      I'm not sure what your point is here. Mine is that you can't disable error checking code unless you know which error checks can safely be disabled. Sure, you can grep for "if", but you need to know the difference between error checks that trap bugs in the program, versus those that catch valid error conditions like user errors.

      For instance, in a C compiler, you should eventually be able to disable internal data structure consistency checks, but you can never disable parse error checks. In most software, such as business apps, the line between bugs and actual error conditions is not so clear.

      The way you tell error conditions from bugs is Design by Contract. To the extent that you can tell these two things apart, you are using DbC, whether you have chosen to do it consciously or not.
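
      A minimal Python sketch of that distinction, with hypothetical function names. Bad user input is a genuine error condition and is always checked with an ordinary exception. A violated contract is a bug in the calling code and is checked with assert, which Python removes when run with the -O option.

```python
def parse_quantity(text):
    """Validate user input: a real error condition, never disabled."""
    stripped = text.strip()
    if not stripped.isdecimal() or int(stripped) == 0:
        raise ValueError(f"not a positive whole number: {text!r}")
    return int(stripped)


def unit_price_cents(total_cents, quantity):
    """Contract: callers must pass quantity > 0.

    parse_quantity() already guarantees this for user-supplied values,
    so a failure here means a bug in the caller, not a user mistake.
    """
    # Stripped out by `python -O`, unlike the check in parse_quantity().
    assert quantity > 0, "contract violated: quantity must be positive"
    return total_cents // quantity
```

      Only the assert line is a candidate for disabling; the ValueError check has to stay for as long as users can type.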

      DbC is not an implementation technique to check for errors; it's a design methodology to delineate precisely the responsibilities of each class/module/function in a system.
      Yeah yeah. I have had this argument before, and how DBC is so *subtly* different that it does not really matter.

      Use what is already available and stop adding goofy little syntax to a language to make it funkier and funkier. Reinvent something really different, not a glorified IF statement.

      This kind of logic is hard to argue with. You dismiss my statement that Design by Contract is more than just an IF statement, and then you claim that because it's just an IF statement, it's worthless. Well, I agree that there's no point in adding glorified IF statements to a language, but I can only emphasize once again that DbC is a design methodology. (Why do you think they call it Design by Contract?)

      I have already told you that I use DbC to design C code. C obviously doesn't have any special contract syntax, so I'm not sure how you could believe that DbC is just a syntax issue.

      There are mountains of resources on the internet describing the DbC technique, and if you want to ignore it and argue against a straw man instead, that's your prerogative.

      --
      Patrick Doyle
      I mod down every jackass who puts his moderation policy in his sig. Oh, wait a sec....
    14. Re:Them er fightin' words by Tablizer · · Score: 2

      (* Sure, you can grep for "if", but you need to know the difference between error checks that trap bugs in the program, versus those that catch valid error conditions like user errors. *)

      I already described how to do that.

      Another way is with a comment. The advantage of a comment is that you can create more complex "removal schemes". For example, you may not want to remove *all* the checks, but just the most costly ones (CPU-wise).

      if Not inRange(....) // DBC: level_3
        DBCraise("x is out of range")
      end if
      ....
      if Not inRange(....) // DBC: level_2
        DBCraise("y is out of range")
      end if

      If you rely on built-in stuff, then you cannot add features like that if you want to: you are stuck with whatever is out-of-the-box. In this case, all-or-nothing removal/disable of the checks.
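
      One way the leveled scheme above could look in runnable form is sketched below in Python. The names dbc_check, CHECK_CONFIG, and ContractError, and the meaning of the level numbers, are assumptions for illustration. The condition is passed as a callable so that a disabled check costs only a comparison, not the test itself.

```python
# Hypothetical configuration: checks with a level above max_level are skipped.
CHECK_CONFIG = {"max_level": 3}


class ContractError(Exception):
    """Raised when an enabled check fails."""


def dbc_check(level, condition, message):
    """Run `condition()` only if this check's level is enabled."""
    if level <= CHECK_CONFIG["max_level"] and not condition():
        raise ContractError(message)


def in_range(value, low, high):
    return low <= value <= high


def scale(x, y):
    # Level 3: assumed to be the costliest checks, the first to be switched off.
    dbc_check(3, lambda: in_range(x, 0, 100), "x is out of range")
    # Level 2: cheaper checks that stay on longer.
    dbc_check(2, lambda: in_range(y, 1, 10), "y is out of range")
    return x * y
```

      Setting CHECK_CONFIG["max_level"] to 2 skips the level_3 check while leaving the level_2 check active.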

      (* You dismiss my statement that Design by Contract is more than just an IF statement, and then you claim that because it's just an IF statement, it's worthless. *)

      I did not say "worthless". I am saying that you have not justified dedicated syntax.

      DBC is just a round-about, consultant buzzword wallet-draining way of saying:

      "Testing assumptions is good"

  16. Database normalisation rules. by oliverthered · · Score: 3, Interesting

    Databases and code should be designed in a similar way, for more or less the same reasons. If all the refactoring books people have been recommending seem a bit extreme (even the word refactoring sounds extreme to me, a bit like downsizing grrrr....), try getting a simple DB design book that goes through a normalisation process; it should make for a lighter read.

    Then think about how to apply the process to software (a bit of light thinking).

    The first couple of steps are something like

    separate everything out into discrete chunks

    look at 'keys' and 'indexes' (in source code they are design patterns, data structures the things that tie the chicks together).

    You don't need a 1000 page bible, you need ten pages of guidelines and good practices and a bit of brain power.
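
    A small Python sketch of what the normalisation analogy might mean for code. All names and the tax figure are made up for the example. Just as a normalised schema stores each fact in one place, the rule for computing a total lives in one function, and the two callers that used to duplicate it become thin wrappers.

```python
TAX_PERCENT = 8  # hypothetical rate, stated once


def total_cents(prices_cents, discount_percent=0):
    """Single home for the pricing rule (the 'normalised' fact)."""
    subtotal = sum(prices_cents)
    discounted = subtotal * (100 - discount_percent) // 100
    return discounted + discounted * TAX_PERCENT // 100


def invoice_total(prices_cents):
    """Formerly carried its own copy of the sum-and-tax loop."""
    return total_cents(prices_cents)


def quote_total(prices_cents, discount_percent):
    """Formerly a near-duplicate that differed only by the discount."""
    return total_cents(prices_cents, discount_percent)
```

    A change to the tax rule now touches one function instead of two, which is the same payoff normalisation gives for data.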

    --
    thank God the internet isn't a human right.
    1. Re:Database normalisation rules. by alienmole · · Score: 2
      data structures the things that tie the chicks [sic] together

      When you find yourself subconsciously writing about "tying chicks together" in a discussion of source code organization, it's time to take a break from the keyboard and go get laid, if you can...

      I agree with you about the correlation between database normalization and code factoring [which is the correct and long-established term, no matter how much you might dislike the term "refactoring"]. However, to get a database into Nth normal form can be done by following some fairly simple rules. Code isn't quite so easy. Books like Fowler's refactoring book cover details, subtleties, and rationales that even above-average developers may miss.

      Also, refactoring is a name for something that programmers have always done anyway. An agreed-on name is better than no name at all, or many non-standard names.

  17. Other recommendations... by PinglePongle · · Score: 2, Informative

    Programming Pearls by Jon Bentley - old as the hills by now (he talks about the location of data on tape....), but full of very good insights into writing "good code"™.

    You might also like "The Pragmatic Programmer" - Hunt and Thomas - which is another "meta-programming" book with a lot of ideas and insights you could actually sell to your pointy-headed boss.

    The section on "zero-tolerance" coding is a great "why and when to refactor" argument. There's also a good section on how to design the units of which your software is composed, how to reduce the coupling between those units, and how to test em when (you think...) they're done.

    Nev

    --
    It's all very well in practice, but it will never work in theory.