Organizing Source Code, Regardless of Language?
og_sh0x queries: "I'm looking for a source of information dedicated to organizing source code. I see a lot of books and other resources covering syntax and various syntax-related philosophies, but I can never seem to find a good resource for organizing source code in general. For instance, at what point do you split that massive source file into multiple files? At what point do two functions approaching similar functionality need to be merged, despite the cost of digging through the source and making changes to call the new function? These are problems that plague many programming languages. Are there such resources that cover these issues?"
is called 'Experience' part of your CV.
I've yet to find a simple way to determine any of those. It's just that feeling you get while looking at the code: 'damn, not again..'.
fucktard is a tenderhearted description
Programming C++'s first couple of chapters discuss this very topic.
I have been pwned because my
There are no simple answers to these questions. Best you can do is to formulate your own policy, and stick to it. In real life projects there will always be exceptions and special cases, but it helps a lot if all people working on the project at least know of the existence of common guidelines, and preferably understand and agree with the reasoning behind them.
In Murphy We Turst
The book then goes on to describe the various types of abstract "smells" and which corrective techniques can be applied to them, for example:
I have frequently found that just reading through this short collection (~15) of abstracted "smells" is a very good way of supplementing the "experience" that you speak of. It helps you make decisions with the benefit of a) a bit of third-party support and b) a clearly defined set of rules for applying each of the refactorings, including test cases to prove that the functionality has not been changed in the process and, more importantly, a clean roll-back procedure for those times when the olfactory senses get a little bit confused...
The only Good System is a Sound System
Yes, McConnell is a Microsoft guy, but this book is completely operating-system and programming-language agnostic (even though examples are in C, Fortran, and Pascal, IIRC). It is an excellent guide to software construction, covering every aspect from design, through coding practice and style issues, to project management. I highly recommend it.
These sound like the wrong questions to me. It reminds me of someone's (perhaps Dijkstra's?) story of the response he received when he recommended abolishing gotos. Someone said "ok, I'll buy that; so what do I do if I'm at this point in the program, and I want to get to that point?"
The trouble with such a question is that it has no answer. Dijkstra's argument was not that one should take existing programs and remove the gotos; rather, that programs written using only structured elements (sequencing, conditionals, loops) are more comprehensible, and don't require any gotos because there is a more elegant way to achieve the same effect. Thus, as you can see, there really is no answer to the question; the questioner's approach was fundamentally flawed.
Likewise, software organization is not done in terms of functions; rather, it is done in terms of information-hiding modules. To ask when one huge function should be split into two, or when two similar functions should be merged, indicates to me that the design might be flawed. Sometimes that's unavoidable; for instance, if you are involved in a project written by someone else. In that case, you do indeed need to make this kind of decision.
However, true modular programming does not mean taking huge lumbering hunks of code and splitting them into modules. It means writing modules using the principles of information hiding to avoid making huge lumbering hunks of code in the first place.
This, of course, is easier said than done. It's not that hard to avoid gotos, because the use of Dijkstra's structured programming techniques makes them unnecessary. In contrast, writing good modules is hard, and without superhuman foresight, some modules are bound to be pretty crummy. These will need to be rewritten in order to achieve good information hiding properties.
So, there's your answer: don't put the cart before the horse. Don't expect that someone will tell you that you need to split a function when it gets beyond X number of lines. Rather, look at the integrity of the system's modules. If I can leave you with one piece of advice, I hope it is this: design module interfaces not according to what services they provide, but what information they hide. Modules for which you can't find a succinct statement (12 words or less, with no ifs, ands, or ors) of what information they hide are poorly designed, and need an overhaul. A symptom of this may be that your functions are redundant, or too long, but the core problem is one of poor module design.
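To make the "what information does it hide" test concrete, here is a minimal Java sketch. The WordCounts module and its method names are made up for illustration and come from no particular book. Its secret fits in well under 12 words: "how word counts are stored and looked up".

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical module, invented for this sketch.
// Secret it hides: how word counts are stored and looked up.
final class WordCounts {
    // The hidden design decision. Callers never see this map, so it could
    // later become a sorted tree or a database table without touching them.
    private final Map<String, Integer> counts = new HashMap<>();

    // Record one more occurrence of the given word.
    void record(String word) {
        counts.merge(word, 1, Integer::sum);
    }

    // Number of times the word has been recorded (0 if never seen).
    int countOf(String word) {
        return counts.getOrDefault(word, 0);
    }
}

class InformationHidingDemo {
    public static void main(String[] args) {
        WordCounts wc = new WordCounts();
        for (String w : "the cart before the horse".split(" ")) {
            wc.record(w);
        }
        System.out.println(wc.countOf("the"));  // prints 2
        System.out.println(wc.countOf("goto")); // prints 0
    }
}
```

Note that nothing in the interface mentions hashing or maps. If a description of the module needs an "and" or an "or" to cover what it hides, that is the hint that it is really two modules.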
Patrick Doyle
I mod down every jackass who puts his moderation policy in his sig. Oh, wait a sec....
You do it as soon as you notice the problem. If you have good tools, it will be simple and fun (yes, fun).
A refactoring browser like IDEA from IntelliJ makes it simple. Highlight a few lines of code, choose "Extract Method" from a menu, and the code is extracted into a new method with all the necessary parameters created and passed in and the necessary return type and assignment created. For example:
Highlight the expression after the "+=" on line 2 and extract method, calling it "foo":
"At what point do two functions approaching similar functionality need to be merged, despite the cost of digging through the source and making changes to call the new function?"
It also has a rename feature which will rename a method or variable and change all references to it, but doesn't change references to different variables or methods that happen to have the same name.
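For anyone who hasn't seen Extract Method in action, here is a rough before/after sketch in Java. It is not the snippet from the original post; the Invoice class and the lineTotal name are invented for illustration, and a refactoring browser would generate something of this shape for you.

```java
// Hypothetical class, invented for this sketch.
class Invoice {
    // Before: the expression after "+=" is computed inline.
    static double totalBefore(double[] prices, int[] quantities, double taxRate) {
        double total = 0.0;
        for (int i = 0; i < prices.length; i++) {
            total += prices[i] * quantities[i] * (1.0 + taxRate);
        }
        return total;
    }

    // After: the same expression has been extracted into lineTotal().
    static double totalAfter(double[] prices, int[] quantities, double taxRate) {
        double total = 0.0;
        for (int i = 0; i < prices.length; i++) {
            total += lineTotal(prices[i], quantities[i], taxRate);
        }
        return total;
    }

    // The extracted method: every variable the expression used became
    // a parameter, and the expression's type became the return type.
    static double lineTotal(double price, int quantity, double taxRate) {
        return price * quantity * (1.0 + taxRate);
    }
}

class ExtractMethodDemo {
    public static void main(String[] args) {
        double[] prices = {10.0, 2.5};
        int[] quantities = {2, 4};
        System.out.println(Invoice.totalBefore(prices, quantities, 0.25)); // prints 37.5
        System.out.println(Invoice.totalAfter(prices, quantities, 0.25));  // prints 37.5
    }
}
```

Once lineTotal() exists, a second function that computes the same thing can simply call it, which is the cheap way to do the "merging" asked about above.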
It has lots more features, but you can read about them for yourself and download the program and play with it.
There are other refactoring browsers out there too, like the free Eclipse from IBM. With the right tools, you can easily make your code less messy.
Don't let the title fool you--although he uses C++ for his examples, the concepts he talks about (splitting code into components, why each component should be in its own file, levelization of components, etc.) make sense in any OO language.
I consider this book a must-read for anybody working on large programs.
Having a thousand 10 line files does nothing to improve maintainability.
My obligatory plug for The Mozilla Project. Not quite one function per source file, but definitely lots of very small source files, each implementing a very narrow slice of functionality. Mozilla is pretty well factored code, and maintainability is enhanced by the separation of responsibilities. It makes it possible to enhance or fix problems in one area, say in nsFTPChannel, and know that all the thousands of other lines in the program will be largely insulated from those changes.
Yes, it does take a while to get familiar with the entire Mozilla codebase. The flip side is that you only have to look at and understand a small fraction of it to start becoming productive.
If you are using C++, Large Scale C++ Software Design is definitely a recommendation I can second.
You refactor when you find you have bad coupling. This is the same criterion that has been used since back in the dark ages of structured, procedural programming.
It's always good to see the grey-hairs confirming that what seems new and different and untested is in fact obvious and essential for junior programmers to know. Repackaging it as Refactoring may not add anything new, but it does place it in a context that's more accessible to those not raised on FORTRAN and COBOL. Plus, when the old classics are out of print and hard to find, it's good that the new refactorings of the information are still on the shelves at Amazon.
Java forces you to put each public class in its own file. Then comes organizing all your files into packages. For this, we use patterns (like the model-view-controller pattern). The higher level after patterns is application specific.
Ahh, the joys of OOP...
Good quote, too many chars. Seriously, the slashdot 120 char limit sucks!
I also like John Lakos' Large-Scale C++ Software Design. Yes, it is quite C++ specific, but this book has a unique focus on the physical design of your software. Lakos describes how to organize your project files to minimize dependencies, reduce compile-time, and improve developer productivity.
cpeterso
(* design module interfaces not according to what services they provide, but what information they hide. *)
Sounds like a hidden ad for OO thinking.
oop.ismad.com
OOP has never been proven to be objectively superior, neither WRT code size, nor reuse, nor less change under change-impact analyses. (Except possibly in a few narrow domains.)
The trick to procedural is good table schema design IMO. In the '70s they didn't know about this when they started bashing procedural designs and promoted OO as a solution.
Table-ized A.I.
Databases and code should be designed in a similar way, for more or less the same reasons. If all the refactoring books people have been recommending seem a bit extreme (even the word refactoring sounds extreme to me, a bit like downsizing grrrr....), try getting a simple DB design book that goes through a normalisation process; it should make for a lighter read.
Then think about how to apply the process to software (a bit of light thinking).
The first couple of steps are something like
separate everything out into discrete chunks
look at 'keys' and 'indexes' (in source code they are design patterns and data structures, the things that tie the chunks together).
You don't need a 1000-page bible; you need ten pages of guidelines and good practices and a bit of brain power.
thank God the internet isn't a human right.
Although that characterization does describe a valid benefit of OOP, it completely misses possibly the most important aspect of OOP, which is the introduction of type-based polymorphism.
In fact, the "organizing procedural code" benefit of OOP is simply a side effect of designing systems based on interacting types, something which procedural systems didn't directly support. Saying that OOP is a way of organizing your procedural code completely misses the point.
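A small Java sketch of what type-based polymorphism buys you. The Shape, Rectangle and Triangle names are invented for illustration. The point is that totalArea() never asks which kind of shape it holds, so a new type can be added without editing it.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical types, invented for this sketch.
interface Shape {
    double area();
}

final class Rectangle implements Shape {
    private final double width;
    private final double height;

    Rectangle(double width, double height) {
        this.width = width;
        this.height = height;
    }

    @Override
    public double area() {
        return width * height;
    }
}

final class Triangle implements Shape {
    private final double base;
    private final double height;

    Triangle(double base, double height) {
        this.base = base;
        this.height = height;
    }

    @Override
    public double area() {
        return 0.5 * base * height;
    }
}

class PolymorphismDemo {
    // No switch on a type tag: each call dispatches on the object's own type.
    static double totalArea(List<Shape> shapes) {
        double sum = 0.0;
        for (Shape s : shapes) {
            sum += s.area();
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Shape> shapes = Arrays.asList(new Rectangle(2.0, 3.0), new Triangle(4.0, 1.0));
        System.out.println(totalArea(shapes)); // prints 8.0
    }
}
```

In a procedural version, totalArea() would typically branch on a type code, and every new shape would mean hunting down each such branch. Here the organization falls out of the types, which is the side effect described above.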
Modern texts on refactoring focus on factoring issues in these systems of interacting types, and as such are relevant to current systems in a way that it's difficult for e.g. Plauger to be. Certainly, normalizing/factoring/compressing systems has been and always will be a basic goal of software development, but just because the concept is old doesn't mean that there aren't new insights into it. Suggesting otherwise is a little like saying that using Jupiter's gravity to give a space probe an energy boost is nothing new, since Newton discovered gravity. I suspect NASA scientists get most of their information somewhere other than Principia Mathematica.
Programming Pearls by Jon Bentley - old as the hills by now (he talks about the location of data on tape....), but full of very good insights into writing "good code"(TM).
You might also like The Pragmatic Programmer by Hunt and Thomas, which is another "meta-programming" book with a lot of ideas and insights you could actually sell to your pointy-headed boss.
The section on "zero-tolerance" coding is a great "why and when to refactor" argument. There's also a good section on how to design the units of which your software is composed, how to reduce the coupling between those units, and how to test 'em when (you think...) they're done.
Nev
It's all very well in practice, but it will never work in theory.