Organizing Source Code, Regardless of Language?
og_sh0x queries: "I'm looking for a source of information dedicated to organizing source code. I see a lot of books and other resources covering syntax and various syntax-related philosophies, but I can never seem to find a good resource for organizing source code in general. For instance, at what point do you split that massive source file into multiple files? At what point do two functions approaching similar functionality need to be merged, despite the cost of digging through the source and making changes to call the new function? These are problems that plague many programming languages. Are there such resources that cover these issues?"
It's called the 'Experience' part of your CV.
I've yet to find a simple way to determine any of those. It's just the feeling you get while looking at the code: 'damn, not again...'.
Programming C++'s first couple of chapters discuss this very topic.
www.refactoring.com, or any other good refactoring book, should help loads to get you started. But there's nothing like experience :)
There are no simple answers to these questions. Best you can do is to formulate your own policy, and stick to it. In real life projects there will always be exceptions and special cases, but it helps a lot if all people working on the project at least know of the existence of common guidelines, and preferably understand and agree with the reasoning behind them.
The book then goes on to describe the various types of abstract "smells" and which refactoring techniques can be applied to correct each of them.
I have frequently found that just reading through this short (about 15) collection of abstracted "smells" is a very good way to supplement the "experience" you speak of. It helps you make decisions with the benefit of a) a bit of third-party support for those decisions and b) a clearly defined set of rules for applying each refactoring, including test cases to prove that the functionality has not changed in the process and, more importantly, a clean roll-back procedure for those times when your olfactory senses get a little bit confused...
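To make the smell-plus-test idea concrete, here is a minimal Python sketch. It is not taken from the book; the functions and the 10% discount rule are hypothetical. It shows duplicated logic (a "Duplicated Code" smell) fixed by extracting one shared function, followed by a check that the old and new versions agree on sample inputs, which is what lets you trust (or roll back) the change.

```python
# Before: the same discount rule is copied into two functions
# (the "Duplicated Code" smell).
def invoice_total_before(prices):
    total = sum(prices)
    if total > 100:
        total = total * 0.9
    return round(total, 2)


def quote_total_before(prices, shipping):
    total = sum(prices)
    if total > 100:
        total = total * 0.9
    return round(total + shipping, 2)


# After: the shared rule lives in one extracted function.
def discounted(subtotal):
    """Apply a hypothetical 10% discount to subtotals over 100."""
    return subtotal * 0.9 if subtotal > 100 else subtotal


def invoice_total(prices):
    return round(discounted(sum(prices)), 2)


def quote_total(prices, shipping):
    return round(discounted(sum(prices)) + shipping, 2)


# Regression check: the refactored functions must agree with the originals
# on every sample; if one fails, roll back and try again.
SAMPLES = [([], 0), ([10, 20], 5), ([60, 50], 7.5), ([100], 0), ([99.99, 0.02], 3)]
for prices, shipping in SAMPLES:
    assert invoice_total(prices) == invoice_total_before(prices)
    assert quote_total(prices, shipping) == quote_total_before(prices, shipping)
```

Because the refactored code performs the same arithmetic in the same order, exact equality is a fair test here; with a change that reorders floating-point operations you would want a tolerance instead.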
Yes, McConnell is a Microsoft guy, but this book is completely operating-system and programming-language agnostic (even though examples are in C, Fortran, and Pascal, IIRC). It is an excellent guide to software construction, covering every aspect from design, through coding practice and style issues, to project management. I highly recommend it.
These sound like the wrong questions to me. It reminds me of someone's (perhaps Dijkstra's?) story of the response he received when he recommended abolishing gotos. Someone said "ok, I'll buy that; so what do I do if I'm at this point in the program, and I want to get to that point?"
The trouble with such a question is that it has no answer. Dijkstra's argument was not that one should take existing programs and remove the gotos; rather, that programs written using only structured elements (sequencing, conditionals, loops) are more comprehensible, and don't require any gotos because there is a more elegant way to achieve the same effect. Thus, as you can see, there really is no answer to the question; the questioner's approach was fundamentally flawed.
Likewise, software organization is not done in terms of functions; rather, it is done in terms of information-hiding modules. To ask when one huge function should be split into two, or when two similar functions should be merged, indicates to me that the design might be flawed. Sometimes that's unavoidable, for instance if you are involved in a project written by someone else. In that case, you do indeed need to make this kind of decision.
However, true modular programming does not mean taking huge lumbering hunks of code and splitting them into modules. It means writing modules using the principles of information hiding to avoid making huge lumbering hunks of code in the first place.
This, of course, is easier said than done. It's not that hard to avoid gotos, because the use of Dijkstra's structured programming techniques makes them unnecessary. In contrast, writing good modules is hard, and without superhuman foresight, some modules are bound to be pretty crummy. These will need to be rewritten in order to achieve good information hiding properties.
So, there's your answer: don't put the cart before the horse. Don't expect that someone will tell you that you need to split a function when it gets beyond X number of lines. Rather, look at the integrity of the system's modules. If I can leave you with one piece of advice, I hope it is this: design module interfaces not according to what services they provide, but what information they hide. Modules for which you can't find a succinct statement (12 words or less, with no ifs, ands, or ors) of what information they hide are poorly designed, and need an overhaul. A symptom of this may be that your functions are redundant, or too long, but the core problem is one of poor module design.
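As a rough illustration of designing around a secret, here is a minimal Python sketch under assumptions of my own: a hypothetical PhoneBook module whose secret is "how entries are stored and searched" (six words, no ifs, ands, or ors). Two interchangeable implementations keep that secret differently, and the client code cannot tell them apart.

```python
import bisect


class PhoneBook:
    """Secret: how entries are stored and searched (here, a dict)."""

    def __init__(self):
        self._entries = {}  # hidden representation: normalised name -> number

    def add(self, name, number):
        self._entries[name.strip().lower()] = number

    def lookup(self, name):
        return self._entries.get(name.strip().lower())


class SortedPhoneBook:
    """Same interface, same secret, different representation (sorted lists)."""

    def __init__(self):
        self._keys = []     # normalised names, kept sorted for binary search
        self._numbers = []  # numbers, parallel to self._keys

    def add(self, name, number):
        key = name.strip().lower()
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            self._numbers[i] = number  # overwrite an existing entry
        else:
            self._keys.insert(i, key)
            self._numbers.insert(i, number)

    def lookup(self, name):
        key = name.strip().lower()
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._numbers[i]
        return None


def client(book):
    # Depends only on add() and lookup(), never on how entries are kept.
    book.add("Ada Lovelace", "555-0100")
    book.add(" ada lovelace ", "555-0199")  # same person after normalisation
    return book.lookup("ADA LOVELACE")
```

Swapping one class for the other requires no change to client(); if it did, that would be a hint that the module's secret had leaked through its interface.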
Patrick Doyle
You do it as soon as you notice the problem. If you have good tools, it will be simple and fun (yes, fun).
A refactoring browser like IDEA from IntelliJ makes it simple. Highlight a few lines of code, choose "Extract Method" from a menu, and the code is extracted into a new method, with all the necessary parameters created and passed in and the necessary return type and assignment created. For example:
Highlight the expression after the "+=" on line 2 and extract method, calling it "foo":
At what point do two functions approaching similar functionality need to be merged, despite the cost of digging through the source and making changes to call the new function?
It also has a rename feature which will rename a method or variable and change all references to it, but doesn't change references to different variables or methods that happen to have the same name.
It has lots more features, but you can read about them for yourself, download the program, and play with it.
There are other refactoring browsers out there too, like the free Eclipse from IBM. With the right tools, you can easily make your code less messy.
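For readers who haven't seen "Extract Method" in action, here is a hypothetical before/after in Python (not the poster's original example; the function names are invented, and a refactoring tool would use whatever name you type). The highlighted expression on the right of "+=" becomes its own function, and the variables it reads become parameters.

```python
# Before: the loop body mixes iteration with the pricing rule.
def order_total_before(items):
    total = 0
    for qty, unit_price, tax_rate in items:
        total += qty * unit_price * (1 + tax_rate)
    return total


# After "Extract Method": the expression that followed "+=" is now a
# function, and the loop variables it used are passed in as parameters.
def line_cost(qty, unit_price, tax_rate):
    return qty * unit_price * (1 + tax_rate)


def order_total(items):
    total = 0
    for qty, unit_price, tax_rate in items:
        total += line_cost(qty, unit_price, tax_rate)
    return total
```

The behaviour is unchanged; what you gain is a named, separately testable unit that other code (and the rename feature) can now refer to.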
Don't let the title fool you--although he uses C++ for his examples, the concepts he talks about (splitting code into components, why each component should be in its own file, levelization of components, etc.) make sense in any OO language.
I consider this book a must-read for anybody working on large programs.
You refactor when you find you have bad coupling. This is the same criterion that has been used since back in the dark ages of structured, procedural programming. But hey, guess what OOP is? It's a way of organizing your procedural code, with some assistance from the language/compiler to help enforce access policies.
This Refactoring book everyone is mentioning is a good one, but I suspect that its authors understand the old-fashioned concept of coupling so deeply that they don't realize that it's a mystery to many readers. That's old-fashioned in the sense that the law of gravity is old-fashioned, or hemoglobin is old-fashioned, or DNA is. Being old-fashioned is a prerequisite of being an essential underpinning of everything.
I would like to recommend an article by Stevens, Myers, and Constantine with the unassuming title of Structured Design - sounds almost Victorian in a quaint, old-fashioned way, doesn't it? But it's almost certain that none of you youngsters will have that 1974 issue of IBM Systems Journal on your shelves, and last time I checked Yourdon's Classics in Software Engineering was long since out of print. Perhaps you can find some of the books that came in later years as the authors developed their ideas. But the core is all here, in this early work.
Another fine title for avoiding the programmerly version of Santayana's Curse, which I fear may be little easier to find than Classics, is P. J. Plauger's Programming on Purpose. It touches on these same issues as well as many other design techniques which, I fear, all too many of you do find yourselves doomed to reinvent - badly, as the wag said about those who have reinvented Unix over the years.
Before making recommendations, I'd say congratulations on thinking about this. I'd further congratulate you if you work for someone who'll let you make such improvements. In my company, there's always some "business case", policy, politics, or other reason for not making improvements.
Nevertheless, I'd recommend a good book on structured analysis and design, then perhaps a design patterns book. SA&D is probably more micro in nature and patterns more macro, at least as they're practiced today. I'd do this if you needed "justification" for what you were doing.
Although not directly related, learning basic database concepts wouldn't be bad either. I think normalization concepts might affect your thinking in a positive way, not only for organizing data structures but also for deciding where files go in your build structure. Maybe it'll help you avoid needless redundancy. At least these concepts rest on sounder mathematics, and knowledge of them will last.
I think the dirty little secret about these books is that much of their content is written opinion, with little scientific evidence behind its reasoning. In computing today, credibility comes more from writing about something than from proving it. Not entirely, perhaps, but if it were really gospel, we wouldn't have displaced SA&D with OOA&D and patterns. Develop your own style, be consistent, and stick to your guns, because your opinion is probably no worse than the others'.
Java forces you to put each public class in its own file. Then comes organizing all your files into packages. For this, we use patterns (like the model-view-controller pattern). The level above patterns is application-specific.
Ahh, the joys of OOP...
I also like John Lakos' Large-Scale C++ Software Design. Yes, it is quite C++ specific, but this book has a unique focus on the physical design of your software. Lakos describes how to organize your project files to minimize dependencies, reduce compile time, and improve developer productivity.
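One idea from this line of thinking is "levelization": giving each component a level based on what it depends on, so that a component can be tested using only components at lower levels. The following is a minimal Python sketch of that numbering under my own simplifying assumptions (component names are hypothetical, and dependencies are given as a plain dict rather than extracted from #include lines). Level 1 means the component uses no other local component; otherwise its level is one more than the highest level it uses. A dependency cycle cannot be levelized, so the sketch reports it.

```python
def levelize(deps):
    """Assign a level to every component in a dependency graph.

    deps maps a component name to the names of the local components it uses.
    Level 1: uses no other local component.
    Level N: 1 + the highest level among the components it uses.
    A cycle makes the graph non-levelizable, so it raises ValueError.
    """
    levels = {}
    in_progress = set()  # components on the current depth-first path

    def visit(component, path):
        if component in levels:
            return levels[component]
        if component in in_progress:
            cycle = path[path.index(component):] + [component]
            raise ValueError("dependency cycle: " + " -> ".join(cycle))
        in_progress.add(component)
        used = deps.get(component, ())
        level = 1 + max((visit(d, path + [component]) for d in used), default=0)
        in_progress.discard(component)
        levels[component] = level
        return level

    for component in deps:
        visit(component, [])
    return levels


# Hypothetical project: app uses ui and db, both of which use util.
example = {
    "app": {"ui", "db"},
    "ui": {"util"},
    "db": {"util"},
    "util": set(),
}
for name, level in sorted(levelize(example).items(), key=lambda kv: (kv[1], kv[0])):
    print(level, name)
```

Run as-is, this prints util at level 1, db and ui at level 2, and app at level 3. Adding "util": {"app"} would turn the graph into a cycle and raise ValueError, which is exactly the kind of dependency the physical-design advice tells you to break.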
cpeterso
(* design module interfaces not according to what services they provide, but what information they hide. *)
Sounds like a hidden ad for OO thinking.
oop.ismad.com
OOP has never been proven to be objectively superior, whether in code size, reuse, or the amount of change required under change-impact analysis. (Except possibly in a few narrow domains.)
The trick to procedural is good table schema design IMO. In the '70s, when people started bashing procedural designs and promoting OO as the solution, they didn't know about this.
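One possible reading of "table schema design" in procedural code is to move variation into data rather than into a class hierarchy. Here is a minimal Python sketch under that assumption; the shipping rates, regions, and column layout are all hypothetical, and in a real system the table might live in a database instead of a list.

```python
# Hypothetical rate table: each row is (region, max_weight_kg, price).
# Rows for a region must be ordered by max_weight_kg, smallest first.
# Adding a region or a weight band means adding rows, not code or classes.
SHIPPING_RATES = [
    ("domestic", 1.0, 5.00),
    ("domestic", 5.0, 9.50),
    ("domestic", float("inf"), 20.00),
    ("international", 1.0, 15.00),
    ("international", float("inf"), 40.00),
]


def shipping_cost(region, weight_kg, table=SHIPPING_RATES):
    """Return the price from the first row for region whose band covers weight_kg."""
    for row_region, max_weight, price in table:
        if row_region == region and weight_kg <= max_weight:
            return price
    raise KeyError(f"no rate for region {region!r}")
```

The procedure stays short and fixed; the design effort goes into choosing the table's columns and keys, which is the point being made about procedural designs.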
Databases and code should be designed in a similar way, for more or less the same reasons. If all the refactoring books people have been recommending seem a bit extreme (even the word refactoring sounds extreme to me, a bit like downsizing grrrr....), try getting a simple DB design book that goes through a normalisation process; it should make for a lighter read.
Then think about how to apply the process to software (a bit of light thinking).
The first couple of steps are something like:
separate everything out into discrete chunks
look at 'keys' and 'indexes' (in source code these are design patterns and data structures, the things that tie the chunks together).
You don't need a 1000-page bible; you need ten pages of guidelines and good practices, and a bit of brain power.
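To show what "separate into chunks, then tie them together with keys" might look like in ordinary program data, here is a minimal Python sketch; the customers, orders, and field names are hypothetical. The denormalised version repeats customer details in every order, so changing an address means hunting down every copy. The normalised version stores each fact once and refers to it by key.

```python
# Denormalised (shown for contrast only): each order repeats the customer's
# details, so an address change must be applied to every copy.
orders_denormalised = [
    {"order_id": 1, "customer": "Acme", "address": "1 Old Road", "item": "bolts"},
    {"order_id": 2, "customer": "Acme", "address": "1 Old Road", "item": "nuts"},
]

# Normalised: customer facts live in one place, keyed by customer id;
# orders hold only the key.
customers = {"C1": {"name": "Acme", "address": "1 Old Road"}}
orders = [
    {"order_id": 1, "customer_id": "C1", "item": "bolts"},
    {"order_id": 2, "customer_id": "C1", "item": "nuts"},
]


def move_customer(customer_id, new_address):
    # One update; there are no copies to hunt down.
    customers[customer_id]["address"] = new_address


def shipping_label(order):
    customer = customers[order["customer_id"]]  # "join" on the key
    return f"{customer['name']}, {customer['address']}"
```

The same reasoning applies to code: a rule or constant that is defined once and referenced everywhere cannot drift out of sync the way copies can.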
Programming Pearls by Jon Bentley - old as the hills by now (he talks about the location of data on tape....), but full of very good insights into writing "good code"TM.
You might also like "The pragmatic programmer" - Hunt and Thomas - which is another "meta-programming" book with a lot of ideas and insights you could actually sell to your pointy-headed boss.
The section on "zero-tolerance" coding is a great "why and when to refactor" argument. There's also a good section on how to design the units of which your software is composed, how to reduce the coupling between those units, and how to test em when (you think...) they're done.
Nev