Code Reading: The Open Source Perspective
A few books are tackling this subject, including Coder to Developer and Programming Language Pragmatics. These books don't teach you much about a particular language in the way that an introductory text would. Instead, you grow as a skilled developer by studying them and learning from them. That's one of the key things that people are talking about lately, that to be a strong developer requires more than a working knowledge of a language. It requires a familiarity with the strengths, weaknesses, and core features of a language and the base libraries to be efficient.
Code Reading: The Open Source Perspective is one of these books in this small but growing library. In it, Diomidis Spinellis takes you through a large body of code and focuses on several languages, techniques, and facets of development that differentiate strong developers from weak ones. What I like about this book is how much it covers, how practical the information is, and how much Spinellis teaches you. You wont learn a language, which is the complaint of some people who read this book, but if you know one or two you'll be a better programmer.
Perhaps one of the most telling things about the book is that it draws heavily from NetBSD source code, and features over 600 examples to make the point. Examples are often annotated using NetBSD as a reference. This makes sense, because NetBSD is a large project that's relatively stable and mature. Everything from how to define a C structure consistently and sanely to UML diagrams and build systems are covered, making this truly a developer's book. However, even Windows and Mac OS X developers will benefit, despite the BSD focus.
Chapter 1 introduces some of the basic tenets of the book, namely that code is literature and should be read as such. All too often people only read code when they have a specific problem to solve or want to get an example of an API. Instead, if you read code frequently you'll always be learning things and improving your skills. Also, Spinellis discusses the lifecycle of code (including its genesis, maintenance, and reuse), which simply must be taken into account if code is to be good. Poorly skilled developers forget these things and just slap it together, never thinking ahead.
In Chapter 2, a number of concepts basic to any programming language are covered, including the basic flow-control units common to many languages. The book focuses on C, with additional coverage given using C++, Java, and a few other things thrown in for good measure. As such, these chapters -- in fact the whole book -- focuses on concepts common to these languages but absent in some other languages, like Scheme or LISP. One neat section is called "refactoring in the small." It illustrates the real value of the book nicely, in showing you various ways to organize your code and your thoughts for various effects. Oftentimes a book will only teach you one way (which doesn't always suit your needs), and Spinellis' examples do a nice job of escaping that trap, not just here but throughout the book.
Chapter 3, "Advanced C Data Types," focuses on some language-specific matters. These are pointers, structures, unions and dynamic memory allocation, things that most people who code in C may use but only some truly understand well. Again, a somewhat basic chapter, but useful nonetheless. Make sure you read it; chances are you'll learn a thing or two.
In Chapter 4, some basic data structures (vectors, matrices, stacks, queues, maps and hash tables, sets, lists, trees and graphs) are covered. This is an important chapter since it helps you see these structure in real-world use and also helps you understand when to chose one structure over another. While Knuth, CLRS, or other algorithms and data structures texts cover these, they often do so in isolation and at a theoretical level. While their coverage is short, it's to the point and usable by anyone with a modest understanding of C.
Chapter 5, "Advanced Control Flow," the last chapter that deals with actual programming information, is another useful one. Again, short but to the point, this chapter covers things like recursion, exceptions, parallelism, and signals, all topics that have warranted their own books (or major sections in other books) but which are covered in a single chapter here. Still, seeing them side-by-side and in the context of each other and in real-world use provides some justification for the compact presentation.
The remaining chapters of the book go well beyond a normal programming book and focus on projects. These chapters complement the first bunch nicely by focusing on the organization of your code and projects. Chapter 6 deals specifically with many of the commonly identified (but rarely taught) things like design techniques, project organization, build processes, revision control, and testing. A number of things that aren't covered include defining and managing requirements for a release and their specifications, basics on how to use autoconf and automake, and instead rips through a whole slew of topics quite quickly.
Chapter 7 is sure to be controversial for some people: it covers "Coding Standards and Conventions." Some people seem to be big fans of the "if it feels good, do it" style of programming, and instead of writing sane, usable code, what they produce is buggy and messy. This chapter teaches you tried and tested methods of naming files, indentation (and how to do so consistently using your editor to help), formatting, naming conventions (for variables, functions, and classes), as well as standards and processes. The style and standards are (as you would expect) based on NetBSD, which differ slightly from GNU and Linux standards, as well as commonly found Windows practices. However, I think you'll agree that the style is readable with minimal effort, and that goal, coupled to consistency, is paramount in any standard.
Chapter 8 introduces you to documentation, including the use of man pages, Doxygen, revision histories, and the like. Also included are hints at using diagrams for added value. One thing I don't like about this chapter is the opening quote, which sets a bad precedent. It blithely suggests that bad documentation is better than none, which is highly questionable. Misleading docs can be worse than no docs at all, since someone without docs will have to dig through the code in front of them to understand it. Someone with bad docs will rely on the docs and wonder what's broken when things go awry.
Chapter 9 focuses on code architecture, such as class hierarchies, module organization, and even core features like frameworks to chose. This chapter covers a lot of material, and is, despite its size, simply too terse on many of these subjects. It serves as a decent introduction, but doesn't go very far in some places, considering the importance of the material. However, like much of the book, it's a good introduction to the topics at hand.
Chapter 10 also features a lot of good things to know. Granted, you could pick them all up with a lot of hard work and scouring for information, but it's easier to have them presented to you in a cohesive format. The chapter discusses code reading tools, things that you use to help you dig around a large body of code. One you get over a few source files, even if you have well-organized code and interfaces, many changes can require that you inspect the data path. You can do this manually, or you can be assisted with tools. Tools like regular expressions, grep, your editor -- Spinellis shows you how to make use of all of them when you write code. A lot of tools I've never used (but have heard about) are featured, and their use is demonstrated, but of course many tools are simply ignored, focusing on popular ones that will work for most people.
Finally, all of the above is brought together in Chapter 11, "A Complete Example." A small tour of a large, complex piece of code is taken (34,000 lines of Java) as the author makes changes. It's unfortunately in Java, when so much of the book focused on C (why couldn't they have been consistent examples?), but it works. The example itself could have covered a few more things, such as a proper JUnit example, but overall I'm pleased with it.
Overall, Code Reading: The Open Source Perspective is ambitious and worthwhile, both as a complement to a bookshelf of study that includes The Practice of Programming and Design Patterns, and to someone who is growing tired of books on learning a language. At times it feels like the author promised more than he wound up delivering, but it serves as an introduction to a large number of topics. You wont learn a language, and you wont be able to get as much out of the book if you don't engage it with practice, but it's a useful book to get started on the road from being someone who knows a language or two to someone who is a developer, ready to contribute to a team and work on large projects. Never underestimate the skills required to be a good developer, because they go well beyond knowing how to use a language.
You can purchase Code Reading: The Open Source Perspective from bn.com. Slashdot welcomes readers' book reviews -- to see your own review here, read the book review guidelines, then visit the submission page.
When I see that develop in someone, then I know they are going to the next level. When I don't see it after time, I know they will never evolve.
This is very true. While learning a certain language I always have many inconsistanies in my code. In fact I always want to go back and change it as I progress. It just shows that experince is much more than talent.
You can tell a great deal about the maturity of a programmer by the quantity, and quality, of comments.
The amount of comments in code is interesting. It kind of starts at an extreme, then moves slowly to the middle, happy land.
If you started programming, you either chose to comment a lot, or not at all. "A lot", because you were never sure what your code was doing, and always needed the reference. Or "not at all", because you could make sense of the code very well.
Then you start your first big project. For the big commenter, he realizes after getting quite a bit of work done that a lot of time was "wasted" on comments. ("// This line increments the 'i' variable by 1".) Further, these comments seem way too obvious. For the non commenter, they get lost easily and wish they commented more.
So it goes in the opposite direction from when it started. The big commenter comments way less, and the non commenter comments way more.
Then it repeats. Eventually the programmer gets the happy medium of the right amount of comments.
This has been my experience, in my own code, and checking other people's code. Thoughts?
Yes. Getting it done right.
Tarsnap: Online backups for the truly paranoid
The smart man learns from his mistakes ...
... but the wise man learns from other peoples mistakes.
:-P.
I guess that I am just kind of average
Cheers,
Adolfo
I disagree when writing Java and C#. These languages are inherintley readable if you write clean, understandable code with good variable names. Comments should be used whey you implement something that you can't understand by reading the code. Its quickker for me to read clean code, and comments often get in the way and get outdated and hurt more than they help. Maintainance is the Iceburg.
Most critical is managing complexity. Large, complex functions are bad - they have more bugs, they are harder to maintain, harder to bug fix (a change has more likelyhood of breaking something else). If a function has grown beyond a hundred lines of (real) code, it is almost certainly too large. If it has more than 4 levels of nesting, it is too large.
Comments also matter. It's easy to code a couple of thousand lines of fresh code over a weekend if you get in the groove. It is almost impossible to unpick it one year later if you didn't comment it as you go.
Variable and function names should be expressive. No single letter variable names! No obscure combinations of letters like words with no vowels (fnct could be function or function control, or even Function Numerical Constant Type). And personally I find that reverse Hungarian notation can be more trouble than it's worth. puiAnnoying!
Build in automatic checks on everything. If a pointer to a function should never be null, check it and stop if it is. If a variable should only have values 1 -> maxIterations, then check it. If you (or anyone else) ever breaks that assumption, the code will flag it for you.
Beyond that, nothing beats good design, especially designs where extending the original work is easy. So many designs end up as tangle knots of conflicts because they ended up trying to solve problems that the original code base never envisaged.
Cheers,
Toby Haynes
Anything I post is strictly my own thoughts and doesn't necessarily have anything to do with the opinions of IBM.
Actually, the parent is neither off-topic nor flamebait. This book can be very good (and I think I'll buy it if I can) but this reviews lacked a comparison with another "Bibles" such as "Code Complete" (a Microsoft Press book!). Code Complete was to me the book that gave me another level of programming, exactly like this "Open Source Code Complete". The problem is really with the word "open source embedded in the title...
Hmmm. What concepts, particularly what basic flow control concepts, are present in C, C++, or Java yet absent in Scheme or Lisp?
If the expression itself requires a comment, then it is not being expressed clearly enough. It should be rewritten.
If the expression is clear, but does not make sense in the context, appears suboptimal, or otherwise thwarts the reader's expectations, THEN I'll comment.
Real world project comments...
There is a lot of truth to coding style that is not necessarily related to open source. Seasoned programmers work very hard to keep naming, patterns and even formatting consistent. It makes it easier to read.
:)
The open source angle is that the code will be viewed by many people and thus there is a greater detail to presentation, it's a form of ego stroking and seems to work quite well.
I do code reviews all the time and I can tell a novice developer from someone who has done it for a while, it's a hard thing to explain, it's like a difference between good and bad art. And just like with art, you can have decent artists and you can have great artists with very little in-between. Go figure
Having worked on some terribly buggy code with buggy docs, I'm going to suggest that even bad docs are better than none. Sure, you get burned once or twice, but you quickly learn not to trust the docs as canon. And while the docs often don't tell you what the code does, it tells you what the developer WANTED it to do, which is golden information when you're trying to debug.
Acius the unfamous
I have to say that I agree with you to a certain degree.
I'll go one more step though. It's about white space. You can almost always recognize a good programmer by how clean the code looks because of proper use of white space. If the code is all crammed together and cluttered looking then you know that programmer sucks. White space does not make your code slower guys!
Note that I said this is a way to recognize a good programmers. Often experience plays a role but I see a ton of fairly experienced programmers that write ass crappy code.
I see a lot of people saying comments are also a sign but I have to disagree. Well written code is easy to read and figure out. I would much rather have nicely written code than comments. (Note that it is my job to fix other people's code so I have a lot of experience with this)
The ratio of people to cake is too big
After spending many years as a programmer, and writing thousands of lines of code, I have learned so much about coding that these days I find myself not writing code at all, or very little code.
As a young freshly trained programmer, you are walking around with a hammer, and a lot of things look like nails.
You think you are a badass because of the power and precision with which you hit the nails.
Then one day and you see you are building a house. And there are a lot of other people whose work is instrumental to getting the house built.
So after a while you start telling other people how to hit their nails, and before you know it, you are building entire tracts of homes.
If you encounter more than about 4 levels of indentation then the width of a tab is the least of your worries.
One of my favorite aspects of Objective-C is the way methods are self-documenting. "[Object doThis:something toThis:something]" is easy to figure out.
// Begin Comment Header
// Comment Author:
// John Smith
// Programmer Analyst
// john.smith@company.com
// Comment Version:
// 0.0.1
// Comment Date:
// 2005.03.08
// End Comment Header
// Begin Comment
// increment i by 1
i++;
If the code doesn't match the documentation, then you have found a bug! Either the documentation or the code is flawed: you need to revisit the actual requirements to determine which is the case. This may require revisiting the problem at it's source to determine the business, hardware, or safety requirements. When you find out the required behaviour, code it correctly, and then document it correctly!
If you blithely rely on either the code *OR* the documentation to be correct, you're being far too trusting...
This style
{
is better by far
{
because the braces 'belong' to
the following block of code
}
not the
{
preceeding guard statement
}
}
Sincerely,
An OFP (Old Fart Progammer)
Generally, bash is superior to python in those environments where python is not installed.
All I gotta say is that if you use {
this {
style {
of indention
}
}
}
You are a newbie.
No way! I have been coding for 30 years...
It makes more sense to code
if (test) {
statement1
} else {
statement2
}
because the eye can immediately see from the 'if' and 'else' statements alone that a block of conditional code follows. Having the braces on the same line also marks out the lines as special: flow control code. I find that
if (test)
{
statement1
}
else
{
statement2
}
simply adds pointless extra lines to code, with no gain in readability.
I've tried to work on projects where we developed new software as much as possible. However, taken all together I think I've probably spent many months of aggregate time reading other peoples code, modifying it, adding new features and fixing bugs. I hate doing this work, not only because of poorly designed software but because most software I've worked on is almost entirely undocumented (e.g., no comments and no external documentation).
Some posters have mentioned comments and documentation in this thread. But in my opinion, it has not been discussed in strong enough terms. For a long rant on this topic, see my essay Software and Documentation. I'll summarize some of the points I try to make in this essay here.
There were only a few people in this thread who wrote that they think that software source code is self documenting. This is a good sign. In the past this was a widely held view. Perhaps it is becoming less popular. Perhaps pigs are becoming airborne.
To those of you who wrote that your code is self documenting: you are simply lazy and unprofessional.
Sure I can understand what well written code does. But I cannot understand why it does what it does. I can show you three hundred lines of wavelet signal processing code. You can immediately understand what the code is doing. But you will have no clue what-so-ever about why it does what it does unless you understand the thinking behind the code. The same is true of any complex software.
Another problem is that even though I might understand what components do and perhaps even why they are doing what they do, getting an overall understanding of the architecture of a large application can be difficult and time consuming.
By not writing comments in your code and, for large applications, providing external documentation on the architecture, you are saving your time and effort, but forcing others to spend effort that you could have saved them.
Even if you have no consideration for anyone else who must maintain the application after you write it, comments will help you find bugs in your own code. Again and again I have found that when I explain my code in a comment I am forced to look over it again. In doing this I have found that the code does not do what I intended or that I left something out.
When it comes to documentation there is rarely enough. Even if you make an effort to document your code, when you are in the middle of it you will not document things that you find obvious. When you return later, these obvious issues will have been forgotten and are not obscure.
Documentation must be maintained along with the code. And it should be extended where it is incomplete or unclear. English (or what ever your language is): it's another kind of software.
Developers seem to fall into two camps: those who believe documentation is important and who do it and those who do not. I've managed software groups where I mentioned documentation in people's performace reviews and it still did not good. So far I have not been able to get someone who does not document to do it.
I have thought that perhaps one solution would be to hold code reviews where one of the major features that was looked at was understandability in the code. But I have not experimented enough to know if this will actually get people to document their software.
Literate programming not only involves writing well designed and well informed algorithms. It also includes writing documentation that explains these algorithms.
From Structure and Interpretation of Computer Programs by Abelson and Sussman:
"Programs should be written for people to read, and only incidentally for machines to execute."
I find it useful to tell why I am doing something. A decent programmer can figure out what the code is doing, but even I have trouble 6 months later figuring out just why exactly it was that I did this, when this seems to be a better solution. Then I try the "better" solution, immediately realize why I went with the one I did. Saves me a lot of time in the long run to just document right there I did this b/c of that.
So you need guidelines to make a bad design decision work. This is what is generally called a "hack". The real solution is to not give semantic meaning to whitespace in the first place.
I still have more fans than freaks. WTF is wrong with you people?
Both consistency and good comments are good traits. I find understandable function and variable names are essential as well.
I believe that language familiarity has little to nothing to do with being a good programmer. A more familiar programmer can program faster but programming principles remain the same across all languages.
Smaller, simpler code is inherently less likely to contain bugs, runs faster, and is easier to troubleshoot.
If programmer A develops 1000 lines of code per day and programmer B develops 10 lines of code per day and both programs do the same thing I will pick programmer B every time (normal lines of code, not contest entries with runon lines and things)
Coding Blog
When I programm ERP Modules (in Python) I start with a large comment on what the module is supposed to do. I actually started doing this quite from the beginning. I write a litte summary of what the module does and comment heavily throughout the source. I even have debugging sessions were I only work on comments - often detailing them further.
I also do a lot of other programming (ActionScript rich media and stuff). I don't do opening or other comments there that much.
Why that?
If a multimedia app doesn't work I get an e-mail or a call asking to check into it within the next 24 hrs. The customer is anoyed or says that this comes bad with the end-user. We talk a little, I et to work and fix it within the next 10 hrs.
If a ERP module doesn't do what it's supposed to, I get a call on my mobile, with the CEO at the other end telling me that he's losing 200 Euro an hour because order processing has gone haywire.
This is a good point to end all discussion wether commenting is good or bad. When you are debugging a piece of code that keeps 3 people breathing you shure as well want it to be well commented.
The Lockheed Software Group, the people who write code for the space shuttle and other things, take this to the all-out extreme. They write EVERYTHING in a sort of human readable meta code, meta-comment THAT and then tranfer that to real code. They've had 3 bugs in 25 years - so I've heard. One of them being a counter problem that once would have nearly locked down a com-dish on an orbiter and prevented it from following it's target on earth after having rotated a certain amount. Figure: A millisecond of timing error in their code has the orbiter off course by 3 miles. I hear their comment line/code line ratio is about 7 to 1.
So much for "overcommenting is unprofessional".
We suffer more in our imagination than in reality. - Seneca
To think of code as literature is actually a non-obvious and striking idea.
The essence of literature is communication. I've been writing C and C++ for years, constantly fine-tuning and refining code and documentation, line by line, paragraph by paragraph. And how apropos to realize that the point is to express the thought so clearly and concisely that any reader (including yourself three weeks from now) will grasp it as close to immediately as possible.
That means you have to think clearly in order to code clearly. This matters more than almost anything else.
Too much documentation is just as bad as too little, since it obscures the intent of the code. How much is just enough? It takes years of thoughtful practice to know that.
Everyone is posting their favorite rules about coding standards. Most young programmers apply standards as rigid as they are incomplete. I have only two rules:
1) Call everything by its right name. Functions are verbs, variables are nouns. The name you give a object should express what it is or does, no more and no less, and be readable in English (or whatever your spoken language is). If you add something to a function that doesn't fit the function's name, it either belongs in a different function or else you misnamed the function in the first place. This takes constant review of your code, from a literary point of view.
2) Rules are made to be broken. All great writers unhesitatingly break the rules in order to communicate more clearly.
Languages are for communicating, and computer languages are no different. Keep that in mind at all times, and you'll become a wizard.
Hence, simple algorithms will likely have no comments in the implementation.
More of the time, comments that "supplement" the code usually add to the confusion for most advance developers.
Have you ever been in a code review that was a comment review? Where the reviewers, unwilling or unable to find problems in the code, nitpicked about how the comments looked like? Too many managers and second tier programmers spend their time worrying about the details of the comments and coding style because that is all they understand.
What good are great comments if the code doesn't work? Sure, comments matter, but all of the comments in the world aren't going to make buggy software run better. It will just make the next programmer understand how screwed up the original programmer was.