Code Reading: The Open Source Perspective

Posted by timothy on Tuesday March 8, 2005 @10:20AM from the because-that's-the-code-you-can-read dept.

nazarijo writes "You can usually tell someone who's been writing a lot of code by how they write code. That may sound like a tautology, but it's got a deeper meaning than that. What editor they use, what idioms they use to avoid common pitfalls, and what organization patterns they employ all tell you what kind of programmer you're meeting. When you first start writing code, so many things are inconsistent and just plain wrong that it's almost embarrassing. I know that when I look over older code that I've written I feel sheepish about it. But how do you grow as a programmer, and what really makes a good programmer beyond language familiarity?" Read on for Nazario's review of Code Reading: The Open Source Perspective, a book which attempts to instill deeper knowledge about programming than just "knowing how."

Code Reading: The Open Source Perspective
author: Diomidis Spinellis
pages: 499
publisher: Addison-Wesley Longman
rating: 7
reviewer: Jose Nazario
ISBN: 0201799405
summary: A tour of large-scale development projects from code to organization

A few books are tackling this subject, including Coder to Developer and Programming Language Pragmatics. These books don't teach you much about a particular language in the way that an introductory text would. Instead, you grow as a skilled developer by studying them and learning from them. That's one of the key things that people are talking about lately, that to be a strong developer requires more than a working knowledge of a language. It requires a familiarity with the strengths, weaknesses, and core features of a language and the base libraries to be efficient.

Code Reading: The Open Source Perspective is one of the books in this small but growing library. In it, Diomidis Spinellis takes you through a large body of code and focuses on several languages, techniques, and facets of development that differentiate strong developers from weak ones. What I like about this book is how much it covers, how practical the information is, and how much Spinellis teaches you. You won't learn a language, which is the complaint of some people who read this book, but if you know one or two you'll be a better programmer.

Perhaps one of the most telling things about the book is that it draws heavily from NetBSD source code, and features over 600 examples to make the point. Examples are often annotated using NetBSD as a reference. This makes sense, because NetBSD is a large project that's relatively stable and mature. Everything from how to define a C structure consistently and sanely to UML diagrams and build systems is covered, making this truly a developer's book. However, even Windows and Mac OS X developers will benefit, despite the BSD focus.

Chapter 1 introduces some of the basic tenets of the book, namely that code is literature and should be read as such. All too often people only read code when they have a specific problem to solve or want to get an example of an API. Instead, if you read code frequently you'll always be learning things and improving your skills. Also, Spinellis discusses the lifecycle of code (including its genesis, maintenance, and reuse), which simply must be taken into account if code is to be good. Poorly skilled developers forget these things and just slap it together, never thinking ahead.

In Chapter 2, a number of concepts basic to any programming language are covered, including the basic flow-control units common to many languages. The book focuses on C, with additional coverage given using C++, Java, and a few other things thrown in for good measure. As such, these chapters -- in fact the whole book -- focus on concepts common to these languages but absent in some other languages, like Scheme or LISP. One neat section is called "refactoring in the small." It illustrates the real value of the book nicely, in showing you various ways to organize your code and your thoughts for various effects. Oftentimes a book will only teach you one way (which doesn't always suit your needs), and Spinellis' examples do a nice job of escaping that trap, not just here but throughout the book.

Chapter 3, "Advanced C Data Types," focuses on some language-specific matters. These are pointers, structures, unions and dynamic memory allocation, things that most people who code in C may use but only some truly understand well. Again, a somewhat basic chapter, but useful nonetheless. Make sure you read it; chances are you'll learn a thing or two.

In Chapter 4, some basic data structures (vectors, matrices, stacks, queues, maps and hash tables, sets, lists, trees and graphs) are covered. This is an important chapter since it helps you see these structures in real-world use and also helps you understand when to choose one structure over another. While Knuth, CLRS, or other algorithms and data structures texts cover these, they often do so in isolation and at a theoretical level. While Spinellis' coverage is short, it's to the point and usable by anyone with a modest understanding of C.

Chapter 5, "Advanced Control Flow," the last chapter that deals with actual programming information, is another useful one. Again, short but to the point, this chapter covers things like recursion, exceptions, parallelism, and signals, all topics that have warranted their own books (or major sections in other books) but which are covered in a single chapter here. Still, seeing them side-by-side and in the context of each other and in real-world use provides some justification for the compact presentation.

The remaining chapters of the book go well beyond a normal programming book and focus on projects. These chapters complement the first bunch nicely by focusing on the organization of your code and projects. Chapter 6 deals specifically with many of the commonly identified (but rarely taught) things like design techniques, project organization, build processes, revision control, and testing. A number of things aren't covered, including defining and managing requirements for a release and their specifications, and the basics of using autoconf and automake; instead, the chapter rips through a whole slew of topics quite quickly.

Chapter 7 is sure to be controversial for some people: it covers "Coding Standards and Conventions." Some people seem to be big fans of the "if it feels good, do it" style of programming, and instead of writing sane, usable code, what they produce is buggy and messy. This chapter teaches you tried and tested methods of naming files, indentation (and how to do so consistently using your editor to help), formatting, naming conventions (for variables, functions, and classes), as well as standards and processes. The style and standards are (as you would expect) based on NetBSD, which differ slightly from GNU and Linux standards, as well as commonly found Windows practices. However, I think you'll agree that the style is readable with minimal effort, and that goal, coupled to consistency, is paramount in any standard.

Chapter 8 introduces you to documentation, including the use of man pages, Doxygen, revision histories, and the like. Also included are hints at using diagrams for added value. One thing I don't like about this chapter is the opening quote, which sets a bad precedent. It blithely suggests that bad documentation is better than none, which is highly questionable. Misleading docs can be worse than no docs at all, since someone without docs will have to dig through the code in front of them to understand it. Someone with bad docs will rely on the docs and wonder what's broken when things go awry.

Chapter 9 focuses on code architecture, such as class hierarchies, module organization, and even core features like frameworks to choose. This chapter covers a lot of material, and is, despite its size, simply too terse on many of these subjects. It serves as a decent introduction, but doesn't go very far in some places, considering the importance of the material. However, like much of the book, it's a good introduction to the topics at hand.

Chapter 10 also features a lot of good things to know. Granted, you could pick them all up with a lot of hard work and scouring for information, but it's easier to have them presented to you in a cohesive format. The chapter discusses code reading tools, things that you use to help you dig around a large body of code. Once you get over a few source files, even if you have well-organized code and interfaces, many changes can require that you inspect the data path. You can do this manually, or you can be assisted with tools. Tools like regular expressions, grep, your editor -- Spinellis shows you how to make use of all of them when you write code. A lot of tools I've never used (but have heard about) are featured, and their use is demonstrated, but of course many tools are simply ignored, focusing on popular ones that will work for most people.

Finally, all of the above is brought together in Chapter 11, "A Complete Example." A small tour of a large, complex piece of code is taken (34,000 lines of Java) as the author makes changes. It's unfortunately in Java, when so much of the book focused on C (why couldn't they have been consistent examples?), but it works. The example itself could have covered a few more things, such as a proper JUnit example, but overall I'm pleased with it.

Overall, Code Reading: The Open Source Perspective is ambitious and worthwhile, both as a complement to a bookshelf of study that includes The Practice of Programming and Design Patterns, and to someone who is growing tired of books on learning a language. At times it feels like the author promised more than he wound up delivering, but it serves as an introduction to a large number of topics. You won't learn a language, and you won't be able to get as much out of the book if you don't engage it with practice, but it's a useful book to get started on the road from being someone who knows a language or two to someone who is a developer, ready to contribute to a team and work on large projects. Never underestimate the skills required to be a good developer, because they go well beyond knowing how to use a language.

You can purchase Code Reading: The Open Source Perspective from bn.com. Slashdot welcomes readers' book reviews -- to see your own review here, read the book review guidelines, then visit the submission page.

QA by Anonymous Coward · 2005-03-08 10:35 · Score: 2, Informative

I'm doing QA on all work delivered by contractors that is on our standing offer list (yes, I work for the government... and yes, there are a few good people here).

First thing you notice is that they hardly ever read the statement of work before they start coding. The whole application becomes a hack.

Second thing is the structure of the code. Indentation? What the heck is that? Some try doing indentation by just using the tab key. That looks great for someone who has the tab size set to six... after 13 levels (extreme, but I've seen it), you can't see the freaking code. My advice to someone who's in charge of a codebase... Standardize indentation, as well as the tab setting.

Third thing you notice is that they've changed the structure of the database that you provided to them. Oh, great!.. There goes the lean structure, just so that the coder could perform a quick hack by duplicating the data throughout all the tables.

If you want code to be maintainable, do QA, and do it good!
Learning from decades-old code by shoppa · 2005-03-08 10:40 · Score: 4, Informative
I've had many opportunities to work with code that has evolved over several decades. There are two common patterns:
1. Project was originally a quick hack. It lives well past its prime, gets modded extensively to handle changes going on in the real world (new devices, competition, etc.). Abstractions are added where necessary. Some hacky ugliness lives at fringes, but after umpteen releases and way too much backwards compatibility customers are still buying it.
2. Project was a grandiose dream by an analyst. Before any functionality exists, everything is abstracted to the max. 10-inch thick binders full of API's are published before the product actually does anything. The abstractions usually turn out to be wrong, and after many years (and little functionality) either the abstractions get twisted around at great expense to reach functionality, or the project dies in heaving paroxysms.
Reading the experienced coder's comments is always good. They know the history and want to pass on the lessons learned to whoever will look.
Re:On the topic by Jason+Ford · 2005-03-08 10:46 · Score: 2, Informative

Hmmm, it could be that agustindondo is a Spanish speaker. 'Funciones' is Spanish for, you guessed it, 'functions.' Oh, and 'fecha' is 'date' en español.

--
I did not become a vegetarian for my health, I did it for the health of the chickens. --Isaac Bashevis Singer
Comments need structure by yintercept · 2005-03-08 10:50 · Score: 3, Informative

It seems to me that documentation needs a structure. Structures like records of revision and design documents are useful. Free-form comments in the code turn into white noise. Since I am far more interested in what a piece of code does, I pretty much ignore all documentation in the code. It seems like the majority of documentation in code becomes obsolete as people modify the code. For example, if a person has a problematic part in their code, they add copious notes. When the problem finally clarifies in their mind and they fix the code, they end up leaving obsolete documentation that just confuses people at a later date.

Personally, I think a person should document the interface, maintain a record of revision. The code itself should only be documented when the code is doing something out of the ordinary.

So, when I am learning a language, I put a great deal of notes in the code. As I learn the typical flow of the programs, I write less documentation in the code. When I have a strong feel for the language and reading the code is more informative than reading notes, my notes all but disappear. I've noticed many other programmers seem to have the same tendency. They write less documentation in code as they learn the language.

Having supported a large number of applications, I greatly appreciate good design documentation, but discard notes in the code.
my technique by SQLz · 2005-03-08 11:05 · Score: 2, Informative

I'm a scripter (php, python, perl, java etc) and I work with other scripters. I notice a big difference in code depending on the age of the programmer and what programming language they started with.

My technique is pretty much the opposite of anyone else's where I work, although I don't know if this is a good thing or not.

I start off with making tons of functions. I go crazy with functions and usually put them into related objects/packages with static methods just for organization purposes....so I quickly know where to look when I want to edit the behavior. I always maintain the PerlDOC/PHPDoc/JavaDOC comments even when I first start out because my IDE reads and pops up nice little hints when I forget arguments but other than that I don't comment too much unless I'm doing something dicey. Once I've coded enough, I start to see the places where I can refactor and abstract a lot of the functions away and weed out any inconsistencies really quick.

Others tend to make one gigantic script with few if any functions and tons of global structures and arrays, then refactor it to a more sane style later on.

Their motto is "first get it working, then get it working right". My reply is "that doesn't mean you throw good programming practice out the window."
What I do as a programmer by Anthony+Boyd · 2005-03-08 11:11 · Score: 3, Informative
As a PHP/Perl/JavaScript/HTML developer, here are some of the things I think I do well:
- I use liberal comments in most of my projects. I always try to use /* and */ to make multiline comments that are easy to quickly add to without worrying about a lot of pretty formatting.
- I have recently started to play with PHPDoc to create self-documenting code.
- When creating any character(s) implying "open" I immediately create the "closed" character(s) too. For example, I type "if () { }" and then fill it in. As I write this, my LI tags are all already typed, I am just filling in text now.
- I use text editors with syntax highlighting, such as HTML-Kit -- no drag & drop GUIs.
- I use tools like WinMerge, Subversion (only a little, not so good at it) and ReplaceEm to maintain large codebases.
Where I fail at coding:
- I know OOP, but it isn't natural for me, so I'm still a procedural boy, even when Object-Oriented Programming might help.
- I have no idea what vectors and matrixes are.
- I'm self-taught, my schooling is as an English major. So I have absolutely no Computer Science education behind what I do. While I try to do well, my solution to a deep and complicated problem is going to be basic compared to a guy who spent years of his life learning algorithms.
--
My Greasemonkey scripts for Digg &
Nonsense. by dwheeler · 2005-03-08 11:30 · Score: 4, Informative

I'll use whatever the indentation style of a current project is, and carry on. If you like that style, then go ahead, use it. I'll go along, too, if you're the project lead.
But many "newbies" such as Kernighan, Ritchie, and Torvalds all highly recommend the One True Brace (OTB) style. It's the one used in K&R's C book, among other things. In other words, some of the people MOST experienced with C use this style.
There are serious advantages to the OTB style. In particular, it eliminates useless white space so that you can actually see more vertical text simultaneously -- even with big screens that's helpful.
If you want to use a different style, go ahead! If you're the lead, I'll gladly use your style. But in programs I lead, I'll continue to use OTB and expect others to follow suit. Oh, and I've been using C since 1985, so !newbie.

--
- David A. Wheeler (see my Secure Programming HOWTO)
some newbies by r00t · 2005-03-08 11:35 · Score: 4, Informative

authors of 1999 ISO C standard (section 6.8.3 example 3, etc.)
Linus Torvalds (Linux kernel)
W. Richard Stevens (Advanced Programming in the UNIX Environment and UNIX Network Programming)
Brian W. Kernighan (C Programming Language co-author)
Dennis Ritchie (C Programming Language co-author)
Gamma, Helm, Johnson, and Vlissides (Design Patterns)
Hmmm. You are a newbie. Kernighan and Ritchie wrote the book on C, literally, and the ISO C standard still uses their style. The opening brace is only by itself when starting a function body. In all other cases, it shares the line.
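
The brace rule described above is easy to show in a few lines. The following is a minimal sketch, not taken from K&R or the ISO standard; the function `count_positive` and its parameters are hypothetical and exist only to show where each opening brace lands.

```c
#include <stddef.h>

/* Hypothetical function, used only to illustrate brace placement. */
size_t count_positive(const int *values, size_t n)
{                                   /* function body: brace on its own line */
    size_t count = 0;
    size_t i;

    for (i = 0; i < n; i++) {       /* loop: brace shares the line */
        if (values[i] > 0) {        /* conditional: brace shares the line */
            count++;
        }
    }
    return count;
}
```

Compared with putting every opening brace on its own line, this layout saves one line per loop or conditional, which is the vertical-space advantage mentioned earlier in the thread.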
Re:Obligatory quote by lexluther · 2005-03-08 12:11 · Score: 2, Informative

This is a Knuth quote. "Programs are meant to be read by humans and only incidentally for computers to execute".
It's not wrong. by Chemisor · 2005-03-08 14:55 · Score: 2, Informative

> dealing with function pointers named StupidSuckingGlobalCallbackFunction

When you have functions named like that, don't write documentation for them. Rename them. It takes a lot less effort to write code so its meaning is obvious than to write documentation explaining why you didn't do that.
Re:language bias detected by Jerf · 2005-03-08 16:33 · Score: 2, Informative

I've been working in Perl and Python for years now, with large products running on very big scales, tens of thousands of users running mission-critical stuff, day in, day out, that sort of thing.

I will let you know when this is an actual problem, rather than stories to scare the kiddies with. So far, while I can clearly see the costs (which, if you have never left the 'safety' of variable-based typing languages behind, you can't), I'm yet to see the benefits on any but the most extreme jobs.

I've picked up quite a list of people I've made this promise to, but I don't bother actually keeping track, because it isn't going to happen. They keep promising doom, and I keep coding four or five times faster than they do, and that's just the linear speed-up, to say nothing of the improved re-factoring I get and the general benefits of flexibility. Fair trade, I suppose.
Re:Coding style by dutky · 2005-03-08 18:41 · Score: 4, Informative

AuMatar wrote
The problem with hard fast rules like that is they're frequently not right. Take a state machine for example. A simple one with 6 or 7 states will go over 100 lines, and will go over 4 nestings. Heck, you'll take one up with the loop and one with the switch alone.

If you are coding state machines with switch statements, then you don't know what you're doing.

The state transitions, transition functions and accepting states should be stored in tables (2-dimensional arrays) and the entire state machine is then coded in about a dozen lines:
    int next_state[NSTATES][NTOKENS]={{...}...};
    int state_trans[NSTATES][NTOKENS]={{...}...};
    int accept[NSTATES]={...};
    int retval[NSTATES]={...};
    int (*trans_func[NFUNCS])(...)={...}; /* syntax may be wrong */

    int state_machine(...) {
        int st = 0, tok = 0, err = 0;

        while(accept[st] == 0) {
            tok = get_token();
            err = trans_func[state_trans[st][tok]](...);
            if(err) return err;
            st = next_state[st][tok];
        }
        return retval[st];
    }

All the real work is in the state tables and the transition functions referenced in the trans_func array. I've found it a hell of a lot easier to get the state tables right than to find all the mistakes in a giant, nested switch statement. In most circumstances the state tables can be constructed from the state diagram by inspection. For anything big enough that you can't construct the state diagram by hand, there are automated tools.
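
For readers who want something that compiles, here is a minimal sketch of the same table-driven idea, under stated assumptions. It is not dutky's code: the task (recognizing an optionally signed decimal integer) and the names `is_integer`, `classify`, and `is_terminal` are hypothetical, and the per-transition functions (`trans_func`) are left out to keep the example short.

```c
#include <ctype.h>

/* Hypothetical example: recognize an optionally signed decimal integer. */
enum state { S_START, S_SIGN, S_DIGITS, S_ACCEPT, S_REJECT, NSTATES };
enum token { T_SIGN, T_DIGIT, T_END, T_OTHER, NTOKENS };

/* next_state[current state][input token] */
static const enum state next_state[NSTATES][NTOKENS] = {
    /*               T_SIGN    T_DIGIT   T_END     T_OTHER   */
    /* S_START  */ { S_SIGN,   S_DIGITS, S_REJECT, S_REJECT },
    /* S_SIGN   */ { S_REJECT, S_DIGITS, S_REJECT, S_REJECT },
    /* S_DIGITS */ { S_REJECT, S_DIGITS, S_ACCEPT, S_REJECT },
    /* S_ACCEPT */ { S_ACCEPT, S_ACCEPT, S_ACCEPT, S_ACCEPT },
    /* S_REJECT */ { S_REJECT, S_REJECT, S_REJECT, S_REJECT },
};

/* Nonzero marks a terminal state; retval holds the result reported there. */
static const int is_terminal[NSTATES] = { 0, 0, 0, 1, 1 };
static const int retval[NSTATES]      = { 0, 0, 0, 1, 0 };

/* Map one input character to a token class. */
static enum token classify(char c)
{
    if (c == '\0')
        return T_END;
    if (c == '+' || c == '-')
        return T_SIGN;
    if (isdigit((unsigned char)c))
        return T_DIGIT;
    return T_OTHER;
}

/* Returns 1 if s is an optionally signed decimal integer, 0 otherwise. */
int is_integer(const char *s)
{
    enum state st = S_START;

    /* The whole machine: look up the next state until a terminal one. */
    while (!is_terminal[st]) {
        st = next_state[st][classify(*s++)];
    }
    return retval[st];
}
```

Under these assumptions, `is_integer("-7")` yields 1 and `is_integer("12a")` yields 0. Changing the accepted language means editing rows of `next_state`, while the driver loop stays untouched.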

I guess my point is- lines of code isn't the enemy, some things are complex and need many lines to do. Nesting isn't the enemy, some things require many loops/ifs. The enemy is a lack of clear separation of functionality and lack of clear abstraction between parts.

Lines of code or nesting depth may not be the enemy, but they're no ally either. It can be quite difficult to specify to a diverse team what is meant and expected by clear abstraction and separation of functionality, but almost anyone can wrap their brain around a LOC or nesting depth limit. If you don't want to make hard and fast rules, you can use these limits as warning signs during the code review.

On the average you can say something like: "good code shouldn't have functions longer than two pages (120 lines) and no more than 4 or 5 levels of nesting" and hand out indulgences where needed. At least you might get some of the lower functioning members of the team to think before they code a 5000-line monstrosity.