Best and Worst Coding Standards?
An anonymous reader writes "If you've been hired by a serious software development house, chances are one of your early familiarization tasks was to read company guidelines on coding standards and practices. You've probably been given some basic guidelines, such as gotos being off limits except in specific circumstances, or that code should be indented with tabs rather than spaces, or vice versa. Perhaps you've had some more exotic or less intuitive practices as well; maybe continue or multiple return statements were off-limits. What standards have you found worked well in practice, increasing code readability and maintainability? Which only looked good on paper?"
I've always found the Joint Strike Fighter's coding standards document an interesting read. It is available from Bjarne Stroustrup's website (PDF).
First off, I'd suggest printing out a copy of the /. comments and NOT reading them. Burn them; it's a great symbolic gesture.
If Kernighan, Ritchie, and Torvalds do it like that, who am I to do differently?
Je ne parle pas francais.
It doesn't really matter what you do, so long as everyone on the team does the same thing.
If you are using your computer right, it does not only enable you to do things, it does the boring things for you, automatically.
Checkstyle is one of the tools in a company toolkit that is often overlooked but in fact VERY handy. It enables you to define a ruleset for your source code, finding stuff which is incompatible with the coding practice in your company/team/project/whatever. Moreover, you can stick it into Eclipse using the free Eclipse-CS plugin, so it will automagically mark the places which need to be changed. Last but not least, you can put Checkstyle as an Ant task in your build environment (and in your continuous integration toolkit) so committed code that does not conform to certain standards does not build.
As for the rules themselves, we've found these to be the most successful:
Of course, we let developers add suppressions for the 1% of false positives. In fact, there are very few suppression rules set.
Build a tool even an idiot can use and only an idiot will want to use it. -S.O.B.
Make it "cut and paste" friendly, and as small as possible.
That's a really bad idea. Cut and paste causes code cloning, which is among the most difficult maintenance problems.
Code should be designed, when possible, in small chunks (methods, functions, etc.). This keeps the need to think about refactoring to a minimum, since the code is already factored. Well factored code has many other benefits, including easier-to-write unit tests and better understandability.
I maintain software that was originally written by someone as a prototype and eventually given production status. 4 years later, I am still pulling bugs out that relate to code cloning. Think of the guys who will maintain your software, please.
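To make the cloning point concrete, here is a minimal sketch in Python. The invoice scenario and every name in it are made up for illustration. Both functions need the same price-parsing logic, and instead of pasting those lines into each one, they share a single small helper, so a parsing bug only has to be fixed in one place.

```python
# Hypothetical example: two reports need the same "parse a price" logic.
# Rather than cloning the parsing lines, both call one small helper.

def parse_price(text):
    """Convert a string such as ' $1,234.50 ' to an integer number of cents."""
    cleaned = text.strip().lstrip("$").replace(",", "")
    dollars, _, cents = cleaned.partition(".")
    # Pad or trim the fractional part to exactly two digits.
    cents = (cents + "00")[:2]
    return int(dollars) * 100 + int(cents)


def invoice_total(lines):
    """Sum the prices on an invoice, in cents."""
    return sum(parse_price(line) for line in lines)


def most_expensive(lines):
    """Return the highest price on an invoice, in cents."""
    return max(parse_price(line) for line in lines)
```

The helper is also the natural unit to test: a handful of assertions on `parse_price` covers both callers.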
All my liberal friends think I'm a conservative, all my conservative friends think I'm a liberal.
Why don't you just give them all giant Q-tips and play the Star Trek fight music every time they meet?
Surely that would be at least as productive as asking them to all agree on coding standards.
Well, as long as we're admitting that "readable" is an entirely subjective experience... I'd have to say that I would find that notation less intuitive than the "} else {" construct.
It's too similar to consecutive 'if' statements, which of course breaks the logic.
Also, extending your notation logic fully results in:
if ( condition )
{
    statement1;
}
else
{
    statement2;
}
Which, although a waste of lines, is less confusing than your example.
True - but at least it keeps thousands of otherwise dangerous PHP developers safely occupied.
There are other good reasons for putting open braces on their own line. The biggest is that most coding conventions have a maximum line width. If you have an 81-character line, you need to break it. When you are scanning down the code, all you see is a line at one indent level followed by another line indented more - you need to read the entire line to tell whether it's the start of a block or not. With braces on their own lines you can tell just by visual pattern matching where every block starts and finishes.
While I'm in holy-war territory, I'll also chime in on the tabs versus spaces argument. The tab character has a well-defined semantic meaning. It means 'indent this line / paragraph by one tabulator.' If you are indenting anything there is only one character you should be using - tab. It does not, however, have a fixed width, and should therefore never be used anywhere other than the start of a line, and never for aligning two lines. If you have to split a function across two lines, you should indent it like this:
Then, no matter whether the person reading your code thinks tabs should be 1 or 8 characters wide, arg1 and arg2 will always line up. Sadly, vim does not have the ability to distinguish marks used for indenting and marks used for alignment and so this has to be done manually.
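Since the editor won't enforce this, part of the rule can be checked mechanically. The following is a rough sketch in Python, not an existing tool, and the function name is made up. It flags any line where a tab appears after the leading run of tabs. It cannot tell whether a leading tab was meant as indentation or as alignment, so it only catches the most obvious violations.

```python
def misplaced_tabs(source):
    """Return the 1-based numbers of lines where a tab appears anywhere
    after the leading run of tabs, i.e. where a tab is doing a job other
    than indentation."""
    bad = []
    for number, line in enumerate(source.splitlines(), start=1):
        body = line.lstrip("\t")   # drop the indentation tabs
        if "\t" in body:           # any tab left over is misplaced
            bad.append(number)
    return bad
```

A continuation line that starts with one tab and then uses spaces to line up the second argument passes the check. A tab before a trailing comment, or a tab that follows a space, is reported.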
I am TheRaven on Soylent News
I worked for a company that was destroyed by a bad coding standard.
This was a small company that, back in '96, was awarded the contract for a POS application for a regional store chain, with back-office servers that would be updated nightly by modem.
The guys who ran the company weren't programmers (though one of them knew enough to be dangerous); they were technical salesmen. They were also big fans of Microsoft, with "MVP" plaques on the walls, and every employee except me having Microsoft certs.
I worked for them part-time while also working for another company. I advocated Unix (mostly BSDI and SunOS at the time), and always argued with them about why Unix was better (technical superiority vs. potential for big profits).
When their big project was well underway, they brought me in to do the communications part of it, where the POS terminals would contact one of several servers by modem each night ("why not just ethernet them together, get a dialup PPP connection, and use IP? the interface is so much more reliable..." Request denied).
The app was Visual Basic, with third-party "custom controls" for things like talking to modems. My part went fairly smoothly, and I was eventually asked to help out with the main application, which was suffering from unexplained crashes. When I looked at the code, I found something... strange.
For error handling, they had elected to use a program called "VB Rig" (the name came from the rigging used on sailing ships, which prevents a sailor from falling to his death. Sometimes.) What this program did was to examine the source code, and then add error handling boilerplate at the start and end of each and every function. It inserted the exact same error handling code into every function.
Because the error handler had to be all purpose, it was about 20 lines of code per function - sometimes much larger than the regular part of the function. And, worse, because it was the same for every function, and it made use of the same variable names, that meant either every variable had to be global, or you'd have to declare the ten or so standard variable names at the start of every function (they opted for the "everything is global" approach).
Which led to things like this (forgive the syntax errors, it's been years since I've touched VB):
On Error goto my_data_file_read_function_VBRIG_TRAP
    open MyDataFile for writing ...
    goto my_data_file_read_function_VBRIG_CLEANUP
my_data_file_read_function_VBRIG_TRAP:
    on error 101 'Permission Denied
        delete MyDataFile
        resume
    on error 102 'File Not Found
        MessageBox 'Cannot read ' + MyConfigFile
        resume
my_data_file_read_function_VBRIG_CLEANUP:
    blah blah
    my_data_file_read_function = SUCCESS ' return
As you see, the error handling code - which had to be exactly the same for every function - made use of global variables (names like DataFile1, MyFile1, UserName, etc.) to figure out what to do for each error. That meant that if there was any possibility you might have a "File Not Found", you had to expect the filename where that might happen to be in a particular global variable - say, MyFile1 - and hope that the calling function wasn't using that name too, for the same reasons.
Naturally, files were being created and deleted at random, and the programmers often spent hours on the phone with the customer trying to figure out why the Access database had disappeared *again*.
I asked if we could just write the error handling by hand, and use appropriate local variables; or take the standard VBRig error handling and trim out the lines that weren't relevant for a particular function (as subsequent VBRig runs wouldn't touch its code region if it saw that it had been customized).
Request Denied. "This is our coding standard. We carefully reviewed the options before making the decision to use t
.
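For contrast, here is a minimal sketch of the hand-written alternative that was proposed, in Python rather than VB. The function name and the error messages are invented for illustration. The handler uses only the local `path` it was given, so it cannot report or touch a file that some other function left in a shared variable.

```python
def read_data_file(path):
    """Read a text file, handling errors with the local `path` argument
    instead of a shared global filename."""
    try:
        with open(path, "r", encoding="utf-8") as handle:
            return handle.read()
    except FileNotFoundError:
        # The message names the file this call was actually asked to read.
        raise RuntimeError(f"Cannot read {path}: file not found") from None
    except PermissionError:
        # Report the problem; an unreadable file is never deleted here.
        raise RuntimeError(f"Cannot read {path}: permission denied") from None
```

Only the two errors this function can reasonably expect are handled; anything else propagates to the caller unchanged.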
Coding guidelines are typically justified because, as it goes, more time is spent fixing bugs in existing code than writing new code. The guidelines are needed because they help others come up to speed quickly while they try to figure out the code in which they have to fix the bug(s).
I think that is the wrong focus, as it tends to reinforce incorrect behavior, i.e., the writing of buggy code.
Coding guidelines should focus instead on the techniques that help reduce the number of bugs in code. How is that done? It takes someone (typically a senior person) looking at the bugs that have been found in the code, categorizing their cause, devising a way to prevent those bugs from occurring, then putting that into the guidelines.
Keep the focus of the guidelines where it should be: to increase the quality of the software.
Make it "cut and paste" friendly, and as small as possible.
Cut and paste causes code cloning, which is among the most difficult maintenance problems. Code should be designed, when possible, in small chunks (methods, functions, etc.).
Wait.. are you trying to say that copying the same lines of code over and over again must be avoided? So tell me genius, how else would you implement such a function without copying?
You just got troll'd!
Ding ding ding - we have a winner.
Real coders write code that you can take a ruler from any given close brace and draw a vertical line right up to the matching open brace, every time. Everybody else gets fired.
Lines are cheap. Time added trying to figure out an obfuscated code structure because somebody wanted to save lines (i.e., put the open brace on the same line instead of doing the above) is expensive.
Glonoinha the MebiByte Slayer
Bollocks.
Draw your line from the closing brace up to the first line with any text on it, that line is the start of your block.
Having your opening braces on an empty line might be more aesthetically pleasing but has zero advantage in making the code clearer.
Either way, the most important thing is to have everyone do it the same way, every time.
I wish to remain anomalous
Now that we're talking about 'languages that invite bad coding practices'... Well, one of the best programming books I've read is 'Perl Best Practices'. Not only does it list out best practices but it tries to explain (well, I might add) why you should code a certain way and why other ways aren't good to follow.
One of the habits I picked up from 'Perl Best Practices' is:
instead of:
The else tends to get 'lost' when just following the closing bracket.
Duh, you so need to learn about this little thing called structured programming, which can totally help cut down on code duplication like that crap.
Here's a hint:
See? Much easier to understand than your spaghetti code, and much more maintainable too.
The Linus says:
"If you need more than 3 levels of indentation, you're screwed anyway, and should fix your program."
from http://en.wikiquote.org/wiki/Linus_Torvalds
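One common way to get under that limit is to handle the exceptional cases first and return early. The sketch below is in Python with an invented shipping example, and shows the same logic written both ways. Note that the flat version relies on multiple return statements, which some of the standards mentioned in the submission forbid.

```python
def shipping_cost_nested(order):
    """Three levels of nesting before the interesting case is reached."""
    if order is not None:
        if order["items"]:
            if order["country"] == "US":
                return 5
            else:
                return 15
        else:
            return 0
    else:
        return 0


def shipping_cost_flat(order):
    """Same behaviour, with the edge cases dealt with up front."""
    if order is None:
        return 0
    if not order["items"]:
        return 0
    if order["country"] == "US":
        return 5
    return 15
```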
I've seen a number of private sector firms with coding standards. Some are just a few pages of common sense rules (naming conventions, etc) while others were book-length horrors created by people so incompetent that management didn't trust them to write code. I've seen requirements forbidding constants in code (correct practice was #define THIRTY_SEVEN 37, believe it or not) and crazy Hungarian style naming conventions (nothing like several characters of line noise at the end of every function name).
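The problem with THIRTY_SEVEN is that the name restates the value instead of saying what the value is for. A tiny sketch in Python, where the login-attempt scenario and the names are hypothetical:

```python
THIRTY_SEVEN = 37          # restates the value; tells the reader nothing
MAX_LOGIN_ATTEMPTS = 37    # hypothetical name: says what the number is for


def is_locked_out(failed_attempts):
    """True once the account has used up its allowed login attempts."""
    return failed_attempts >= MAX_LOGIN_ATTEMPTS
```

If the limit ever changes, MAX_LOGIN_ATTEMPTS is edited in one place, while a constant named after its value would have to be renamed everywhere it is used.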
My current firm's approach is pretty simple:
1. Write clear, understandable code
2. Make it look like all the other code in our system
3. Use the standard IDE
4. The entire codebase is visible to all developers
5. If your code does not conform, an annotated screenshot of it will be posted to our main developer chatroom
6. People will then discuss your code publicly
7. If the code is truly awful, a senior developer will declare it unacceptable, and delete it from the system
From a personal perspective that happens to tie in with the coding practices at my last company:
The second example (GNU style) I have found to be quite cumbersome in writing, unless tabs are set to 2 with braces indented once and content twice (company mandated four with one indent for content in the block), in which case I would be frustrated with the extra keypresses involved.
The first example (Allman style) I used to use until I moved over to Kernighan-Ritchie style (opening brace on same line as control statement, with functions (and classes in OOP languages) braces the exception; these are written in Allman style). This allows me to scrunch more onto the screen vertically.
FWIW I never liked the '} else {' style of elses but at the same time, I never found it difficult to read so it was never a real issue. It makes sense to me to have the else begin at the same column as the if to which it belongs.
This may be of interest to you.
"Three eyes are better than one" -- Lieutenant Columbo
Strangely enough, Hungarian worked quite well for the problem it was originally intended to solve.
I worked at Xerox in the late 70's and my manager was Charles Simonyi, inventor of this notation. The project was BravoX (grandparent of MS Word) and was written in BCPL. BCPL basically has one type: integer. How that integer is treated is purely a function of how you reference it. E.g. fooFirst>>fooNext means "use the variable 'fooFirst' as a pointer to a structure of type FOO, one of whose elements is (from the naming convention) a pointer to some other FOO." Whereas fooFirst+1 adds one to an integer and (almost certainly) yields an invalid pointer that will blow up when you try to use it. (It's been 30 years since I wrote anything in it, so I probably screwed up the example.)
Since there was only one type, the compiler didn't/couldn't perform type checking. Hungarian was a way of putting the type into the name of the variable so that the programmer could perform visual type checking. There were 9 of us on the project and the consistency/readability across the code base was impressive. Any of us could go into anyone else's code and almost immediately see what was going on.
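The situation can be imitated in Python by pretending memory is one flat list of integers and a "pointer" is just an index into it. This is a loose model, not BCPL, and the record layout and names are assumptions made for illustration. Nothing but the `foo` prefix tells the reader which integers are meant to refer to FOO records.

```python
# Model of a typeless machine: memory is one flat list of integers and a
# "pointer" is an index into it. Layout and names are hypothetical.
FOO_VALUE = 0   # offset of the value field inside a FOO record
FOO_NEXT = 1    # offset of the "next FOO" field inside a FOO record
FOO_NIL = -1    # marks the end of a chain of FOOs

# Three FOO records, starting at indexes 0, 2 and 4.
mem = [10, 2, 20, 4, 30, FOO_NIL]


def sum_foos(foo_first):
    """Walk a chain of FOO records and add up their values."""
    total = 0
    foo = foo_first                    # "foo" prefix: index of a FOO record
    while foo != FOO_NIL:
        total += mem[foo + FOO_VALUE]
        foo = mem[foo + FOO_NEXT]      # this field holds a FOO index too
    return total
```

Calling `sum_foos(0)` walks all three records. Calling `sum_foos(0 + 1)` is accepted just as readily, because the argument is still only an integer, and it fails at run time with an IndexError once the bogus "next" value is followed. The naming convention is the only thing that makes the second call look wrong on the page.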
I still use a light variant of it in my own code, but when in someone else's code I try to stick to their naming/formatting convention.
Like so many good ideas, it worked well in its original context but became twisted out of shape when used for something never intended/envisioned by the original developers (even though the person doing the twisting was, in fact, the original developer!). Another example of this is the Third Eye Software symbol table format I created for my debugger, CDB, but which was then used and abused by Mips to create a complete piece of crap. What they did still has people swearing at me 20+ years after the fact. (More on this at Third Eye Software and the MIPS symbol table)
YOU FORTH LOVE IF HONK THEN
Bollocks. It's a tradeoff just like every other debate in the programming world. Sure, Perl gives you the ability to put way too much code on a single line. But the opposite problem of putting loads of white space all over the place is almost as bad.
The more you spread out the code, the more you have to scroll. White space is valuable when it means something, like to separate discrete tasks within a long function. But the whole
}
else {
thing is just a waste of space. It's one line less of code I can see. I visually parse } else { instantaneously. Similarly, some compound expressions or chained method calls make perfect sense. The right place to break out multiple lines depends on the reader's own cognitive abilities and familiarity with the symbols being manipulated.
Otherwise
writing
like
this
would
be
much
easier
to
read