Abandoning Header Files?

← Back to Stories (view on slashdot.org)

Posted by Cliff on Friday January 14, 2005 @09:56AM from the strange-compilation-practices dept.

garethw asks: "I'm working on a project where the lead developer, following a suggestion by our tool vendor, wants to get rid of the header files and directly #include source code. The language is a somewhat specialized language, but for all intents and purposes, you can assume it's Java or C. The conventional argument I recall for using header files, and incremental compilation, is that it's faster to use a makefile and conditionally build only those files that have changed. However, it turns out that the brute force of invoking the compiler once on the top-level does actually compile much faster. I feel that there is something about #include'ing source files directly, compiling only the top-level file, just doesn't 'feel' right and I'm at a loss to really give a solid argument as to why. Has anyone actually used this approach? Does anyone have any thoughts on any advantages or drawbacks?"

14 of 207 comments (clear)

Min score:

Reason:

Sort:

Need more info... by sfjoe · 2005-01-14 10:00 · Score: 4, Insightful

...following a suggestion by our tool vendor,...

How much money will your tool vendor make if you implement this suggestion and what, if any, product does she sell that neatly solves any problems this might bring up?

--
It's simple: I demand prosecution for torture.
1. Re:Need more info... by Tim+Browse · 2005-01-14 15:04 · Score: 4, Insightful
  
  If they're anything like some tool vendors I've come across, it's because they either don't have decent compilation perfomance, or don't support the features that would help, such as pre-compiled headers, etc.
  
  So rather than fixing the problem by investing in their product, they're telling their customers to use ugly hacks to get around the product's shortcomings, and hope they won't switch to another system (I suspect).
  
  I've certainly been on the receiving end of such tactics.
  
  The dead giveaway is when they start saying things like "pre-compiled headers wouldn't help you anyway" :-)
Not useful for C by david.given · 2005-01-14 10:00 · Score: 4, Informative

...or, to a lesser extent C++, because of the way C scoping works:
static global variables have scope within the module they're defined in. Which means that two static globals in different source files don't collide, because they're in different modules.
Including everything into one big source file will mean that they're both in the same module, and so will collide. Not good.
Can't say about other languages, though.
Interface vs implementation, shared libraries, etc by Dimwit · 2005-01-14 10:00 · Score: 3, Insightful

Well, there's the obvious separation of interface from definition. And the problem of duplicate definitions - there's a reason why "extern" is a keyword. :)

Plus, header files define an interface, which is useful if you don't actually have the code (i.e. binary shared library). Moot point in your case, I think, but...

Plus it's just good programming style to have separate definitions and implementations. Easier to track down bugs.

--
...but it's being eaten...by some...Linux or something...
Keep the header files by SunFan · 2005-01-14 10:06 · Score: 4, Insightful

They are just about the only way to centrally organize declarations for data structures and function signatures. Doing so will save your ass eventually, because having function prototypes available can allow the compiler and lint tools catch stupid programmer errors. You do use lint-like tools, right? They _will_ catch bugs that testers and visual scanning wont.

The only draw back to headers in C is that if you forget to 'make clean' after changing a header, you can end up with object files using old definitions. Just make a habit of doing a full build after changing the headers. If you designed your software properly, changing header files won't be all that common (adding functions new data structures, etc.).

--
-- Microsoft is the most expensive commodity operating system and office suite vendor in the marketplace.
Speed by jbrandon · 2005-01-14 10:14 · Score: 3, Insightful

Have you tested the speed difference when you change only one non-header file? I bet incremental compilation will make that quite a bit faster. In addition, if you want to compile that changed source file to check for syntax or type errors, you don't have to check for collision between it and the whole rest of the project, only collisions between it and the header defining it.
Why? by Pacifix · 2005-01-14 10:23 · Score: 4, Insightful

It seems that the onus should be on the vendor to explain very, very convincingly why you should abandon decades of standard practice and good coding practice. This better be one hell of a good product you're developing to justify the should a radical change. You shouldn't need to defend standard practice, they must campaign for a change to that practice. Imagine trying to explain this to all the coders who will work on the product for the next decade - will they think you're crazy or is there really a reason to do this?
1. Re:Why? by CamMac · 2005-01-14 10:33 · Score: 5, Funny
  
  Remeber, if you remove all the comments from the code, it will compile faster and the executable will be smaller.
  
  --Cam
  
  --
  All jocks think about is sports. All nerds think about is sex.
Several advantages and disadvantages by cookd · 2005-01-14 10:26 · Score: 3, Insightful
1. Advantages:
2. Faster compile of the full product. You only invoke the compiler process once, and much less work for the linker to do.
3. Much better optimization. Compilers can only optimize within a compilation unit. Intel and Microsoft have "Link-time code generation" compilers which performs a final optimization pass during link, but if you aren't using those compilers, there might be a significant amount of additional optimization enabled by putting everything in the same compilation unit.
1. Disadvantages:
2. You're not doing it the way everyone expects you to do it. Certain components (the compiler, the linker, and pre-existing code) might have been designed under the assumption that individual files would be compiled separately. The pre-existing code might have declared static (per-file) variables or functions in a way that could collide with other code (namespaces might help here). The compiler and linker might have limits. And you might not hit those limits until late in the project.
3. For building the whole product, yeah, it will be faster. But for making a small change and rebuilding the results of that change, it might be much slower.
As with every issue you'll ever run into, there are two (or three) sides to it.
--
Time flies like an arrow. Fruit flies like a banana.
1. Re:Several advantages and disadvantages by stonecypher · 2005-01-14 21:22 · Score: 4, Informative
  
  1. Faster compile of the full product.
  
  Well, back in the real world, in a properly decoupled project incremental linking is a massive speed win, even when building from the top, as there's far less cross-lexing and as the build tables may be handled a small piece at a time, which is important because their parsing in the compiler itself is generally of O(n^2 log n) time or better. Once you've worked on a large project which fails to make proper decouplings, you will become painfully aware of this trend.
  
  Whereas in this particular project the complete build is apparently faster, that is almost certainly the result of a very naive code tree and/or build scheme; the importance of incremental linking towards speed of compile cannot be overestimated, even in the case of compiling from clean.
  
  2. Much better optimization. Compilers can only optimize within a compilation unit.
  
  This simply isn't true. Whereas only some compilers make cross-TU optimizations, that is not the same as cross-TU optimizations being only able to optimize within a translation unit (why do people keep saying compilation unit? There's no such thing!) Besides, you're dramatically underestimating the commonality of link-time cross-tu counterspecialization, which now exists in ICC, BCC, MSCC, ARM ADS, EDG/Comeau, GHOC, and is in experimental development within GCC.
  
  You're not doing it the way everyone expects you to do it. Certain components (the compiler, the linker, and pre-existing code) might have been designed under the assumption that individual files would be compiled separately.
  
  They most certainly have not been. The C and C++ standards do not allow for such ridiculously inappropriate behavior. Where did you get this idea? Compiler writers may not impose arbitrary restrictions on the codebase in any relation to the local filesystem. This is just untrue.
  
  The pre-existing code might have declared static (per-file) variables or functions in a way that could collide with other code (namespaces might help here).
  
  This is a well known gigantic red flag indicating an amateur programmer. File-scoped variables are antiquated even within the pure C community; the only time they're acceptable in most professional programmer's eyes are within a library which is built alone. In fact, you might want to read the things Kernighan himself said about when file-scoped variables are appropriate in K&R 2; the primary author of the language himself says that this is a fundamentally bad technique and should not be done.
  
  Of course, that you're causing problems by misusing the toolchain and allowing bad code to collide when build trees written seperately are blindly merged without the help of a linker is just not surprising.
  
  The compiler and linker might have limits.
  
  Not if they're standards compliant, they mightn't. Did you know that there's a document out there floating around telling compiler authors in concrete detail what they may and may not do? You should read that before commenting on what a compiler may or may not do; you are simply out in left field, here.
  
  As with every issue you'll ever run into, there are two (or three) sides to it.
  
  Not when you know what you're talking about. Whereas many things are issues of pro/con, many simply aren't; you'll be hard pressed to find pros in the distribution of heavy ordinance to delusional sociopaths, you'll be hard pressed to find pros in setting up a "bring a molester to school day," and you'll be hard pressed to find pros in non-decoupled code, once you've actually read the standard and are aware of the real limitations of compiler authors, instead of your guesses about what might maybe happen if someone wasn't paying attention.
  
  --
  StoneCypher is Full of BS
ccache by yamla · 2005-01-14 10:57 · Score: 3, Informative

It is hard to tell from your statements, but this may stop tools like ccache from working. I use ccache in my projects and it radically cuts down the amount of recompilation required when I do a complete rebuild. Now, an obvious question is why I don't simply rely on makefiles to ensure only changed files ever get rebuilt. This often happens because compilation involves generating new cpp files that are then compiled and I don't want to be grepping through these all the time. I suppose I could move them all to a different directory, but ccache works very well.

The other problem, of course, is that separating your classes into header and implementation means that if you change the implementation, you only need to recompile that one file and relink, rather than recompiling EVERYTHING. This can be a matter of a few seconds vs. several minutes. And implementation does change, a lot... fix a bug, you fix the implementation. The headers change too, but much much less frequently.

--

Oceania has always been at war with Eastasia.
The immortal advice of Rocket J Squirrel by crmartin · 2005-01-14 11:35 · Score: 5, Insightful
"Oh, Bullwinkle, that trick never works."
One of the really depressing things about having been in the business for nigh on to 40 years now is that, along with the occasional new dumb idea, all the old dumb ideas keep coming back. Among those dumb ideas that keep coming back are "visual programming" --- using graphics instead of programming languages; complicated schematic graphics for software --- UML in its utter complex form; and, sure enough, using the preprocessor to mess with C-like languages.
Every time this is tried --- and God knows it's been tried a lot --- you run into some severe problems:
1. The scoping rules of C-like languages give semantics to file inclusion. If you #include chunks of code, you are defeating the language's (limited) ability to protect you from name space clashes, mis-named variables, and so on.
2. While it might be that you gain something from only needing to start the compiler once, parsing and compilation are inherently a bit harder than O(n) where n is the number of source characters or tokens. A normal environment with make(1) will generally need to process fewer tokens than compiling everything all the time; the time required for a big file will inevitably dominate the startup time eventually.
  If you've got control of the compiler for this peculiar language, why not explore making the startup time shorter, say, eg., by using shared libraries, DLLs, or by setting the sticky bit?
3. From sad experience, I can tell you that using the #include scheme will introduce weird-ass order dependencies into the code (ie., what order do you include files in?) that are very very difficult to debug.
4. Most tools for C-based languages expect you to do the sources in a normal fashion. You confound the tools' expectations' at your peril.
5. Similarly, most debuggers exploit, or attempt to exploit, scoping rules that you will break through this approach.
6. When you write lots of smaller modules, each one can be create a single, small TEXT and DATA section, or a collection of small code sections. This makes the job of memory mapping in virtual memory systems much easier. Do it all as one big thing, and you're liable to get one big TEXT section.
7. Optimization is comnbinatorially fairly hard, quadratic or worse, and global optimizations tend to be managed within section bounardies. One-big-module designs may either make the optimization phase very lengthy, or defeat optimization entirely when table space etc. runs out.
8. You piss off every experienced C programmer who ever has to deal with the code in the future, especially old farts like me who've seen this trick 20 years ago.
You are solving the wrong problem by Chemisor · 2005-01-14 12:50 · Score: 3, Informative

Speeding up a full build should not be important. The only people who care about it are in your test lab doing daily builds and regression tests, who can start the build overnight and have it ready by morning. Of course, this is the situation in a well-designed application. If you find yourself needing a full rebuild all the time, it means one of two things: 1. you are hacking a core component, or 2. all your components are written with spaghetti code and any change in one forces rebuilds in all the others.

In the first case, try just testing one or two components during development, and then verify all the others when the API is stabilized. This is, incidentally, the advantage you gain from using header files: once the API is stable, you never need to rebuild that component again except to fix bugs (which require rebuilding only that component).

In the second case, you need some serious refactoring. Look at the code and break it up. Encapsulate everything you possibly can. Make stuff private and static. Make everything you don't modify const. Keep it up until each component is accessed only through its API and that API is clean. Trust me, this is possible in any project. The enormous decrease in maintenance costs will more than pay for any time you spend on it.
Total red herring... by pla · 2005-01-15 01:41 · Score: 3, Informative

First of all, "speed", either compilation-wise or runtime-wise, has nothing to do with why you should use header files.

I too disliked header files, long ago, in my early days of programming C. It seemed pointless, to have two files (or rarely, as many as four), when one would do just as well.

For small projects, I'll still use one large monolithic source file. In that aspect, it makes sense to skip breaking out your data and function definitions.

But when you get to the "real" world... Imagine even a "small" serious project, with perhaps 10k lines of code. Try to find a single function in that file - I hope you feel on good terms with your IDE's search capabilities!

So, break that out into a dozen files - You have your network code in one file, your UI code in another, your file I/O in another, perhaps some database interaction in another, and so on. Okay, that works well... But wait, your network code, your file I/O, and your database code, all make use of the same checksum algorithm! So, you have the same exact code duplicated three times.

That would work, because each file will compile to a module with its own namespace (in most languages). But it wastes space, both in the source and in the compiled code. It also wastes time and can very easily introduce bugs - For example, if you decide you need to switch from MD5 for SHA1 as your checksumming algorithm, you now need to change three places instead of one. If you miss one of those, but use them to compare results between the three different uses, you have a very serious bug that may drive you batty trying to track it down.

So, the obvious solution, break out all your common functions into a toolkit-like source file. Now, you could just #include that in every other file that needs it, but WOW would that cause some serious bloat in the compiled code - In my experience, shared code files frequently end up as the single largest source file in the entire project.

So, use a header file. That way, you don't end up with massive duplication of code, you have the advantage of a logical breakout of your code into similar-purpose files, and you can still make changes to only one file to modify one function.

Incidentally, the above chain of thinking more-or-less describes the evolution of standard libraries... Would your professor actually suggest that you shouldn't "#include<stdio.h>", but instead should manually pull the code for each function you use into your source file? Because, in the degenerative case, he has told you exactly that.