Abandoning Header Files?


Posted by Cliff on Friday January 14, 2005 @09:56AM from the strange-compilation-practices dept.

garethw asks: "I'm working on a project where the lead developer, following a suggestion by our tool vendor, wants to get rid of the header files and directly #include source code. The language is somewhat specialized, but for all intents and purposes you can assume it's Java or C. The conventional argument I recall for using header files and incremental compilation is that it's faster to use a makefile and rebuild only those files that have changed. However, it turns out that the brute-force approach of invoking the compiler once on the top-level file actually compiles much faster. Something about #include'ing source files directly and compiling only the top-level file just doesn't 'feel' right, but I'm at a loss to give a solid argument as to why. Has anyone actually used this approach? Does anyone have any thoughts on its advantages or drawbacks?"

13 of 207 comments


Not useful for C by david.given · 2005-01-14 10:00 · Score: 4, Informative

...or, to a lesser extent, C++, because of the way C scoping works:
Static global variables have scope only within the module (translation unit) they're defined in, which means that two static globals with the same name in different source files don't collide, because they're in different modules.
Including everything into one big source file puts them both in the same module, so they will collide. Not good.
Can't say about other languages, though.
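A minimal sketch of what the compiler sees in such a single-file ("unity") build, assuming two hypothetical files, parser.c and lexer.c, that each keep a private static int count. This particular collision is not even reported: two static file-scope declarations without initialisers are tentative definitions, so within one translation unit they silently become one shared variable. Had both been initialised, or had each file defined a static function with the same name, compilation would fail with a redefinition error instead.

```c
/* Hypothetical unity build: roughly what the compiler sees after a
   top-level file does  #include "parser.c"  and  #include "lexer.c". */

/* ---- text pulled in from parser.c (hypothetical) ---- */
static int count;            /* meant to be private to parser.c */

int parser_step(void) {
    count += 1;
    return count;
}

/* ---- text pulled in from lexer.c (hypothetical) ---- */
static int count;            /* meant to be private to lexer.c, but in this
                                translation unit it is the SAME object */

int lexer_step(void) {
    count += 10;
    return count;
}
```

Calling parser_step() and then lexer_step() returns 1 and then 11: the "lexer" sees the "parser's" increment. Compiled as two separate .c files, the second call would return 10.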
1. Re:Not useful for C by angel'o'sphere · 2005-01-14 13:34 · Score: 2, Informative
  
  Lol,
  
  Reread your parent!!
  
  Exactly what you show is what he says. But he was talking about *.c files, not *.h files. So while separate *.c files would scope the foo variables, leading to two distinct ones, the .h file pulls them both into the same .c file.
  
  So what worked in the beginning, while it was scoped, no longer works if everything is pulled into one single source file via #include.
  
  So your example exactly shows the conflict your parent wanted to point out.
  
  angel'o'sphere
  
  --
  Cost free eBook I read (by iBook/Kobo/Amazon/ObookO/Gutenberg etc.): "The Green Odyssey" by Philip Jose Farmer.
2. Re:Not useful for C by Bloater · 2005-01-16 02:43 · Score: 2, Informative
  
  The term used in C is the "Translation Unit". When you compile a .c file you are compiling a translation unit. If the C source file #includes the contents of another file, then those contents replace the #include line in what the compiler considers to be the code to be translated.
  
  It doesn't matter what the file name is from the point of view of C, but a given compiler may use the last dot of the filename and the characters after it to determine which language it is, and whether it is a source file to be compiled or an object file to be linked.
Depends on the size of the project by nadador · 2005-01-14 10:31 · Score: 2, Informative

Depending on the size of your project, you will get varying returns from each of these:

1. Separate source files mean that units of code can hide data and functions.
2. Separate headers, combined with something like GCC's -Wmissing-prototypes, enforce the good coding practice of well-defined functional interfaces.
3. Separate headers and source files mean that when you look at a function in a file, you will have some idea of what it touches, because you can check that the file included header X but not Y.
4. You can tell the compiler to explicitly forbid global data symbols, which is pointless in one single file.
5. You can use different compiler switches for different files.
6. Your code will have some hope of portability.

If your project is small, it doesn't matter anyway. If your project is large, you can get your compiler to enforce some good design rules on you, which doesn't mean you can't still have a good design anyway, but it will make it more likely. I worked on a project that used a compiler that let you get away with everything. Try and port that code to anything UNIX-like, and it was ridiculous.
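A minimal sketch of points 1 and 2, assuming a hypothetical ring-buffer module (every name here is invented for illustration). The header and the source are shown as labelled sections of one file so the example stays self-contained; in a real project they would be ring.h and ring.c.

```c
#include <stddef.h>

/* ---- hypothetical "ring.h": the module's entire public interface ---- */
void ring_push(int value);
int ring_sum(void);

/* ---- hypothetical "ring.c": the implementation ---- */
#define RING_SIZE 4

/* Internal linkage: invisible to other translation units (point 1).
   Static functions need no prior prototype, so -Wmissing-prototypes
   does not complain about wrap(). */
static int ring_buf[RING_SIZE];
static size_t ring_next;

static size_t wrap(size_t i) { return i % RING_SIZE; }

/* External linkage: other files may only rely on the prototype above.
   Delete that prototype and GCC's -Wmissing-prototypes warns about this
   definition (point 2). */
void ring_push(int value) {
    ring_buf[ring_next] = value;
    ring_next = wrap(ring_next + 1);
}

int ring_sum(void) {
    int total = 0;
    for (size_t i = 0; i < RING_SIZE; i++)
        total += ring_buf[i];
    return total;
}
```

Pushing 1 through 5 overwrites the oldest slot, so ring_sum() then returns 5 + 2 + 3 + 4 = 14. No caller outside the module can reach ring_buf or wrap() directly.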

--

Outside of a dog, a book is a man's best friend. Inside a dog, it's too dark to read.
ccache by yamla · 2005-01-14 10:57 · Score: 3, Informative

It is hard to tell from your statements, but this may stop tools like ccache from working. I use ccache in my projects and it radically cuts down the amount of recompilation required when I do a complete rebuild. Now, an obvious question is why I don't simply rely on makefiles to ensure only changed files ever get rebuilt. This often happens because compilation involves generating new cpp files that are then compiled and I don't want to be grepping through these all the time. I suppose I could move them all to a different directory, but ccache works very well.

The other problem, of course, is that separating your classes into header and implementation means that if you change the implementation, you only need to recompile that one file and relink, rather than recompiling EVERYTHING. This can be a matter of a few seconds vs. several minutes. And implementation does change, a lot... fix a bug, you fix the implementation. The headers change too, but much much less frequently.
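A minimal sketch of the basic rule make-style tools apply (a target is rebuilt when any prerequisite is newer than it), assuming hypothetical files main.c, counter.c and counter.h, with plain integers standing in for file modification times. It ignores the missing-target case and the other refinements real build tools add.

```c
#include <stdbool.h>
#include <stddef.h>

/* One hypothetical build rule: a target plus the timestamps of the
   files it depends on. */
struct rule {
    const char *target;
    long target_mtime;
    const long *dep_mtimes;
    size_t ndeps;
};

/* Stale if any dependency was modified after the target was built. */
bool needs_rebuild(const struct rule *r) {
    for (size_t i = 0; i < r->ndeps; i++)
        if (r->dep_mtimes[i] > r->target_mtime)
            return true;
    return false;
}
```

With main.o and counter.o both built at time 20, editing only counter.c (time 30) makes needs_rebuild true for counter.o alone; touching counter.h would make it true for both. That asymmetry is why implementation changes are cheap and header changes are not.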

--

Oceania has always been at war with Eastasia.
Re:Why? by crmartin · 2005-01-14 11:37 · Score: 1, Informative

Someone should mod this either "funny" or "dumb".

(In real compiled languages, the comments are stripped out in lexing or earlier. In C, they're stripped by the preprocessor.)
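A small illustration in standard C (the variable names are made up): each comment is replaced by a single space during the early translation phases, before preprocessing directives run, so comments never reach the parser. The same characters inside a string literal are not a comment at all.

```c
#include <string.h>

/* The comment below becomes one space, so this declares 'width'. */
static int/* a long explanatory comment */width = 80;

/* Inside a string literal, slash-star is just two characters. */
static const char banner[] = "a/*b*/c";

int get_width(void) { return width; }
size_t banner_len(void) { return strlen(banner); }
```

get_width() returns 80 and banner_len() returns 7.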
You are solving the wrong problem by Chemisor · 2005-01-14 12:50 · Score: 3, Informative

Speeding up a full build should not be important. The only people who care about it are in your test lab doing daily builds and regression tests, who can start the build overnight and have it ready by morning. Of course, this is the situation in a well-designed application. If you find yourself needing a full rebuild all the time, it means one of two things: 1. you are hacking a core component, or 2. all your components are written with spaghetti code and any change in one forces rebuilds in all the others.

In the first case, try just testing one or two components during development, and then verify all the others when the API is stabilized. This is, incidentally, the advantage you gain from using header files: once the API is stable, you never need to rebuild that component again except to fix bugs (which require rebuilding only that component).

In the second case, you need some serious refactoring. Look at the code and break it up. Encapsulate everything you possibly can. Make stuff private and static. Make everything you don't modify const. Keep it up until each component is accessed only through its API and that API is clean. Trust me, this is possible in any project. The enormous decrease in maintenance costs will more than pay for any time you spend on it.
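One common way to make a C component "accessed only through its API" is an opaque type. The sketch below assumes a hypothetical integer stack; the header and source are shown as labelled sections of one file. The header declares only an incomplete struct, so client code cannot touch the fields, and changing the struct layout does not require recompiling clients as long as the header itself is unchanged.

```c
#include <stdlib.h>

/* ---- hypothetical "stack.h": clients see only an incomplete type ---- */
typedef struct stack stack;
stack *stack_new(size_t capacity);
int stack_push(stack *s, int value);   /* 0 on success, -1 if full  */
int stack_pop(stack *s, int *out);     /* 0 on success, -1 if empty */
size_t stack_size(const stack *s);
void stack_free(stack *s);

/* ---- hypothetical "stack.c": the layout is private to this file ---- */
struct stack {
    size_t size, capacity;
    int items[];                       /* C99 flexible array member */
};

stack *stack_new(size_t capacity) {
    /* No overflow check on capacity; acceptable for a sketch. */
    stack *s = malloc(sizeof *s + capacity * sizeof s->items[0]);
    if (s) {
        s->size = 0;
        s->capacity = capacity;
    }
    return s;
}

int stack_push(stack *s, int value) {
    if (s->size == s->capacity) return -1;
    s->items[s->size++] = value;
    return 0;
}

int stack_pop(stack *s, int *out) {
    if (s->size == 0) return -1;
    *out = s->items[--s->size];
    return 0;
}

size_t stack_size(const stack *s) { return s->size; }

void stack_free(stack *s) { free(s); }
```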
Re:GCC by Jamie+Lokier · 2005-01-14 15:13 · Score: 2, Informative

That's not an error, and if your OS prof said it was an error that was not picked up, he/she is mistaken.

The C language definition is clear that you can write the program you did, with a variable defined in "common" form (no initialiser and no extern) in both files, and a function called without a prototype, and it is a valid C program with well defined behaviour.

-- Jamie
Re:Several advantages and disadvantages by stonecypher · 2005-01-14 21:22 · Score: 4, Informative

1. Faster compile of the full product.

Well, back in the real world, in a properly decoupled project incremental linking is a massive speed win, even when building from the top, as there's far less cross-lexing and as the build tables may be handled a small piece at a time, which is important because their parsing in the compiler itself is generally of O(n^2 log n) time or better. Once you've worked on a large project which fails to make proper decouplings, you will become painfully aware of this trend.

Whereas in this particular project the complete build is apparently faster, that is almost certainly the result of a very naive code tree and/or build scheme; the importance of incremental linking towards speed of compile cannot be overestimated, even in the case of compiling from clean.

2. Much better optimization. Compilers can only optimize within a compilation unit.

This simply isn't true. Whereas only some compilers make cross-TU optimizations, that is not the same as compilers being able to optimize only within a translation unit (why do people keep saying compilation unit? There's no such thing!). Besides, you're dramatically underestimating the commonality of link-time cross-TU counterspecialization, which now exists in ICC, BCC, MSCC, ARM ADS, EDG/Comeau, GHOC, and is in experimental development within GCC.

You're not doing it the way everyone expects you to do it. Certain components (the compiler, the linker, and pre-existing code) might have been designed under the assumption that individual files would be compiled separately.

They most certainly have not been. The C and C++ standards do not allow for such ridiculously inappropriate behavior. Where did you get this idea? Compiler writers may not impose arbitrary restrictions on the codebase in any relation to the local filesystem. This is just untrue.

The pre-existing code might have declared static (per-file) variables or functions in a way that could collide with other code (namespaces might help here).

This is a well-known gigantic red flag indicating an amateur programmer. File-scoped variables are antiquated even within the pure C community; the only time they're acceptable in most professional programmers' eyes is within a library that is built alone. In fact, you might want to read what Kernighan himself said about when file-scoped variables are appropriate in K&R 2; the primary author of the language himself says that this is a fundamentally bad technique and should not be done.

Of course, it's not surprising that you cause problems by misusing the toolchain and letting bad code collide when build trees written separately are blindly merged without the help of a linker.

The compiler and linker might have limits.

Not if they're standards compliant, they mightn't. Did you know that there's a document out there floating around telling compiler authors in concrete detail what they may and may not do? You should read that before commenting on what a compiler may or may not do; you are simply out in left field, here.

As with every issue you'll ever run into, there are two (or three) sides to it.

Not when you know what you're talking about. Whereas many things are issues of pro/con, many simply aren't; you'll be hard pressed to find pros in the distribution of heavy ordnance to delusional sociopaths, you'll be hard pressed to find pros in setting up a "bring a molester to school day," and you'll be hard pressed to find pros in non-decoupled code, once you've actually read the standard and are aware of the real limitations of compiler authors, instead of your guesses about what might maybe happen if someone wasn't paying attention.

--
StoneCypher is Full of BS
Total red herring... by pla · 2005-01-15 01:41 · Score: 3, Informative

First of all, "speed", either compilation-wise or runtime-wise, has nothing to do with why you should use header files.

I too disliked header files long ago, in my early days of programming C. It seemed pointless to have two files (or, rarely, as many as four) when one would do just as well.

For small projects, I'll still use one large monolithic source file. In that aspect, it makes sense to skip breaking out your data and function definitions.

But when you get to the "real" world... Imagine even a "small" serious project, with perhaps 10k lines of code. Try to find a single function in that file - I hope you feel on good terms with your IDE's search capabilities!

So, break that out into a dozen files - You have your network code in one file, your UI code in another, your file I/O in another, perhaps some database interaction in another, and so on. Okay, that works well... But wait, your network code, your file I/O, and your database code, all make use of the same checksum algorithm! So, you have the same exact code duplicated three times.

That would work, because each file will compile to a module with its own namespace (in most languages). But it wastes space, both in the source and in the compiled code. It also wastes time and can very easily introduce bugs. For example, if you decide you need to switch from MD5 to SHA1 as your checksumming algorithm, you now need to change three places instead of one. If you miss one of those, but use them to compare results between the three different uses, you have a very serious bug that may drive you batty trying to track it down.

So, the obvious solution, break out all your common functions into a toolkit-like source file. Now, you could just #include that in every other file that needs it, but WOW would that cause some serious bloat in the compiled code - In my experience, shared code files frequently end up as the single largest source file in the entire project.

So, use a header file. That way, you don't end up with massive duplication of code, you have the advantage of a logical breakout of your code into similar-purpose files, and you can still make changes to only one file to modify one function.

Incidentally, the above chain of thinking more or less describes the evolution of standard libraries... Would your professor actually suggest that you shouldn't "#include<stdio.h>", but instead should manually pull the code for each function you use into your source file? Because, in the degenerate case, he has told you exactly that.
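A minimal sketch of the "one checksum, declared once in a header, defined once" arrangement described above. The file names are hypothetical, and FNV-1a (32-bit) stands in for MD5 or SHA1 only because it fits in a few lines; switching algorithms means editing this one function and nothing else.

```c
#include <stddef.h>
#include <stdint.h>

/* ---- hypothetical "checksum.h": the declaration every module includes ---- */
uint32_t checksum(const void *data, size_t len);

/* ---- hypothetical "checksum.c": the single definition, compiled once ---- */
uint32_t checksum(const void *data, size_t len) {
    const unsigned char *p = data;
    uint32_t hash = 2166136261u;        /* FNV-1a 32-bit offset basis */
    for (size_t i = 0; i < len; i++) {
        hash ^= p[i];
        hash *= 16777619u;              /* FNV-1a 32-bit prime */
    }
    return hash;
}
```

The network, file I/O and database modules would each #include "checksum.h" and link against the one compiled copy, so the object code is not duplicated and the three call sites cannot drift apart. For reference, checksum("", 0) is 0x811c9dc5 and checksum("a", 1) is 0xe40c292c.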
Re:Time you gain, you lose in debugging by humblecoder · 2005-01-15 02:07 · Score: 1, Informative

Right on: preach it, brother. This is one of the least understood principles of modern design: machine time is significantly inferior to programmer time. Herb Brooks would be proud.

For those who don't know, Herb Brooks is the lesser known brother of famed author Fred Brooks. Herb is best known for writing the obscure tome, _The Legendary Monkey-Hour_. LMH is not as well known as older brother Fred's _Mythical Man Month_, but among primate coders, it is the bible.

Herb is also the President of the Billy Einstein Appreciation Society, a group dedicated to studying the work of the oft-forgotten sibling of Albert Einstein, who penned the Theory of Absolutitivity.

--
------
www.moneybythenumbers.com
More from the author by garethw · 2005-01-15 05:43 · Score: 2, Informative
Thanks for all the interesting replies. It's always nice to start a flame war.

I wish I'd included a few more details, which might have avoided questions like, "Are you stupid?" and "Have you taken basic Computer Science course?" (the answers are "On occasion" and "Waterloo, Comp Eng '98" respectively :) )

A few details which might put the question into perspective might be:
- The project is a chip verification project. There is no final "product" at the end of my work. The name of the game is endlessly re-compiling and running new tests. So compile time is actually quite significant.
- There is no linker. :) The nature of the language is such that it is linked at run time.
- The compiler actually doesn't allow you to list multiple source files on the command line and produce one object. So I guess my C/Java analogy was misleading. But that's partly why I'm at a loss to rationalize the question - there is little direct reference point.
- A lot of people missed my point - I think abandoning header files is abhorrent. But when it came down to it, I couldn't actually produce any inarguable reasons why (namespace is one, but I don't think it's a show-stopper).
Thanks again for your insights.
--
garethw
Re:Incremental compilation by stonecypher · 2005-01-15 12:58 · Score: 2, Informative

I had a whole reply ready, but IMHO it is not worth the trouble replying to.

"Oh, I wrote a reply, but I don't want to paste it because you're not a good person and I don't want to." My eight-year-old son knows that nobody falls for this sort of passive-aggressive dismissal; it's disappointing that you do not.

Namedropping doesn't make you seem correct, y'know.

Neither does getting personal about perceived faults, when it's pretty much your own assumptions that are the problem.

Observing that something you've done isn't effective is hardly my getting personal. Believe me, there's no shortage of material; what I said above about my son is, for example, personal, as is the following: turn down the whine knob until you've got something worth saying to say.

So the whole template rant is bogus.

Given that it was not you but the original poster who set the domain of important languages, and given that I also touch on pure-C and pure-Java issues, this protest is as bogus as it pretends what it's attacking to be.

- When I am talking about state, I'm talking about state in the compiler. (which in most _performing_ tools has the preprocessor built in btw)

Yes, I heard you the first time. The reason I referred you to Modern C++ Design is that you're wrong, and I have neither the patience nor the kindness to explain it to you. Start with section 3.5, or with any page explaining how C++ template metaprogramming is a functional language rather than an imperative language. Before you fly off the handle talking about how you weren't referring to templates *again*, please realize that the observations regarding template MP as a functional language in fact apply to everything in the C and C++ preprocessors.

Do not reply until you have read; repeating ignorance is no more argument than repeating falsehoods.

-- by {$i} ({$include} in Delphi), like #include in C

Actually, you're shooting yourself in the foot here. You're attempting to make the hasty generalization that there are two approaches to bringing outside code into a local place, that C/C++ advocate an "inline header system" (whatever the hell that is) and that Delphi does something different.

What you seem to fail to understand is that a uses clause is a call to the Delphi linker; it is literally the same thing as the linker in C++, and in fact if you take the time to look at Borland's BPIL, you'll find that they generate the exact same intermediate language binary. Furthermore, the very same examples you give, {$i} and {$include}, are the same as #include. Both languages also offer still other mechanisms to bring code in or to generate code.

That said, talking about Pascal's differences with C and then discussing what Delphi does is roughly equivalent to describing what Objective C does. Delphi is not Pascal any more than Objective C is C. They are distinct languages. Delphi is Borland's third Pascal variant, Object Pascal, which follows both Borland Pascal and Token Pascal (the last of which is so old that you pretty much can't find references to it online).

Please don't lecture to me about Pascal; my use of Pascal predates Borland's very existence.

The main reason, and the fundament of a unit system, why the second way is more optimal than headers, is that the compiler reinitialises before reading a unit interface.

Uh. The Pascal unit system is simply an in-code linking mechanism. It's no different than rolling your source together with your makefiles; if you'd bothered to read Wirth's papers on the design of the language, you'd find out that Wirth himself suggests that "unit" is nothing important, and pretty much just syntactic sugar.

Now, how the compiler "reinitializes" before reading a unit interface is a little bit beyond me: the unit interface is just what a C++ programmer would call a collection of vtbls. Would you be willing to point me to any point
