Abandoning Header Files?
garethw asks: "I'm working on a project where the lead developer, following a suggestion by our tool vendor, wants to get rid of the header files and directly #include source code. The language is a somewhat specialized language, but for all intents and purposes, you can assume it's Java or C. The conventional argument I recall for using header files, and incremental compilation, is that it's faster to use a makefile and conditionally build only those files that have changed. However, it turns out that the brute force of invoking the compiler once on the top-level does actually compile much faster. I feel that there is something about #include'ing source files directly, compiling only the top-level file, just doesn't 'feel' right and I'm at a loss to really give a solid argument as to why. Has anyone actually used this approach? Does anyone have any thoughts on any advantages or drawbacks?"
...following a suggestion by our tool vendor,...
How much money will your tool vendor make if you implement this suggestion and what, if any, product does she sell that neatly solves any problems this might bring up?
It's simple: I demand prosecution for torture.
static global variables have scope within the module they're defined in. Which means that two static globals in different source files don't collide, because they're in different modules.
Including everything into one big source file will mean that they're both in the same module, and so will collide. Not good.
Can't say about other languages, though.
Well, there's the obvious separation of interface from definition. And the problem of duplicate definitions - there's a reason why "extern" is a keyword. :)
Plus, header files define an interface, which is useful if you don't actually have the code (i.e. binary shared library). Moot point in your case, I think, but...
Plus it's just good programming style to have separate definitions and implementations. Easier to track down bugs.
...but it's being eaten...by some...Linux or something...
They are just about the only way to centrally organize declarations for data structures and function signatures. Doing so will save your ass eventually, because having function prototypes available can allow the compiler and lint tools catch stupid programmer errors. You do use lint-like tools, right? They _will_ catch bugs that testers and visual scanning wont.
The only draw back to headers in C is that if you forget to 'make clean' after changing a header, you can end up with object files using old definitions. Just make a habit of doing a full build after changing the headers. If you designed your software properly, changing header files won't be all that common (adding functions new data structures, etc.).
-- Microsoft is the most expensive commodity operating system and office suite vendor in the marketplace.
Have you tested the speed difference when you change only one non-header file? I bet incremental compilation will make that quite a bit faster. In addition, if you want to compile that changed source file to check for syntax or type errors, you don't have to check for collision between it and the whole rest of the project, only collisions between it and the header defining it.
My OS prof was demonstrating the differences in what errors the C compiler and linker would pick up. However, we found that we could make two source files with no include lines in either that both defined a global variable (sans extern). The main function set the global variable and then called a function that is defined in the other source file, which would then print the gv. Then we compiled and linked them with gcc. No warnings, no errors. The program ran exactly the way we wanted it to, which was unexpected. So yes, you can do away with includes and header files without even performing the includes manually. Depending on the language, your compiler might be smart enough to figure it out.
But that doesn't make it a good idea. Besides, do you want to be the one who has to go update the library functions that would normally have been included any time you change the code in one file?
It seems that the onus should be on the vendor to explain very, very convincingly why you should abandon decades of standard practice and good coding practice. This better be one hell of a good product you're developing to justify the should a radical change. You shouldn't need to defend standard practice, they must campaign for a change to that practice. Imagine trying to explain this to all the coders who will work on the product for the next decade - will they think you're crazy or is there really a reason to do this?
- Disadvantages:
- You're not doing it the way everyone expects you to do it. Certain components (the compiler, the linker, and pre-existing code) might have been designed under the assumption that individual files would be compiled separately. The pre-existing code might have declared static (per-file) variables or functions in a way that could collide with other code (namespaces might help here). The compiler and linker might have limits. And you might not hit those limits until late in the project.
- For building the whole product, yeah, it will be faster. But for making a small change and rebuilding the results of that change, it might be much slower.
As with every issue you'll ever run into, there are two (or three) sides to it.Time flies like an arrow. Fruit flies like a banana.
gcc -c f1.c gcc -c f2.c gcc -c f3.c gcc -o f f1.o f2.o f3.o
Your vendor instead thinks it would be better to do:
gcc -o f f.c
Where f.c looks like:
Am I right, or am I completely off track?
If I'm right, you'd probably still want to include header files because you want everything to remain modular. According to software engineering type people, that makes maintenance easier. Another problem is symbol scoping. C keeps symbols local to the module they appear in, so you want to make sure you have naming conventions, namespaces, or some other protection against naming clashes. I'm dubious about the benefits, but I work on projects that take significant amounts of time to compile. Not hours, but enough time that if you wait for all the objects to compile you are wasting a lot of time. In general, I'd claim that the larger the project, the worse an idea it is.
Whoever corrects a mocker invites insult;
whoever rebukes a wicked man incurs abuse.
--Proverbs 9:7
While including code directly may speed up the compilation time, you will loose all the time you gain and then some when you get into debugging.
If you have a complicated #include chain, you can wind up with a lot of duplication. Some compilers will complain, some won't. However, if you have typedefs, structs or the like, most compliers will complain and not compile your code until the duplications are removed. I don't know what compiler you're using or if you are planning on including more than functions or global variables, so I don't know if this is an issue or not.
The more general issue is that it's much easier to track down bugs and other problems if there is a clean separation between definitions and implementations. I can't characterize that difference in a few sentences, so I'll just say that it has been my experience that projects which are developed in a true modular nature are much easier to debug than projects designed in a monolithic nature. The time saved in debugging more than makes up for a little time lost in compilation.
Depending on the size of your project, you will get varying returns from each of these:
1. Seperate source files means that units of code can hide data and functions.
2. Seperate headers, combined with something like GCC's -Wmissing-prototypes enforces the good coding practice of well defined functional interfaces.
3. Seperate headers and source files means that when you look at a function in a file, you will have some idea of what it touches because you can go and look that it included header X but not Y.
4. You can tell the compiler to explicitly forbid global data symbols, which is pointless in one single file.
5. You can use different compiler switches for different files.
6. Your code will have some hope of portability.
If your project is small, it doesn't matter anyway. If your project is large, you can get your compiler to enforce some good design rules on you, which doesn't mean you can't still have a good design anyway, but it will make it more likely. I worked on a project that used a compiler that let you get away with everything. Try and port that code to anything UNIX-like, and it was ridiculous.
Outside of a dog, a book is a man's best friend. Inside a dog, its too dark to read.
You have issues with scope.
The easist one for me is that with #includes, it's so much easier to fix bugs. If you find a bug or an inefficient way to solve a problem, you only have to fix it once. Everything that #includes the suspect file will be fixed on the next compile.
If customfunctions.h has been changed or optimized, you don't have to edit the 30 projects that you're using those functions in. Just the one file is fixed, each project gets the benefits during the next compile.
LK
"Hi. This is my friend, Jack Shit, and you don't know him." - Lord Kano
If your language supports "static" in the file-scope sense, you could declare every global object as static and reap the compiler optimizations that come with that declaration.
If your language supports smart inlining, you could end up with code that has been inlined more effectively, since any code could be an inlining candidate regardless of location.
I can think of plenty of reasons to back away from the idea, but they'll flood in here without my help.
It is hard to tell from your statements, but this may stop tools like ccache from working. I use ccache in my projects and it radically cuts down the amount of recompilation required when I do a complete rebuild. Now, an obvious question is why I don't simply rely on makefiles to ensure only changed files ever get rebuilt. This often happens because compilation involves generating new cpp files that are then compiled and I don't want to be grepping through these all the time. I suppose I could move them all to a different directory, but ccache works very well.
The other problem, of course, is that separating your classes into header and implementation means that if you change the implementation, you only need to recompile that one file and relink, rather than recompiling EVERYTHING. This can be a matter of a few seconds vs. several minutes. And implementation does change, a lot... fix a bug, you fix the implementation. The headers change too, but much much less frequently.
Oceania has always been at war with Eastasia.
When I write libraries, I try to make them header-only. Generally users don't want to have to modify their makefiles if they don't have to, and I'll resort to compiler specific pragmas if I have to.
:)
It depends on the size of the system. If you are using a component-based system then only the pieces of the system that are actually being modified should be compiling anyway, which cuts out a lot of compilation. However this implies there is fairly loose coupling involved. In a more conventional application, there has to be a breaking point where the amount of time to parse files is longer than the time to link them normally. Using precompiled headers on any system header will also drastically decrease the time it takes to compile, since the compiler essentially just dumps the parse tree out to disk. So much time is spent inside some system headers! (Especially Windows.h. Ugh!)
There are some tools that keep the header files in sync with source files automatically, but I don't know of any off-hand. I have seen some for C, but I'm not sure there is one for C++ that supports all the wild and crazy stuff like namespaces and templates.
Including all the source code into one main file compiled to one object can work, if the source files cooperate. C can have problems with the namespace, but C++ allows multiple namespaces and you can even put the namespace blocks in the main file around the #includes. The source code has to support this, though. It's best if all the source files to be included are under your control. For libraries that expect to use a declarative header, use it like it was intended.
.class filename = the ONE public class exported by the file. Unless you want a total of 1 public class, it won't work. Java doesn't use header files anyways. Class binaries export everything public automatically.
I've done this on lots of projects and it works great. Most of the arguments here are either about performance or an appeal to tradition (that's the way we've always done it... must be the only true way). Modern compilers will create pre-compiled headers that can include code, usually used for template and inline definitions; modern compilers don't get the same benefits from the traditional model anymore. Actually, even larger projects seem to take longer to link with iostream and windows.h than the source does to compile.
The compiler's ability to optomize code may be increased greatly, espescially its ability to inline functions. Too much inlining will cause code bloat, but the compiler's options should give you control over the balance.
Modern compilers also allow you to change the compilation options mid-file.
Any debugger or source analyzer shouldn't have problems handling inline or same-file implementations, or you're using bad tools.
It can also be easier to create test code; create a series of test files t01.cpp, t02.cpp (each with a main) but include only one. The others are there for reference but don't interfere. This is also useful for testing a prototype replacement for a component; include the new one and comment out the old include. Going back is trivial.
It's more a question of coding style than anything. Personally, I hate maintaining redundant information of any kind, and this very much includes the prototypes in the header with the actual functions. Source code redundancy is bad for all the same reasons that database redundancy is bad. Making my C++ member functions inline and including their files frees me from this.
I don't think this will work too well in Java. A Java source filename = the
Its an interesting approach and you have no idea why you shouldn't do it.
So do it.
In the end, regardless if it works or not, you will have learned something new.
The surprise isn't how often we make bad choices; the surprise is how seldom they defeat us.
One of the really depressing things about having been in the business for nigh on to 40 years now is that, along with the occasional new dumb idea, all the old dumb ideas keep coming back. Among those dumb ideas that keep coming back are "visual programming" --- using graphics instead of programming languages; complicated schematic graphics for software --- UML in its utter complex form; and, sure enough, using the preprocessor to mess with C-like languages.
Every time this is tried --- and God knows it's been tried a lot --- you run into some severe problems:
If you've got control of the compiler for this peculiar language, why not explore making the startup time shorter, say, eg., by using shared libraries, DLLs, or by setting the sticky bit?
Well, coding style and software engineering aside, you need to do some testing if you think this will increase you speed.
Quick test to illustrate. 1,000,000 lines of C code, using gcc 3.3.4, default options.
Time to compile spread of 1000 files (with 8 lines of include and function body per file): ~2 minutes
Time to compile all in a single file: unknown
Why is the second time unknown? My computer doesn't have the memory to do it. Now I could pump up the memory of my machine (assuming I've got a 64-bit machine) to let me do the second compilation, but I doubt it'll be faster.
Speeding up a full build should not be important. The only people who care about it are in your test lab doing daily builds and regression tests, who can start the build overnight and have it ready by morning. Of course, this is the situation in a well-designed application. If you find yourself needing a full rebuild all the time, it means one of two things: 1. you are hacking a core component, or 2. all your components are written with spaghetti code and any change in one forces rebuilds in all the others.
In the first case, try just testing one or two components during development, and then verify all the others when the API is stabilized. This is, incidentally, the advantage you gain from using header files: once the API is stable, you never need to rebuild that component again except to fix bugs (which require rebuilding only that component).
In the second case, you need some serious refactoring. Look at the code and break it up. Encapsulate everything you possibly can. Make stuff private and static. Make everything you don't modify const. Keep it up until each component is accessed only through its API and that API is clean. Trust me, this is possible in any project. The enormous decrease in maintenance costs will more than pay for any time you spend on it.
First of all, "speed", either compilation-wise or runtime-wise, has nothing to do with why you should use header files.
I too disliked header files, long ago, in my early days of programming C. It seemed pointless, to have two files (or rarely, as many as four), when one would do just as well.
For small projects, I'll still use one large monolithic source file. In that aspect, it makes sense to skip breaking out your data and function definitions.
But when you get to the "real" world... Imagine even a "small" serious project, with perhaps 10k lines of code. Try to find a single function in that file - I hope you feel on good terms with your IDE's search capabilities!
So, break that out into a dozen files - You have your network code in one file, your UI code in another, your file I/O in another, perhaps some database interaction in another, and so on. Okay, that works well... But wait, your network code, your file I/O, and your database code, all make use of the same checksum algorithm! So, you have the same exact code duplicated three times.
That would work, because each file will compile to a module with its own namespace (in most languages). But it wastes space, both in the source and in the compiled code. It also wastes time and can very easily introduce bugs - For example, if you decide you need to switch from MD5 for SHA1 as your checksumming algorithm, you now need to change three places instead of one. If you miss one of those, but use them to compare results between the three different uses, you have a very serious bug that may drive you batty trying to track it down.
So, the obvious solution, break out all your common functions into a toolkit-like source file. Now, you could just #include that in every other file that needs it, but WOW would that cause some serious bloat in the compiled code - In my experience, shared code files frequently end up as the single largest source file in the entire project.
So, use a header file. That way, you don't end up with massive duplication of code, you have the advantage of a logical breakout of your code into similar-purpose files, and you can still make changes to only one file to modify one function.
Incidentally, the above chain of thinking more-or-less describes the evolution of standard libraries... Would your professor actually suggest that you shouldn't "#include<stdio.h>", but instead should manually pull the code for each function you use into your source file? Because, in the degenerative case, he has told you exactly that.
Just remember that a header file defines the interface to the body: which actually duplicates some of the material in the body. Because of this duplication, you can have problems, i.e. faulty build dependencies, mismatch between header/body, etc. Removing this sort of duplication is usually a good thing: so if the technology (i.e. compiler) is smart/performance/etc to get it right, then the change could be a good thing.
I'd like to point out that many other respondants have argued their case with reference to 'C', however the poster clearly said it was not 'C' -- without further information, it's difficult to know whether these 'C' type issues will translate. I'd point out that some languages, e.g. python, java, perl, do not have ideas of separate header/body -- suggesting that "current trends" in languages is to do away with the duplication.
The compiler could be intelligent enough to construct a parse tree quickly, and only resolve parts of the parse tree when necessary: so for example, if there was previously a 5K header, and a 30K body, but now only a 30K body, the compiler may read the entire 30K, and only "roughly" parse it (e.g., say for a function, it parses the outer scope of the function, but resolves nothing inside the function until some other code actually uses the function).
I don't think there's an answer for this guy: there are too many issues that haven't been stated, as we know nothing about the particular toolchain, the build environment, the language, etc. All we have an abstract concept of splitting files into header/body. That concept by itself isn't good or bad, it depends upon a lot of other issues that change the perspective.
My answer would be that surely in the guys company he has a couple of clueful senior engineers that can sit around a whiteboard and discuss (using their computer science training) what actual impact the change will have on the project, and whether to go with the impact.
I wish I'd included a few more details, which might have avoided questions like, "Are you stupid?" and "Have you taken basic Computer Science course?" (the answers are "On occasion" and "Waterloo, Comp Eng '98" respectively
A few details which might put the question into perspective might be:
- The project is a chip verification project. There is no final "product" at the end of my work. The name of the game is endlessly re-compiling and running new tests. So compile time is actually quite significant.
- There is no linker.
:) The nature of the language is such that it is linked at run time.
- The compiler actually doesn't allow you to list multiple source files on the command line and produce one object. So I guess my C/Java analogy was misleading. But that's partly why I'm at a loss to rationalize the question - there is little direct reference point.
- A lot of people missed my point - I think abandoning header files is abhorrent. But when it came down to it, I couldn't actually produce any inarguable reasons why (namespace is one, but I don't think it's a show-stopper).
Thanks again for your insights.garethw
I had a whole reply ready, but IMHO it is not worth the trouble replying to.
"Oh, I wrote a reply, but I don't want to paste it because you're not a good person and I don't want to." My eight year old son knows that nobody falls for this sort of passive agressive dismissal; it's disappointing that you do not.
Namedropping doesn't make you seem correct, y'know.
Neither does getting personal about perceived faults, when it's pretty much your own assumptions that are the problem.
Observing that something you've done isn't effective is hardly my getting personal. Believe me, there's no shortage of material; what I said above about my son is, for example, personal, as is the following: turn down the whine knob until you've got something worth saying to say.
So the whole template rant is bogus.
Given that it was not you but the original poster which set the domain of important languages, and given that I also touch on pure-C and pure-Java issues, this protest is as bogus as it pretends what it's attacking to be.
- When I am talking about state, I'm talking about state in the compiler. (which in most _performing_ tools has the preprocessor built in btw)
Yes, I heard you the first time. The reason I referred you to modern c++ design is that you're wrong, and I have neither the patience nor the kindness to explain it to you. Start with section 3.5, or with any page explaining how C++ template metaprogramming is a functional language rather than an imperative language. Before you fly off the handle talking about how you weren't referring to templates *again*, please realize that the observations regarding template MP as a functional language in fact apply to everytihng in the C and C++ preprocessors.
Do not reply until you have read; repeating ignorance is no more argument than repeating falsehoods.
-- by {$i} ({$include in delphi), like #include in C
Actually, you're shooting yourself in the foot here. You're attempting to make the hasty generalization that there are two approaches to bringing outside code into a local place, that C/C++ advocate an "inline header system" (whatever the hell that is) and that Delphi does something different.
What you seem to fail to understand is that uses is a call to the Delphi linker; it is literally the same thing as the linker in C++, and in fact if you take the time to look at borland's BPIL, you'll find that they generate the exact same intermediate language binary. Furthermore, the very same examples you give, {$i} and {{$include}}, are the same as #include. Furthermore, both languages offer still other mechanisms to bring code in or to generate code.
That said, talking about Pascal's differences with C and then discussing what Delphi does is roughly equivalent to describing what Objective C does. Delphi is not pascal any more than Objective C is C. They are distinct languages. Delphi is Borland's third pascal variant, Object Pascal, which follows both Borland Pascal and Token Pascal (the last of which is so old that you pretty much can't find references to it online.)
Please don't lecture to me about Pascal; my use of Pascal predates Borland's very existence.
The main reason, and the fundament of a unit system, why the second way is more optimal than headers, is that the compiler reinitialises before reading a unit interface.
Uh. The pascal unit system is simply an in-code linking mechanism. It's no different than rolling your source together with your makefiles; if you'd bothered to read Wirth's papers on the design of the language you'd find out that Wirth himself suggests that "unit" is nothing important, and pretty much just syntactic sugar.
Now, how the compiler "reinitializes" before reading a unit interface is a little bit beyond me: the unit interface is just what a C++ programmer would call a collection of vtbls. Would you be willing to point me to any point
StoneCypher is Full of BS
The answer has to be "we don't know" because you haven't told us what language this is. I reckon it's some sort of verilogrevolting or system-Cyukyuk abomination,
Not a bad guess. It's Vera. But what's the point of telling people that? How many people on slashdot even know what Vera is?
in which case I would say *** do what the vendor tells you *** because if you don't and it all falls apart they will say "we told you so"
But what's messed up is that vendor just decided to start advocating this one day. The Vera compiler even supports header generation - it's been done the way we expect for years. And bang, one day they suddenly start encouraging this weird approach. "What the vendor tells me" is not what the vendor told me five years ago, or last year.
I know that there was some discussion sparked internally at the vendor because I raised this question with their FAEs. I already know what other verification people think; I wanted to get an impression of what people thought of the same methodology if it were applied to similar languages.
garethw