Doom 3 Source Code: Beautiful
jones_supa writes "Shawn McGrath, the creator of the PS3 psychedelic puzzle-racing game Dyad, takes another look at the Doom 3 source code. Unlike Fabien Sanglard's technical reviews, Shawn's focuses purely on coding style. He gives his insights into lexical analysis, const and rigid parameters, amount of comments, spacing, templates and method names. There are also some thoughts on coming to C++ with and without a C background. Even John Carmack himself popped in to comment."
I'll be here all week.
It's just a personal point of view about coding style... Some things, like the vertical spacing of braces (in the section "Doom does not waste vertical space"), are the exact opposite of readable source code (just my opinion, of course, as someone who reviews a lot of other people's source code).
A developer needs to make an application for the hardware people have, not the hardware he wishes people had. Otherwise, he's likely to end up limiting his market to a subset that's not big enough to turn a profit.
In some ways, I still think the Quake 3 code is cleaner, as a final evolution of my C style, rather than the first iteration of my C++ style, but it may be more of a factor of the smaller total line count, or the fact that I haven’t really looked at it in a decade. I do think "good C++" is better than "good C" from a readability standpoint, all other things being equal.
I sort of meandered into C++ with Doom 3 – I was an experienced C programmer with OOP background from NeXT’s Objective-C, so I just started writing C++ without any proper study of usage and idiom. In retrospect, I very much wish I had read Effective C++ and some other material. A couple of the other programmers had prior C++ experience, but they mostly followed the stylistic choices I set.
I mistrusted templates for many years, and still use them with restraint, but I eventually decided I liked strong typing more than I disliked weird code in headers. The debate on STL is still ongoing here at Id, and gets a little spirited. Back when Doom 3 was started, using STL was almost certainly not a good call, but reasonable arguments can be made for it today, even in games.
I am a full const nazi nowadays, and I chide any programmer that doesn’t const every variable and parameter that can be.
The major evolution that is still going on for me is towards a more functional programming style, which involves unlearning a lot of old habits, and backing away from some OOP directions.
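A minimal sketch of the "const every variable and parameter that can be" rule from the quote above. The Average function and its behavior are hypothetical illustrations, not code from Doom 3 or from id.

```cpp
#include <vector>

// Hypothetical helper: average of the values, or 0.0f for an empty input.
// The input is taken by const reference, so the caller's vector cannot be modified.
float Average(const std::vector<float>& values) {
    if (values.empty()) {
        return 0.0f;
    }
    float sum = 0.0f;  // the accumulator is the only local that has to change
    for (const float v : values) {  // each element is read-only inside the loop
        sum += v;
    }
    const float count = static_cast<float>(values.size());  // computed once, never reassigned
    return sum / count;
}
```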
One might suggest that every good programmer, if they spend enough time improving, eventually moves toward a more functional programming style.
"First they came for the slanderers and i said nothing."
I've developed for large game and non-game projects, and each needs a different approach. Console games especially have serious problems with dynamic memory allocation (they don't typically have swap files and can die due to heap fragmentation) so you have to avoid a lot of convenience libraries like STL.
STL, however - especially in newer compilers that support C++0x - is actually quite good and is very, very robust. It's a good way to avoid a lot of the memory management bugaboos that happen when you *are* doing lots of dynamic/heap allocation. So I would very much endorse a sane amount of STL use in desktop code.
The other thing that rubbed me the wrong way here was public member variables. Since inlining and move semantics make getters and setters essentially free, there is no good reason to expose bare, public variables on anything but the simplest, most struct-like objects. At the game studio, the biggest source of weird, hard-to-trace bugs in our code was often people modifying public members of other objects in unexpected ways or at unexpected times.
Having public, non-const member variables actually hurts a principle the author supports, which is "Code should be locally coherent and single-functioned". This means that an operation should do one thing and put you in one of several known and easily discoverable states, even on failure. That is, if I say, make this guy do X, then either he does X or he fails and ends up in a known state. If that state is available in the form of modifiable public data, then his state can get messed with at any point along that path by some other code, and the final state (in cases of success and failure) is not fully known. At the very least, making data private means that only certain code paths can modify the data, and it's much easier to keep state coherent.
Anyway, that's just my $0.02.
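A hypothetical C++ sketch of the point above about private members and known states: because the value can only change through one member function, a failed call leaves the object exactly as it was. The Player class, SetHealth and the range limit are made up for illustration.

```cpp
// Hypothetical example: health can only change through SetHealth,
// so every write goes through one validated code path.
class Player {
public:
    // Returns false and leaves the object unchanged if the value is out of range.
    bool SetHealth(const int health) {
        if (health < 0 || health > kMaxHealth) {
            return false;  // failure: state is exactly what it was before the call
        }
        health_ = health;
        return true;
    }

    int Health() const { return health_; }  // trivial getter, easily inlined

private:
    static constexpr int kMaxHealth = 100;
    int health_ = kMaxHealth;
};
```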
In any properly written compiler, both are pretty equivalent and in almost all cases there wouldn't even be a difference in generated assembler, let alone performance.
case is basically a lot of else-if's, and else-if's are basically a single-path case.
Switch statements are faster if there are enough cases, because a branch table can be used. For switches with only a few case statements, a good compiler should use conditional branches, resulting in the same code as an else-if chain, because that is faster in that case. I presume a really good compiler would also be smart enough to use a branch table if enough else-ifs are chained together, but I haven't written such highly optimized code recently enough to have kept tabs on compilers to that extent.
Better known as 318230.
case statements are not faster than if-else statements. Often a case statement will be turned into a load of if-else's by the compiler anyway (and a set of if-else statements could be turned into a lookup table too!)
In any case, "far faster" is not true, the machine statements generated are tiny compared to every other inefficiency in a codebase. Thinking a case statement makes your code faster is like painting your car red to improve its speed when you've got a load of heavy junk in the boot.
Are you an idiot? Case statements don't do the same thing as if else. The example in the article does some floating-point compares. How do you represent that as a case statement in C++? Come on, I'm waiting. Oh, that's right. You can't.
Case statements take an integer value and switch based on it. You cannot have case (dot < -epsilon) or case (dot > epsilon). Got that? Good.
wonder what else ID missed
If you think Carmack "missed" something, take a deep breath, count to ten, and figure out what you missed.
No, he's not perfect - I found a bug in DOOM 2 that he never tracked down - but until you prove yourself STFU about how Carmack may have "missed" something you only learned on Stack Overflow anyway. Carmack is a Level 99 Wizard while all you can do is read the descriptions of the kinds of spells he can cast.
God, people like you are annoying. Shut up and think, and you might learn something.
Looking at the article, I see nothing that could have been a switch statement. The only else-ifs I see are in the Spacing section, and they need to check if a value is less than or greater than, something that can't be done with a switch.
My UID is prime... is yours?
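For reference, a self-contained sketch of the kind of floating-point range test under discussion. The names SIDE_BACK, SIDE_FRONT, SIDE_ON and LIGHT_CLIP_EPSILON are borrowed from the switch snippet posted later in this thread; the enum, the epsilon value and the ClassifySide wrapper are assumptions.

```cpp
// Hypothetical stand-ins for the Doom 3 names; the epsilon value is assumed.
enum Side { SIDE_FRONT, SIDE_BACK, SIDE_ON };
constexpr float LIGHT_CLIP_EPSILON = 0.1f;

// Classifies a signed distance (e.g. a dot product against a plane) into three bands.
Side ClassifySide(const float dot) {
    if (dot < -LIGHT_CLIP_EPSILON) {
        return SIDE_BACK;
    } else if (dot > LIGHT_CLIP_EPSILON) {
        return SIDE_FRONT;
    } else {
        return SIDE_ON;
    }
}
```

Each branch compares against a float threshold, which is why a plain switch does not fit: case labels must be integral or enumeration constants.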
I really liked this bit, because it's something I've been really focusing on for the last year or so, and I think it has significantly improved my code:
Comments should be avoided whenever possible. Comments duplicate work when both writing and reading code. If you need to comment something to make it understandable it should probably be rewritten.
Comments can be useful, IMO, but primarily only for generating documentation (think Javadoc or doxygen, etc.). Other exceptions include bits of code that perform highly-optimized mathematical calculations, in which case I think the best solution is to write a proper document and then add a comment linking to the document, and bits of code that do something which apparently could be done differently but for some other reason must not -- assuming that explanation doesn't belong in the doc-generating comments.
Other than that, I find it makes my code a lot better if every time I find myself wanting to write a comment to explain some bit of code's purpose or operation, I instead refactor until the comment is no longer necessary. Often it's as simple as taking a chunk of code from one method/function and pulling it out into another with a well-chosen name, or else introducing a variable to hold an intermediate value in a calculation, with a well-chosen name. Sometimes the fact that a bit of code is hard to explain is a strong indicator that the design is wrong, that stuff is mashed together that shouldn't be.
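A small, hypothetical illustration of the two refactorings described above: pulling logic into a well-named function and naming intermediate values. The leap-year rule is only a convenient example.

```cpp
// Before: the intent lived in a comment.
//   // true if the year is a leap year
//   bool b = (y % 4 == 0 && y % 100 != 0) || y % 400 == 0;
//
// After: the function and variable names carry the explanation.
bool IsLeapYear(const int year) {
    const bool divisibleBy4 = year % 4 == 0;
    const bool isCenturyYear = year % 100 == 0;
    const bool divisibleBy400 = year % 400 == 0;
    return (divisibleBy4 && !isCenturyYear) || divisibleBy400;
}
```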
The bottom line is that I've found eliminating comments does more for improving the readability of my code than anything else, and I've gotten similar feedback from colleagues whose code I critique by pointing out that they can eliminate their comments if they refactor a bit.
Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.
And a note on the relative evil of comments; bad or not, well placed comments have saved me an awful lot of time when taking on maintenance of code bases in the past. Most of the time they can't present a design document to you, or if they do it covers the design at the start of the project, a decade and a half earlier. Code is a method of communication between two programmers, but if the code doesn't suffice to illuminate the design the original programmer had in mind, I'd really appreciate a comment explaining his thoughts. Especially if the particular section of code is complex, and especially if I'm the guy writing it and end up being the guy maintaining it a couple years later.
I'm trying to teach myself to set people on fire with my mind... Is it hot in here?
I know some situations else-if statements are necessary, but my understanding is that case statements are far faster.
Very often rules about efficiency like this one are incorrect. Sometimes the compiler will even change things completely when you compile it. In one example, I once carefully wrote a function to only have a return statement at the end, because I (somehow) thought it would be more efficient. Then I looked at the assembly output from the compiler, only to find that the compiler had added in all the extra return statements I had so carefully avoided. After that, I just went with what was most readable.
If you really care about efficiency, there is one way to do it: you MUST time your code. Try the case statement, and time it. Then try the if statements, and time it. If you don't time it, you are just guessing and you WILL be wrong.
The if statements in the article are a tricky example, because they test a range, and writing them as a switch statement would likely produce a large table. That could actually slow things down, because it fills up the memory caches with mostly needless information. Note this can also be a problem with traditional optimizations like pre-calculated tables or loop unrolling; they can actually slow things down.
TLDR: If you want to make your code efficient, you need to time it.
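One minimal way to do the timing this post recommends, sketched with std::chrono::steady_clock (a monotonic clock from the standard library). The TimeNanoseconds helper is hypothetical, and a single run like this is noisy: repeat it, compare medians, and make sure the optimizer cannot discard the work (for example by writing results to a volatile variable).

```cpp
#include <chrono>

// Hypothetical helper: runs `work` `iterations` times and returns the elapsed
// wall-clock time in nanoseconds. steady_clock is monotonic, so the measurement
// is not disturbed by system clock adjustments.
template <typename Work>
long long TimeNanoseconds(Work&& work, const int iterations) {
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        work();
    }
    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(end - start).count();
}
```

In practice you would time the switch version and the if/else version of the same function on identical inputs, several runs each, and only then decide.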
I found both the article and what JC wrote to be highly informative and rather validating of my approach to things. In software, we usually get little validation because of the wide variety of opinions of who we work with. We've all seen the extremes: The hard core C programmer who can't be bothered with any OO nonsense, and who advocates inspecting the assembly of every method you write. The C++ hippie who sees everything as some kind of exercise in getting the compiler to write the code for you.
I'm sure most of us follow a more balanced approach. C++ has to be about performance over anything else, otherwise there are plenty of other languages that accomplish much greater degrees of expression, but can't cash the performance check. But expressiveness is important, too, because performance goes out the door once we stop understanding what the code is doing. It's nice to have a language that lets you express things somewhat functionally, yet gives you the flexibility to wring out serious performance.
And since somebody else will say this if I don't, OO enthusiasts have a distaste for both else-if's and case statements, seeing either as a candidate for subclassing and virtual functions.
"Things seem wiser when you become older and senile"
Write a review of Solaris code, and it'll probably get posted on Slashdot, too. I for one would be interested in reading that.
Well of course. If you knew what you didn't know, you'd know it instead of not knowing it, you know?
i stopped reading after the first three conclusions, so i can't speak for the rest. those three should be: 1.) const as appropriate, not "const everything possible" (const can fuck you hard in OOP if you use it wrong), 2.) you can never have too many comments, and 3.) tight vertical spacing is archaic and stupid, unless absolutely necessary for some display reason
if this guy was interviewing here and mentioned all the things in his article, i probably wouldn't hire him. too much "religion", as it were, which is a huge red flag for me because it's usually masking something...
Often as in you've measured it, or often as in "I'm making shit up"?
A good compiler will never implement a case statement as a load of if-else's, unless the case values are sparse, or you're not optimizing.
Meanwhile, transforming a set of if-else statements into a lookup table is seldom possible unless the if-elses all compare the same integer variable to a constant. In that case, it can in theory, but almost certainly won't in practice.
Other things being equal, a switch statement with contiguous constant cases will almost always compile to faster code than the equivalent set of if-elses. And it will be far faster. Every if/else induces a branch, and mis-prediction will be severe on most of those branches, causing 10-20+ cycles of stall on modern processors. The jump table mispredicts almost always, but only once. If one arm is taken 99% of the time you can speed things up by using an if/else and then a switch, but that's a rare case.
I appreciate the fact you're responding to the idiocy of the above post, but your points are as wrong as his.
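A hedged way to check the jump-table claim above on your own toolchain: compile a dense switch like the hypothetical one below with optimization enabled (for example `g++ -O2 -S`) and read the assembly. Whether a jump table is actually emitted depends on the compiler, the target and the number of cases.

```cpp
// A switch over small, contiguous integer cases. With optimization enabled,
// mainstream compilers typically lower this to an indexed indirect branch
// (a jump table) rather than a chain of compares, but that is the compiler's
// choice. The opcode meanings are hypothetical.
int Execute(const int opcode, const int a, const int b) {
    switch (opcode) {
        case 0: return a + b;
        case 1: return a - b;
        case 2: return a * b;
        case 3: return b != 0 ? a / b : 0;  // avoid division by zero
        case 4: return a & b;
        case 5: return a | b;
        default: return 0;
    }
}
```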
I never understood why this was a conflict for programmers. If the white space isn't syntactic, can't your editor just rearrange the code the way you want it? Just run it through a pretty printer before you work on it.
Give me Classic Slashdot or give me death!
Heh... if they're dictating tab width, they're doing it wrong. If you must have a certain tab width, you should be using spaces for everything or you lose the whole benefit of tabs - letting people choose their preferred indentation size.
Use tabs for indentation, spaces for alignment. That way you'll never go wrong. Looks like this was one of the less "beautiful" things about the Doom 3 code.
== Jez ==
Do you miss Firefox? Try Pale Moon.
Except that will throw off diff.
Case statements can be optimized using jump tables.
Any semantically-equivalent code (that is, two instances of code that "does the same thing") can be optimized to the same set of instructions. It's just a matter of whether or not the optimizer can figure that out.
> I don't see the point of not using STL.
Code bloat, hard to debug, memory fragmentation, and no way to serialize/deserialize in a fast way.
I highly recommend that ALL C++ programmers read this doc on why EA designed and implemented their own STL version. It provides insight into the types of problems console game developers face that PC game developers routinely ignore or are ignorant of.
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2271.html
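One standard-library answer to the allocator complaints, added in C++17 and long after that paper, is polymorphic allocators from `<memory_resource>`. This is not EASTL; it is a sketch, assuming a C++17 library that ships std::pmr::monotonic_buffer_resource, of a standard container drawing its storage from a fixed buffer instead of the general heap. The function itself is hypothetical.

```cpp
#include <cstddef>
#include <memory_resource>
#include <vector>

// Hypothetical function: builds a vector of squares inside a fixed 4 KB buffer
// and returns their sum, without touching the general-purpose heap.
int SumSquaresInFixedBuffer(const int count) {
    if (count <= 0) {
        return 0;
    }
    std::byte buffer[4096];  // fixed storage on the stack
    // null_memory_resource() as upstream: if the buffer runs out, allocation
    // throws std::bad_alloc instead of silently falling back to the heap.
    std::pmr::monotonic_buffer_resource arena(buffer, sizeof(buffer),
                                              std::pmr::null_memory_resource());
    std::pmr::vector<int> values(&arena);
    values.reserve(static_cast<std::size_t>(count));
    for (int i = 0; i < count; ++i) {
        values.push_back(i * i);
    }
    int sum = 0;
    for (const int v : values) {
        sum += v;
    }
    return sum;  // values is destroyed before arena, which is destroyed before buffer
}
```

Using the null resource as upstream makes a blown memory budget fail loudly at the allocation site rather than fragmenting the heap later.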
As others have pointed out, case statements utilize jump tables for better performance. That is worth noting, because depending on the values in a case statement, a jump table isn't always efficient. If you're testing an enum, it will be great, as the resulting jump table will be small. If you're testing an int value, the resulting jump table could be huge, so you have to be careful.
Depending on what you're programming for (embedded systems?), the choice of whether to use a case or an if-else statement is not always obvious. You have to look at each case individually. But for desktop programming, one should use the syntax that results in the most readable and understandable code. Desktop compilers are good and should be able to optimize the result. The impact of one over the other will not be noticed 99% of the time, while the difference will affect code maintenance 100% of the time.
How do you represent that as a case statement in C++? Come on, I'm waiting. Oh, that's right. You can't.
switch (dot < -LIGHT_CLIP_EPSILON ? 1 : dot > LIGHT_CLIP_EPSILON ? 2 : 0) {
case 1:
sides[i] = SIDE_BACK;
break;
case 2:
sides[i] = SIDE_FRONT;
break;
default:
sides[i] = SIDE_ON;
}
Yeah, yeah, I know, that's totally ridiculous (although I did see things as bad or worse as a CS instructor's assistant whose job it was to grade Pascal students' programming assignments back in the day - that was very interesting, to say the least).
On a side note, why can't > and < characters be used in a code element? Um, that's lame, especially for a site that discusses programming so much.
This royally fucks up your version control history.
Most modern systems have the means to annotate a file - every line - with the revision that last changed it. It's invaluable if you want to work out which revision introduced a particular bug.
If you all have a whitespace war, every revision that touches the file will touch every line in the file.
Unless your merge driver is enlightened as well, you'll find it plays hell with your changes too: every time you pull changes you'll get a conflict.
It's just rude, the hallmark of a prima donna programmer who's never worked with others.
Most game companies write engines to make games. The game is the product, so it's not too important how it looks under the cover. iD makes games to publicize engines. So it really does matter for the code to look nice.
It's easy to go back over a finished codebase, run some static analysis and refactoring tools and clean it up. Chances are this is what was done before releasing it as open source. I can guarantee this source code didn't look anything like this during development.
Of course this is maybe why the game took longer to develop: spending more time making the code look purdy than, say, making the game a superlative gaming experience.
I haven't thought of anything clever to put here, but then again most of you haven't either.
To expand upon your final point: The real reason to use switch vs else-if is in what you communicate to other programmers. Switch communicates that you're evaluating exactly one variable/operation. Else-if towers can mix and match the evaluation criteria. Programmers who choose an else-if tower for evaluating the same variable all the way through are just inviting trouble in the future, when someone comes along and adds an additional clause to one of the evaluations and screws the whole thing up. Oh also, some compilers have a maximum number of else-if conditions. I worked at a company that created a huge else-if tower which eventually grew too large and broke MSFT's cl. We quickly rewrote the code as a hashtable (which is what it should have been anyway).
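A minimal sketch of replacing a long else-if tower on string values with a hash table lookup, as described above. The command names and handlers are hypothetical.

```cpp
#include <functional>
#include <string>
#include <unordered_map>

// Hypothetical command dispatcher: instead of an else-if chain comparing
// `name` against string literals, look the handler up in a hash table.
int RunCommand(const std::string& name, const int arg) {
    static const std::unordered_map<std::string, std::function<int(int)>> handlers = {
        {"double", [](const int x) { return x * 2; }},
        {"negate", [](const int x) { return -x; }},
        {"square", [](const int x) { return x * x; }},
    };
    const auto it = handlers.find(name);
    if (it == handlers.end()) {
        return 0;  // unknown command
    }
    return it->second(arg);
}
```

Adding a command becomes one more table entry instead of one more clause in a chain that can be edited inconsistently.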
The Doom 3 code is quite ugly.
If you want to read good code, try the Linux kernel.
If that's beautiful, I really don't want to read his "ugly" code...
Oh my god, this is the worst programming advice I've ever heard. Is this a joke? Maybe some clever attempt at creating job security?
There is a terrible dearth of commented code in the world -- especially in the lower-level languages like C and C++ -- and this guy is telling people we need fewer comments in our code?
Modern copyright is theft of culture from everyone and it retards the progress of the useful arts and sciences.
He loves the lack of white space, I hate it. Cramped code is irritating to read. If you want to take up less vertical space, reduce your font and increase the whitespace. You have a better sense of the separation of statements, stronger scoping and less room for error.
He also loves the lack of comments. I remain firmly in the camp that if you eschew comments as common practice, you're an idiot and you should stay away from programming on big teams.
It's not a clarity of code issue. I expect your code to be clear, too. But even after 20 years of programming, I read English faster than I read code. A description of an algorithm in English is going to be more terse than the code that implements it. Your code has to account for edge cases, but I probably just want to know what the code does and how the code does it at a high level so I can get a sense of the system and architecture. A descriptive method name only tells me WHAT the method does, not the manner in which it's done.
English (any natural language, really) is a powerful language with extraordinary expressive power. I don't understand why programmers are constantly trying to sweep it under the rug. Don't fill your code with useless comments like // increments the counter by 1, but if you're doing a non-trivial mathematical calculation that takes a whole method to encapsulate it, let me know what I'm getting into.
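A hypothetical illustration of the kind of comment this post asks for: the function name says WHAT (a sum), while the English comment explains HOW and WHY. Kahan (compensated) summation is used only as a familiar non-trivial calculation; it is not from the Doom 3 code.

```cpp
#include <vector>

// Compensated (Kahan) summation.
//
// In plain English: floating-point addition can lose the low-order bits of
// the smaller operand. We keep an estimate of those lost bits in
// `compensation` and feed it back into the next addition, so rounding error
// does not pile up across the whole loop.
//
// Note: aggressive floating-point flags (e.g. -ffast-math) may let the
// compiler simplify the compensation term away.
double KahanSum(const std::vector<double>& values) {
    double sum = 0.0;
    double compensation = 0.0;  // running estimate of the error so far
    for (const double value : values) {
        const double corrected = value - compensation;
        const double next = sum + corrected;     // low bits of `corrected` may be lost here
        compensation = (next - sum) - corrected; // what was actually added, minus what we meant to add
        sum = next;
    }
    return sum;
}
```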
Code comments--especially system level comments--should include the name of the author or current maintainer, as well. I tag my methods with my name and the date that the code was put in so people know where to go if there's trouble. They don't have to hunt through perforce time-lapses to see that I checked it in, they just email me.
And have some consideration for the new guy on the team, or the team that has to use your code 5 years in the future. They can't ask you questions, the context of the situation is lost, the code-base might be in the middle of being re-purposed (common in the game industry--which is where I am); comments are essential to maintainability. Man, I do code reviews and people often manage to forget exactly what they were trying to do, and it's only been a few hours. We always work it out, but if there were a comment, we wouldn't even have to spend THAT time.
Use comments. Use them wisely. It makes you a better programmer because you're wasting less of OTHER people's time.
case statements are not faster than if-else statements
This is one of the worst comments I've ever seen with an Informative mod on Slashdot.
Most of the time, switch / case statements are optimized by the compiler to use jump tables that are much more efficient at runtime than evaluating expression after expression.
I went to eat some animal crackers and the box said, "Do not eat if seal is broken." I opened the box and sure enough..
Came hoping to learn what's so beautiful about iD's code, left convinced that the author (Shawn McGrath) and I have rather different opinions on that... iD's code is certainly not an example of poor code. In a previous job I had the opportunity to view code from around 20 different AAA game studios, and it's definitely in the top quarter (but that's not saying a great deal); mostly the article is 50 paragraphs of cooing "iD does what I do, guys!" Analysis of what makes said style "beautiful" is subjective at best, and furthermore the author describes himself as "not a coder". For what it's worth, IMHO, the best code that I've seen came out of Remedy.
I would agree here. I only saw some bits of it, but comparing code between Linux, BSD, and Solaris that did the same thing, the Solaris stuff was definitely the easiest to understand. Linux I found to be the most obtuse in comparison. Though to be fair, the code bases are so large, with so many authors, that some code may look great while other code is awful. I think Solaris wins just from having coding styles and standards.
I normally hate most of the garbage that comes out of Kotaku, but this is a really good article. That said, it reminds me a lot of Yossi Kreinin's C++ FQA. A good chunk of the article is spent talking about how Doom 3's source is good code despite being bad C++. What kind of language is best written in a way frowned on by the C++ community? Absurd!
I always code for a single return statement at the end of a function but not for performance reasons, I just think it is easier to eyeball. I don't care what the compiler does with it. I hate trying to eye-debug a method/function that is peppered with return statements (aside from maybe a single "guard" statement at the top of the function), I inevitably miss one and go trundling down the wrong path, wasting a bunch of time in the process. My functions typically all end with "return retVal;" YMMV
while [ 1 ]; do echo -n -e "\xe2\x95\xb$((($RANDOM&1)+1))"; done
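A minimal sketch of the single-exit style described in the comment above: one guard clause at the top and a single `return retVal;` at the end. The grading function is hypothetical.

```cpp
#include <string>

// Hypothetical example of "one guard, one exit".
std::string GradeFor(const int score) {
    if (score < 0 || score > 100) {
        return "invalid";  // guard: reject bad input up front
    }
    std::string retVal;
    if (score >= 90) {
        retVal = "A";
    } else if (score >= 80) {
        retVal = "B";
    } else if (score >= 70) {
        retVal = "C";
    } else {
        retVal = "F";
    }
    return retVal;  // the only return after the guard
}
```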
"All id games before Quake III were written in C."
Fool! Quake III Arena was also written in C.
he's probably referring more to reference and method 'constness' which have semantic rather than just simply syntactic implications.
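A hypothetical sketch of the semantic side of const mentioned above: a const member function promises not to modify the object, and through a const reference only const member functions can be called.

```cpp
#include <string>
#include <utility>

// Hypothetical class used only to illustrate method constness.
class Inventory {
public:
    explicit Inventory(std::string owner) : owner_(std::move(owner)) {}

    int Count() const { return count_; }                  // callable on const objects
    const std::string& Owner() const { return owner_; }   // read-only view, no copy
    void Add(const int n) { count_ += n; }                // non-const: mutates state

private:
    std::string owner_;
    int count_ = 0;
};

// Takes a const reference, so only const member functions are usable here.
int Report(const Inventory& inv) {
    // inv.Add(1);  // would not compile: Add() is not a const member function
    return inv.Count();
}
```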
I wholeheartedly agree with this sentiment. A lot of developers have personal preferences on their indentation. I worked on a project where different devs used 2, 3, 4 and 8 spaces as a single indent. My argument was that it was all visual. If we stuck with hard tabs, each could configure their editor of choice to display them however they liked, without affecting any other developer. I used to use tabs for alignment as well, but have since changed to using spaces for that.
As many others have pointed out in their replies to you, a good optimizer will often wash away the performance differences. Performance is one of those things that is desperately needed when it's needed, but in general it's more important to me that my code be readable and provably correct. That means using the language and statements that make my intentions clear.
As far as efficiency goes, when you see an "else if" or "case" statement, consider polymorphism and a state pattern instead. You make the decision exactly one time, at the time you learn the value of the data in the proper context. Then when it comes to the code where you would have put an if/else-if ladder or a switch/case construct, you simply dereference a pointer and are executing the proper logic. Having that one decision point serves you for all future decisions based on that data.
I mean hey, if you're going to be writing in an object oriented language, you probably ought to be using it.
John
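A minimal sketch of the polymorphic approach described above, assuming a simple two-state example: the decision is made once, when the behavior object is created, and later code just calls through the virtual function instead of re-testing a value. The MoveBehavior hierarchy and its speeds are hypothetical.

```cpp
#include <memory>

// Hypothetical state hierarchy.
class MoveBehavior {
public:
    virtual ~MoveBehavior() = default;
    virtual int Speed() const = 0;
};

class Walking : public MoveBehavior {
public:
    int Speed() const override { return 4; }
};

class Running : public MoveBehavior {
public:
    int Speed() const override { return 10; }
};

// The single decision point; callers afterwards just use behavior->Speed().
std::unique_ptr<MoveBehavior> MakeBehavior(const bool isRunning) {
    if (isRunning) {
        return std::make_unique<Running>();
    }
    return std::make_unique<Walking>();
}
```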
Much of code is a matter of taste:
if(x==1){bla();}
if(x==1){
bla();
}
if ( x==1 ){
bla();
}
if(x==1)
{
bla();
}
These are all taste. Comments and variable/function names are functional and thus more arguable (but still generally religious). Any review of this sort that talks about code formatting is wasting our time (unless they went way overboard with something stupid) with religious nonsense, so I wish he would stick to the benefits of how they pass parameters etc.
The only time anyone ever "Wins" the code formatting argument is when something else is brought into the argument such as "Format it my way or get fired." or "Format it my way or I quit"
Nearly every place I worked had someone who always began the argument about coding standards with "I don't care which standard we use as long as we all stick to it." But then they relentlessly argued for their standards and wouldn't give an inch, with well-structured arguments for every space, comma, and return. Often these standards had all kinds of specific metrics, like a certain ratio of comments to lines of code. This way they could point to other people's code and mathematically prove that they sucked. Although the worst were the passive-aggressive sorts who would reformat any block of code they touched to "their" standard, which was wildly different from that of the entire rest of the programming team.