Debugging
Debugging explains the fundamentals of finding and fixing bugs (once a bug has been detected), rather than any particular technology. It's best for developers who are novices or who are only moderately experienced, but even old pros will find helpful reminders of things they know they should do but forget in the rush of the moment. This book will help you fix those inevitable bugs, particularly if you're not a pro at debugging. It's hard to bottle experience; this book does a good job. This is a book I expect to find useful many, many, years from now.
The entire book revolves around the "nine rules." After the typical introduction and list of the rules, there's one chapter for each rule. Each of these chapters describes the rule, explains why it's a rule, and includes several "sub-rules" that explain how to apply the rule. Most importantly, there are lots of "war stories" that are both fun to read and good illustrations of how to put the rule into practice.
Since the whole book revolves around the nine rules, it might help to understand the book by skimming the rules and their sub-rules:
- Understand the system: Read the manual, read everything in depth, know the fundamentals, know the road map, understand your tools, and look up the details.
- Make it fail: Do it again, start at the beginning, stimulate the failure, don't simulate the failure, find the uncontrolled condition that makes it intermittent, record everything and find the signature of intermittent bugs, don't trust statistics too much, know that "that" can happen, and never throw away a debugging tool.
- Quit thinking and look (get data first, don't just do complicated repairs based on guessing): See the failure, see the details, build instrumentation in, add instrumentation on, don't be afraid to dive in, watch out for Heisenberg, and guess only to focus the search.
- Divide and conquer: Narrow the search with successive approximation, get the range, determine which side of the bug you're on, use easy-to-spot test patterns, start with the bad, fix the bugs you know about, and fix the noise first.
- Change one thing at a time: Isolate the key factor, grab the brass bar with both hands (understand what's wrong before fixing), change one test at a time, compare it with a good one, and determine what you changed since the last time it worked.
- Keep an audit trail: Write down what you did in what order and what happened as a result, understand that any detail could be the important one, correlate events, understand that audit trails for design are also good for testing, and write it down!
- Check the plug: Question your assumptions, start at the beginning, and test the tool.
- Get a fresh view: Ask for fresh insights, tap expertise, listen to the voice of experience, know that help is all around you, don't be proud, report symptoms (not theories), and realize that you don't have to be sure.
- If you didn't fix it, it ain't fixed: Check that it's really fixed, check that it's really your fix that fixed it, know that it never just goes away by itself, fix the cause, and fix the process.
This list by itself looks dry, but the detailed explanations and war stories make the entire book come alive. Many of the war stories jump deeply into technical details; some might find the details overwhelming, but I found that they were excellent in helping the principles come alive in a practical way. Many war stories were about obsolete technology, but since the principle is the point that isn't a problem. Not all the war stories are about computing; there's a funny story involving house wiring, for example. But if you don't know anything about computer hardware and software, you won't be able to follow many of the examples.
After detailed explanations of the rules, the rest of the book has a single story showing all the rules in action, a set of "easy exercises for the reader," tips for help desks, and closing remarks.
There are lots of good points here. One that particularly stands out is "quit thinking and look." Too many try to "fix" things based on a guess instead of gathering and observing data to prove or disprove a hypothesis. Another principle that stands out is "if you didn't fix it, it ain't fixed;" there are several vendors I'd like to give that advice to. The whole "stimulate the failure, don't simulate the failure" discussion is not as clearly explained as most of the book, but it's a valid point worth understanding.
I particularly appreciated Agans' discussions on intermittent problems (particularly in "Make it Fail"). Intermittent problems are usually the hardest to deal with, and the author gives straightforward advice on how to deal with them. One odd thing is that although he mentions Heisenberg, he never mentions the term "Heisenbug," a common jargon term in software development (a Heisenbug is a bug that disappears or alters its behavior when one attempts to probe or isolate it). At least a note would've been appropriate.
The back cover includes a number of endorsements, including one from somebody named Rob Malda. But don't worry, the book's good anyway :-).
It's important to note that this is a book on fundamentals, and different than most other books related to debugging. There are many other books on debugging, such as Richard Stallman et al's Debugging with GDB: The GNU Source-Level Debugger. But these other texts usually concentrate primarily on a specific technology and/or on explaining tool commands. A few (like Norman Matloff's guide to faster, less-frustrating debugging ) have a few more general suggestions on debugging, but are nothing like Agans' book. There are many books on testing, like Boris Beizer's Software Testing Techniques, but they tend to emphasize how to create tests to detect bugs, and less on how to fix a bug once it's been detected. Agans' book concentrates on the big picture on debugging; these other books are complementary to it.
Debugging has an accompanying website at debuggingrules.com, where you can find various little extras and links to related information. In particular, the website has an amusing poster of the nine rules you can download and print.
No book's perfect, so here are my gripes and wishes:
- The sub-rules are really important for understanding the rules, but there's no "master list" in the book or website that shows all the rules and sub-rules on one page. The end of the chapter about a given rule summarizes the sub-rules for that one rule, but it'd sure be easier to have them all in one place. So, print out the list of sub-rules above after you've read the book.
- The book left me wishing for more detailed suggestions about specific common technology. This is probably unfair, since the author is trying to give timeless advice rather than a "how to use tool X" tutorial. But it'd be very useful to give good general advice, specific suggestions, and examples of what approaches to take for common types of tools (like symbolic debuggers, digital logic probes, etc.), specific widely-used tools (like ddd on gdb), and common problems. Even after the specific tools are gone, such advice can help you use later ones. A little of this is hinted at in the "know your tools" section, but I'd like to have seen much more of it. Vendors often crow about what their tools can do, but rarely explain their weaknesses or how to apply them in a broader context.
- There's probably a need for another book that takes the same rules, but broadens them to solving arbitrary problems. Frankly, the rules apply to many situations beyond computing, but the war stories are far too technical for the non-computer person to understand.
But as you can tell, I think this is a great book. In some sense, what it says is "obvious," but it's only obvious as all fundamentals are obvious. Many sports teams know the fundamentals, but fail to consistently apply them - and fail because of it. Novices need to learn the fundamentals, and pros need occasional reminders of them; this book is a good way to learn or be reminded of them. Get this book.
If you like this review, feel free to see Wheeler's home page, including his book on developing secure programs and his paper on quantitative analysis of open source software / Free Software. You can purchase Debugging: The 9 Indispensable Rules for Finding Even the Most Elusive Software and Hardware Problems from bn.com. Slashdot welcomes readers' book reviews -- to see your own review here, read the book review guidelines, then visit the submission page.
cause when i do it, it is often re-bugging
What if someone else fixes it?
> Change one thing at a time: Isolate the
> key factor, grab the brass bar with both
> hands (understand what's wrong before fixing),
> change one test at a time, compare it with a
> good one, and determine what you changed
> since the last time it worked.
This is helpful with unit tests, too. If I find a bug, I want to figure out which unit test should have caught this and why it didn't. Then I can either fix the current tests, or add new ones to catch this.
Either way, if someone reintroduces that particular bug it'll get caught by the unit tests during the next hourly build.
The Army reading list
...are always the worst: bugs which disappear when you look for them. Insert a print statement? The bug disappears. Use a debugger? The bug reappears, but in a different place.
Heisenbugs are almost always caused by buffer overflows. They can often be prevented (at least in Fortran 77/90/95/03) by enabling array-bounds checking at compile time; but before I knew about this, I had a hell of a time tracking them down.
Tubal-Cain smokes the white owl.
I've read it and it's a good book, but I would
just borrow it from the library and then print
out the poster to remember the 'rules'.
There's not enough meat to keep it on my
precious shelf space.
-- All that's left of me, is slight insanity, whats on the right, I don't know. -- Bob Mould
Regression test suites (if possible) should be maintained so that when bugs get fixed, they stay fixed.
Just my 2 cents.
I can think of a WHOLE lot of tech's and admin's who really need to follow number 9 a lot closer.
Especially those Windows admins/techs who think 'restart' is the ultimate fix-all. Though, sadly, I suppose in many cases that's about all you can do with proprietary software. Well, that and beg vendors to fix the problem. (We all know how productive that is....)
- Is it plugged in?
- Are you logged in?
- Is it spelled right?
Works in 9 cases out of 10.Quidquid latine dictum sit, altum sonatur.
"The most likely source of the current bug is the fix you made to the last one."
One thing's clear from looking at that list - spend more time on testing your code.
Unfortunately, speaking as an ex-programmer, time is one luxury that PHBs don't afford their minions. A project needs to be completed and knocked out of the door as soon as possible. The less time spent on unnecessary work, the better.
It is also unfortunate that PC users have been brought up expecting to have buggy software in front of them and expecting to have to reboot/reinstall. What motivation is there to produce bug free code when the users will accept buggy code?
Ho well, at least I run my own company now - master of my own wallet - and can concentrate on quality solutions.
0. If you're a software guy blame it on hardware, if you're a hardware guy blame it on software.
0.1. Blame it on the user.
0.2. Blame it on your colleague.
0.3. Blame it on your manager.
0.4. Yell at the computer and tell it to work dammit!
0.5. Put head on keyboard and sob.
0.6. Read Slashdot.
0.7. Post on Slashdot.
0.8. Call it a feature not a bug.
You can read a sample chapter from the Debugging Rules book in PDF format by going here. (Requires the free Adobe reader.)
The Big News Page
Troubleshooting is what you do to fix your mom's ethernet card. "Oooh, it's on the bottom PCI slot, has no interrupt line. I'll just move it up one slot..."
Debugging is what you do with an oscilloscope to figure out why a particular circuit design isn't working as anticipated. You don't "troubleshoot" a circuit design. You debug it.
Or, to put it another way, "troubleshooting" is what a tech support monkey does. "Debugging" is what an engineer does.
10. Code is _always_ Beta. It's never done until it's no longer in use or support no longer exists.
9. The better the SDK, the more sophisticated the bugs.
8. There's always more bugs in the other guy's (girl's) code.
7. Declaring code bug-free is asking for it to fail at the worst possible time with the greatest visibility.
6. A good design is as likely to have bugs as a bad one. Bugs are equal opportunity.
5. Debugging time is inversely proportional to coding time.
4. If it works the first time, there's a bug, but you won't find it until you roll it out.
3. Debugging is fun. Really! It's when you run out of bugs that you should wonder if you got them all, that's not fun.
2. The most difficult bugs to find are in the most straightforward looking code.
1. That's not a bug, that's a feature.
A feeling of having made the same mistake before: Deja Foobar
I find the best way to uncover bugs is to do a demo for your boss's boss.
If all this should have a reason, we would be the last to know.
They missed a good one: explain the bug to someone.
If you start explaining the bug to someone, there's a good chance in mid-explanation you'll realize a solution to the problem.
Some school (can't remember which) had a Teddy Bear in their programming consulting office... There was a sign. "Explain it to the bear first, before you talk to a human". Silly as it sounds, people would do it, and a large portion of the time they'd never actually have to consult the staff... by explaining it to the bear, they solved the problem.
Weird, but true.
One rule he's missed is very important: Before making a measurement (like printing the value of a variable or changing something about the code) work out what answer you expect to see. Note well - do this before you look at the result. When you see something different, either its a symptom of the bug, or a symptom of you not yet understanding the system. Resolving this will either improve your understanding or turn up the problem.
Squirrel!
4. Divide and conquer: Narrow the search with successive approximation, get the range, determine which side of the bug you're on, use easy-to-spot test patterns, start with the bad, fix the bugs you know about, and fix the noise first.
That's a very usueful rule. In nearly 20 years of programming I haven't found any tool or technique that works better than printf / std::cout / MessageBox and logging.
Logging is especially important if your users aren't conveniently in the same building as you. When a customer has a problem I've never seen before, I usually tell them to run the program with the -log switch and send me the log. Nearly always this leads to the problem and I can fix the bug within minutes.
Add logging to your app and you'll increase the number of hours you can sleep.
Well, even though I think most people 'round these parts would agree with me that the book covers the fairly obvious, I will say this: it's absolutely necessary to have an "expert" write these things down because all too often, us developers try to proceed and get blocked by management. At my last job, we had a big problem with WebLogic transaction management, some bizarre confluence of events was causing a HeuristicMixedException to be thrown by the platform--by the way, WebLogic people, thanks a lot for naming this exception this way and taking the time to make sure it gets thrown in no less than six totally unrelated (as far as I can tell) circumstances. I love it when exceptions originate ambiguously, from several sources, and no one part of the platform has authority over the problem.
This was a big enough problem that we had to set up a separate, isolated environment to figure out what was going on. 4 out of the 5 architects involved on the project (no it wasn't a huge project--you can see HME wasn't the only problem here) had cemented ideas about what was going wrong...none of them agreed of course...and we had no less than 3 managers with theories based on the idea that the Earth sweeps through an aether against which all things can be measured.
The biggest issue with this testing environment was keeping everyone's mitts off of it, especially those people who didn't have to ask for permissions to the system (the architects, managers...in other words everyone). And the managers didn't agree that it was particularly important to record every step methodically, or limit the number of people making changes to the system to 1 at a time. Instead, they set up a war room and engaged in what I like to call: Fix By Chaotic Typing. (It's chaotic in the sense that, there are definitely patterns to the activity taking place, but you have to be Stephen Wolfram to find and understand them.)
Needless to say, that didn't work. If I'd had access to this book, an authority willing to put the obvious in print might have bolstered my argument that we needed to take resources OFF this issue, not add more. Alas, it was not to be. The bigwigs decided that, since the current manpower wasn't able to track down this bug, it was time to bring in the high-priced WebLogic consultants. We got some 1st generation WebLogic people, 3 of them eventually, and they came in and immediately set themselves to the task of learning our business, telecommunications. And at a mere $150/hour, why not? (Management decided the bug was non-deterministic at some point and this assembly of people was given the informal team moniker: the Heuristics team. I preferred "the Histrionics team".)
So I eventually teamed up with the lead architect on the project and we solved the problem by subterfuge. We had to intentionally set these people working in a direction--everyone, employees and WebLogic consultants alike--that was so off-the-track they actually didn't interfere with any part of the system likely containing the error. This gave us a reasonable amount of time and space to track down the bug in 3 days' time. At only the loss of 6 weeks and several thousand dollars in expenses alone for the WL consultants.
sev
but have you considered the following argument: shut up.
My favorite quote on the subject of debugging:
55 years later, programmers are still spending a large part of their lives finding bugs and fixing them...