Debugging


Posted by timothy on Tuesday February 24, 2004 @07:41AM from the unlousy dept.

dwheeler writes "It's not often you find a classic, but I think I've found a new classic for software and computer hardware developers. It's David J. Agans' Debugging: The 9 Indispensable Rules for Finding Even the Most Elusive Software and Hardware Problems." Read on for the rest.

Debugging: The 9 Indispensable Rules for Finding Even the Most Elusive Software and Hardware Problems
author: David J. Agans
pages: 192
publisher: Amacom
rating: 9
reviewer: David A. Wheeler
ISBN: 0814471684
summary: A classic book on debugging principles

Debugging explains the fundamentals of finding and fixing bugs (once a bug has been detected), rather than any particular technology. It's best for developers who are novices or who are only moderately experienced, but even old pros will find helpful reminders of things they know they should do but forget in the rush of the moment. This book will help you fix those inevitable bugs, particularly if you're not a pro at debugging. It's hard to bottle experience; this book does a good job. This is a book I expect to find useful many, many years from now.

The entire book revolves around the "nine rules." After the typical introduction and list of the rules, there's one chapter for each rule. Each of these chapters describes the rule, explains why it's a rule, and includes several "sub-rules" that explain how to apply the rule. Most importantly, there are lots of "war stories" that are both fun to read and good illustrations of how to put the rule into practice.

Since the whole book revolves around the nine rules, it might help to understand the book by skimming the rules and their sub-rules:

1. Understand the system: Read the manual, read everything in depth, know the fundamentals, know the road map, understand your tools, and look up the details.
2. Make it fail: Do it again, start at the beginning, stimulate the failure, don't simulate the failure, find the uncontrolled condition that makes it intermittent, record everything and find the signature of intermittent bugs, don't trust statistics too much, know that "that" can happen, and never throw away a debugging tool.
3. Quit thinking and look (get data first, don't just do complicated repairs based on guessing): See the failure, see the details, build instrumentation in, add instrumentation on, don't be afraid to dive in, watch out for Heisenberg, and guess only to focus the search.
4. Divide and conquer: Narrow the search with successive approximation, get the range, determine which side of the bug you're on, use easy-to-spot test patterns, start with the bad, fix the bugs you know about, and fix the noise first.
5. Change one thing at a time: Isolate the key factor, grab the brass bar with both hands (understand what's wrong before fixing), change one test at a time, compare it with a good one, and determine what you changed since the last time it worked.
6. Keep an audit trail: Write down what you did in what order and what happened as a result, understand that any detail could be the important one, correlate events, understand that audit trails for design are also good for testing, and write it down!
7. Check the plug: Question your assumptions, start at the beginning, and test the tool.
8. Get a fresh view: Ask for fresh insights, tap expertise, listen to the voice of experience, know that help is all around you, don't be proud, report symptoms (not theories), and realize that you don't have to be sure.
9. If you didn't fix it, it ain't fixed: Check that it's really fixed, check that it's really your fix that fixed it, know that it never just goes away by itself, fix the cause, and fix the process.
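
The "divide and conquer" rule (narrow the search with successive approximation, determine which side of the bug you're on) lends itself to a small illustration. What follows is a minimal Python sketch, not anything taken from the book. It assumes an ordered history in which the first entry is known good, the last is known bad, and everything after the first bad entry stays bad. The names `first_bad` and `is_bad` are made up for the example.

```python
def first_bad(versions, is_bad):
    """Return the index of the first version for which is_bad() is True.

    Assumes versions[0] is good, versions[-1] is bad, and that once a
    version is bad every later version is bad too.
    """
    lo, hi = 0, len(versions) - 1   # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(versions[mid]):
            hi = mid                # the bug was introduced at or before mid
        else:
            lo = mid                # the bug was introduced after mid
    return hi


# Hypothetical history: build numbers 100..119, broken from build 113 onward.
versions = list(range(100, 120))
culprit = first_bad(versions, lambda build: build >= 113)
print("first bad build:", versions[culprit])
```

Each probe halves the range, so twenty builds need about five checks instead of up to twenty. The sketch only works if the failure is reliably reproducible for a given build, which is where "make it fail" comes first.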

This list by itself looks dry, but the detailed explanations and war stories make the entire book come alive. Many of the war stories jump deeply into technical details; some might find the details overwhelming, but I found that they were excellent in helping the principles come alive in a practical way. Many war stories were about obsolete technology, but since the principle is the point, that isn't a problem. Not all the war stories are about computing; there's a funny story involving house wiring, for example. But if you don't know anything about computer hardware and software, you won't be able to follow many of the examples.

After detailed explanations of the rules, the rest of the book has a single story showing all the rules in action, a set of "easy exercises for the reader," tips for help desks, and closing remarks.

There are lots of good points here. One that particularly stands out is "quit thinking and look." Too many try to "fix" things based on a guess instead of gathering and observing data to prove or disprove a hypothesis. Another principle that stands out is "if you didn't fix it, it ain't fixed;" there are several vendors I'd like to give that advice to. The whole "stimulate the failure, don't simulate the failure" discussion is not as clearly explained as most of the book, but it's a valid point worth understanding.
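
To make "quit thinking and look" concrete, here is a minimal Python sketch of building instrumentation in: a decorator that records what a routine was actually called with and what it actually returned, so a hypothesis can be checked against data instead of a guess. It is an illustration only and is not from the book; `traced` and `average` are hypothetical names.

```python
import functools
import logging

logging.basicConfig(level=logging.DEBUG, format="%(levelname)s %(message)s")
log = logging.getLogger("trace")


def traced(func):
    """Log every call's arguments and its result (or the exception raised)."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        log.debug("enter %s args=%r kwargs=%r", func.__name__, args, kwargs)
        try:
            result = func(*args, **kwargs)
        except Exception:
            log.exception("%s raised", func.__name__)
            raise                       # never swallow the failure being studied
        log.debug("leave %s -> %r", func.__name__, result)
        return result
    return wrapper


@traced
def average(values):
    # Hypothetical routine under suspicion.
    return sum(values) / len(values)


average([2, 4, 6])
```

Logging is used instead of bare print statements so the instrumentation can stay in the code and be switched off by raising the log level, in the spirit of "never throw away a debugging tool."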

I particularly appreciated Agans' discussions on intermittent problems (particularly in "Make it Fail"). Intermittent problems are usually the hardest to deal with, and the author gives straightforward advice on how to deal with them. One odd thing is that although he mentions Heisenberg, he never mentions the term "Heisenbug," a common jargon term in software development (a Heisenbug is a bug that disappears or alters its behavior when one attempts to probe or isolate it). At least a note would've been appropriate.

The back cover includes a number of endorsements, including one from somebody named Rob Malda. But don't worry, the book's good anyway :-).

It's important to note that this is a book on fundamentals, and different from most other books related to debugging. There are many other books on debugging, such as Richard Stallman et al.'s Debugging with GDB: The GNU Source-Level Debugger. But these other texts usually concentrate primarily on a specific technology and/or on explaining tool commands. A few (like Norman Matloff's guide to faster, less-frustrating debugging) have a few more general suggestions on debugging, but are nothing like Agans' book. There are many books on testing, like Boris Beizer's Software Testing Techniques, but they tend to emphasize how to create tests to detect bugs, and less on how to fix a bug once it's been detected. Agans' book concentrates on the big picture of debugging; these other books are complementary to it.

Debugging has an accompanying website at debuggingrules.com, where you can find various little extras and links to related information. In particular, the website has an amusing poster of the nine rules you can download and print.

No book's perfect, so here are my gripes and wishes:

The sub-rules are really important for understanding the rules, but there's no "master list" in the book or website that shows all the rules and sub-rules on one page. The end of the chapter about a given rule summarizes the sub-rules for that one rule, but it'd sure be easier to have them all in one place. So, print out the list of sub-rules above after you've read the book.
The book left me wishing for more detailed suggestions about specific common technology. This is probably unfair, since the author is trying to give timeless advice rather than a "how to use tool X" tutorial. But it'd be very useful to give good general advice, specific suggestions, and examples of what approaches to take for common types of tools (like symbolic debuggers, digital logic probes, etc.), specific widely-used tools (like ddd on gdb), and common problems. Even after the specific tools are gone, such advice can help you use later ones. A little of this is hinted at in the "know your tools" section, but I'd like to have seen much more of it. Vendors often crow about what their tools can do, but rarely explain their weaknesses or how to apply them in a broader context.
There's probably a need for another book that takes the same rules, but broadens them to solving arbitrary problems. Frankly, the rules apply to many situations beyond computing, but the war stories are far too technical for the non-computer person to understand.

But as you can tell, I think this is a great book. In some sense, what it says is "obvious," but it's only obvious as all fundamentals are obvious. Many sports teams know the fundamentals, but fail to consistently apply them - and fail because of it. Novices need to learn the fundamentals, and pros need occasional reminders of them; this book is a good way to learn or be reminded of them. Get this book.

If you like this review, feel free to see Wheeler's home page, including his book on developing secure programs and his paper on quantitative analysis of open source software / Free Software.

290 comments

i hate debugging by Anonymous Coward · 2004-02-24 07:42 · Score: 5, Funny

cause when i do it, it is often re-bugging
1. Re:i hate debugging by Frymaster · 2004-02-24 08:33 · Score: 5, Funny
  cause when i do it, it is often re-bugging
  we have a special process we call "debuggery". debuggery - maxims and arrows
  
  be hostile: your application was your friend - your baby. you gave it life. well, no longer. now your application is your enemy. do you admire the intricate house of cards you have built like hiram abif? don't. you have a glue gun now and you are going to do a little explaining about who is boss here! your app is taunting you - it's thinking "what does a chemical/analogue hack like that have that i don't?" well, i'll tell you: an index finger. suitable for hitting the "del" key. make this crystal goddamn clear!
  
  kludge everything! the debug stage of the development life cycle is all about kludges. we call it klop - kludge-oriented programming:
  kludge foo = new kludge(specialCase bar);
  you've written that. the debugging phase comes at the end of a project. ie the part closest to the deadline when clueless suits and moneyment confuse line count with product. the pressure is on. the company is on the line. are you going to walk into the glass tower and pitch to the vc's about how yr going to have to go back to the uml's and rebuild x? good luck! can i have your job when you're done? get the tape, get the staples, get the glue.
  
  blame others: teamwork is just a code word for being the shepherd to a flock of scapegoats. if you were smart, you'd have been working on cultivating a culture of accepting blame early on in the cycle. this is especially effective if yr building a client/server thingy. establish early on that most of the failures are on the client(server) side. whichever one you're not writing.
  make yourself documentation czar if possible - then abuse the position to retroactively assign blame to other team members ("the docs explicitly state that we use roman numerals" - "gee, i don't remember that" - "well tough. get coding"). if you set it up right you can build an army of debugging minions to do your kludging for you while you, uh, read slashdot...
  
  redefine feature sets. the client is a clueless little doughboy who can't tell his ass from his operating system anyway. he's been flaking you on the spec-n-req all year. turn those tables! if a feature is buggy, yank it. if there's a complaint, reference the client to some vaguely-related advisory somewhere (trust me, he won't read all the way down). if he complains say "in light of advisory x we strongly advise against implementing _______ (feature). a work around may be possible at a future point and we are more than willing to calculate the billing for that additional work now."
  
  all that and echo will solve all yr debuggery problems.
  --
  2 1337 4 u!
2. Re:i hate debugging by easter1916 · 2004-02-24 09:49 · Score: 1
  
  I'm not British (Irish, so our slang is very similar), but punter basically means customer.
3. Re:i hate debugging by aled · 2004-02-24 10:33 · Score: 2, Funny
  
  I like your methodology and I would apply it if I ever did any debugging. I don't need it right now, because I haven't found any bugs to fix. It may be related to not doing any testing. Testing is dangerous, something may break. The only maxim I need is "if it compiles, it works". Modern compilers check your program; if I made some silly error, the compiler would catch it on sight. And static type checking guarantees the program always works right. I never look at the code again once it compiles. Isn't programming wonderful?
  I have to leave now, I have to deliver more quality software.
  
  --
  
  "I think this line is mostly filler"
4. Re:i hate debugging by isomeme · 2004-02-24 10:38 · Score: 1
  
  like hiram abif?
  
  A new record-setting entry on my "Similes I never expected to see on /." list.
  
  --
  When all you have is a hammer, everything looks like a skull.
#9 is wrong by Anonymous Coward · 2004-02-24 07:46 · Score: 5, Funny

What if someone else fixes it?
1. Re:#9 is wrong by Neil+Blender · 2004-02-24 07:52 · Score: 1
  
  What if someone else fixes it?
  
  Rule 0: Use a bug tracking system and assign yourself the bug before starting anything.
2. Re:#9 is wrong by kubrick · 2004-02-24 10:38 · Score: 1
  
  Rule #9a: Trust no-one.
  
  Does that imply a #9b) The truth is out there?
  
  --
  deus does not exist but if he does
yuck by theMerovingian · 2004-02-24 07:47 · Score: 4, Funny

Make it fail: Do it again, start at the beginning, stimulate the failure, don't simulate the failure, find the uncontrolled condition that makes it intermittent, record everything and find the signature of intermittent bugs, don't trust statistics too much, know that "that" can happen

Isolate the key factor, grab the brass bar with both hands (understand what's wrong before fixing), change one test at a time, compare it with a good one, and determine what you changed since the last time it worked.

Does anyone else feel dirty after reading this?

--
"If you think you have things under control, you're not going fast enough." --Mario Andretti
1. Re:yuck by kooso · 2004-02-24 07:56 · Score: 2, Insightful
  
  Not me. It would be interesting to have a rule of thumb for the real economic cost of debugging this way.
  
  We all (except Dijstra, perhaps) take trade-offs, for a reason. Perhaps that reason is only ignorance, but then we wouldn't get anything done.
2. Re:yuck by Anonymous Coward · 2004-02-24 09:08 · Score: 0
  
  Wow - it's funny how far above your head that joke went... Let me guess - virgin?
3. Re:yuck by Ed+Avis · 2004-02-24 09:32 · Score: 1
  
  I find delta can often be useful for automatically narrowing down a large test case to a small one.
  
  And BTW, it's 'Dijkstra'.
  
  --
  -- Ed Avis ed@membled.com
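
For readers who have not seen test-case reduction before, the idea can be sketched in a few lines of Python. This is a simplified greedy reduction in the spirit of delta debugging, not the delta tool mentioned above and not the full algorithm: it assumes the failure check is deterministic, and `shrink` and `fails` are hypothetical names.

```python
def shrink(items, fails):
    """Greedily drop chunks of `items` for as long as `fails` stays True."""
    assert fails(items), "start from an input that reproduces the failure"
    chunk = len(items) // 2
    while chunk >= 1:
        i = 0
        while i < len(items):
            candidate = items[:i] + items[i + chunk:]
            if fails(candidate):
                items = candidate   # that chunk was irrelevant, drop it
            else:
                i += chunk          # that chunk is needed, keep it and move on
        chunk //= 2
    return items


# Hypothetical failure: the bug needs both 3 and 7 somewhere in the input.
def fails(data):
    return 3 in data and 7 in data


print(shrink(list(range(10)), fails))
```

Trying large chunks first keeps the number of test runs low when most of the input is irrelevant; the final pass with single elements removes whatever the coarse passes could not.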
4. Re:yuck by Tony-A · 2004-02-24 11:48 · Score: 1
  
  Does anyone else feel dirty after reading this?
  
  Messing with bugs and you expect to feel clean ???
Change one thing at a time by tcopeland · 2004-02-24 07:47 · Score: 5, Insightful

> Change one thing at a time: Isolate the
> key factor, grab the brass bar with both
> hands (understand what's wrong before fixing),
> change one test at a time, compare it with a
> good one, and determine what you changed
> since the last time it worked.

This is helpful with unit tests, too. If I find a bug, I want to figure out which unit test should have caught this and why it didn't. Then I can either fix the current tests, or add new ones to catch this.

Either way, if someone reintroduces that particular bug it'll get caught by the unit tests during the next hourly build.

--
The Army reading list
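
A minimal sketch of what such a regression test might look like, using Python's `unittest`. The function `parse_percent` and the bug it once had are invented for illustration; the point is only that the second test is added when the bug is found and then stays in the suite.

```python
import unittest


def parse_percent(text):
    """Hypothetical function that once crashed on input with stray spaces."""
    return int(text.strip().rstrip("%"))


class ParsePercentRegression(unittest.TestCase):
    def test_plain(self):
        self.assertEqual(parse_percent("50%"), 50)

    def test_surrounding_whitespace(self):
        # Added after the bug report. If someone reintroduces the bug,
        # this fails in the next automated build.
        self.assertEqual(parse_percent(" 50% "), 50)


# Run the suite explicitly so the example works when pasted into any script.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(ParsePercentRegression)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```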
1. Re:Change one thing at a time by wrp103 · 2004-02-24 08:38 · Score: 5, Interesting
  It is nice to see a book that addresses this topic. I get very frustrated with so many textbooks that have at most a small chapter on debugging. Let's face it, beginning programmers spend more time debugging code than they do writing code, so why isn't that activity stressed?
  
  I particularly liked the rule about "Quit thinking and look". I worked with a guy who used what I call the "Zen method of debugging". He would keep staring at the code, trying to determine what was going on. I, on the other hand, would throw in some print statements so I could see what was going on. In one case, he insisted there was nothing wrong with the code, but what he didn't realize was that an early test failed, which meant the code he was looking at never got executed. I had suggested he print something out at the start of the routine, but he insisted it wasn't necessary because he knew what it was doing.
  He might cover this in the book, but one rule that I stress with my students is, if you make a change and the behavior of the program is the same, back out your changes because either:
  
  You are probably looking in the wrong place (which is why the behavior is the same)
  
  You could easily have just inserted several new bugs that you won't see until the path you are looking at gets executed.
  
  I often have students insist that their changes should have fixed something, but it turns out the program was actually executing an alternative path that they weren't looking at, or that the problem was much earlier, so when it got to where they thought the problem was, the data was different than they assumed.
2. Re:Change one thing at a time by CargoCultCoder · 2004-02-24 10:01 · Score: 4, Interesting
  
  I particularly liked the rule about "Quit thinking and look". I worked with a guy who used what I call the "Zen method of debugging". He would keep staring at the code, trying to determine what was going on...
  
  Personally, I would consider this to be the anti-Zen method. He was apparently focused so much on what he "knew" to be true, that he failed to consider clues trying to point him in another direction. That is not the Zen way of looking at things.
  
  Zen and the Art of Motorcycle Maintenance has a lot to say about this. If you're stuck on a problem, the solution is not to beat on it harder (e.g., stare at the code some more). The solution is to back off, and (to paraphrase from memory) allow yourself to become aware of the one little fact that's out there, waving its hand, hoping that you might notice it ... and that'll point you at the real problem.
  
  Stupidly staring at code is not Zen. Having an open mind for interesting and helpful facts -- whatever their source -- is.
3. Re:Change one thing at a time by Anonymous Coward · 2004-02-24 11:20 · Score: 0
  
  Hourly builds, ho ho ho!! That's a good one. Guess your product must be really small or you are lucky enough to work for a company that actually gives developers huge farms of servers to do parallel compiles, just to be nice. :-p. Oh and I guess you don't have to test on 20 different types of target board, all against each other over a network.
  
  No, I'm not bitter or anything...
4. Re:Change one thing at a time by e-Motion · 2004-02-24 11:41 · Score: 3, Insightful
  
  I particularly liked the rule about "Quit thinking and look". I worked with a guy who used what I call the "Zen method of debugging". He would keep staring at the code, trying to determine what was going on. I, on the other hand, would throw in some print statements so I could see what was going on.
  
  Sometimes reading the code is enough. If you're good at reading code, then sometimes all you have to do is briefly look over what you wrote to spot the bug. YMMV, of course. If you've looked at the code for a few minutes and nothing looks obviously wrong, then it's probably time to use the debugger/add print statements. I've found that this is the most efficient way to go bug-hunting, because a quick re-read can find a lot of the "easy" bugs. This is similar to having a code review, but in this case you, the author, are the only reviewer. If there is another coder nearby, go ahead and ask him/her to give it a quick look as well, because he/she will probably have an easier time spotting the error.
  
  I've met people who skip this step, and it drives me up the wall to see them waste their time (sometimes hours!) poking around in a debugger/writing print statements when the code they are debugging is simple. If it's a small, straightforward bit of code, then a quick look should uncover the bug. I suppose this falls under rule #1 (understand the system), but my point is more specific: understand the code.
  
  None of the above is particularly groundbreaking, of course, and probably doesn't deserve to be mentioned in the book. These are more like "things you do before debugging".
5. Re:Change one thing at a time by Anonymous Coward · 2004-02-24 12:01 · Score: 0
  You forgot option 3,
  
  You fixed a different bug.
6. Re:Change one thing at a time by rark · 2004-02-25 01:33 · Score: 1
  
  It all depends on how you're staring.
  
  If you are consciously going over the code you see, then yes, that's not very zen like.
  
  But in troubleshooting I find myself staring at logs or whatnot in a zen-like manner. The stare is different -- like 'soft gaze' or 'eagle gaze' -- I'm not focusing on the words, but seeing the whole, and sometimes that is enough to knock the answer out of my head and into my fingers. It's a lot like going away from a problem, except that there is a small amount of focusing stimulus to remind me that I'm still trying to solve it. It is backing away from the problem.
  
  What the original poster's former coworker was doing, I cannot comment on, because I wasn't there, but it is possible he was doing what I (and others) do.
  
  BTW, _Zen_and_the_Art_of_Motorcycle_Maintenance_ is an excellent book and I highly recommend it. Though I came out of it feeling more than a little uncomfortable with my similarities to Phaedrus. I'm mostly over that now ;)
7. Re:Change one thing at a time by tcopeland · 2004-02-25 03:49 · Score: 1
  
  > I, on the other hand, would throw in
  > some print statements so I could see
  > what was going on
  
  That's the beauty of unit tests - they're the equivalent of print statements, but the computer checks the output. Sweet.
  
  --
  The Army reading list
Heisenbugs... by Aardpig · 2004-02-24 07:48 · Score: 5, Informative

...are always the worst: bugs which disappear when you look for them. Insert a print statement? The bug disappears. Use a debugger? The bug reappears, but in a different place.

Heisenbugs are almost always caused by buffer overflows. They can often be prevented (at least in Fortran 77/90/95/03) by enabling array-bounds checking at compile time; but before I knew about this, I had a hell of a time tracking them down.

--
Tubal-Cain smokes the white owl.
1. Re:Heisenbugs... by AndroidCat · 2004-02-24 07:56 · Score: 5, Funny
  
  When I was working on arcade games, we had a sure-fire method of making bugs go away. However, shipping each coin-op game with an engineer and $40k worth of testing equipment connected to it wasn't really cost-effective.
  
  --
  One line blog. I hear that they're called Twitters now.
2. Re: Heisenbugs... by gidds · 2004-02-24 08:01 · Score: 3, Interesting
  
  You're describing bugs which are reproducible, but only on the unchanged code.
  Worse even than those are bugs which aren't reproducible at all, where there's no way to determine the conditions that caused them, or be sure you've fixed them. The only way to handle them is to fill the code with assertions and defensive code, and hope that at some point it'll catch something for you...
  
  --
  Ceterum censeo subscriptionem esse delendam.
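
As a rough illustration of "assertions and defensive code", here is a small Python sketch. The `transfer` function and its invariant are hypothetical; the idea is that checks placed at the boundaries of a routine turn a silent corruption into a loud failure close to its cause.

```python
def transfer(balances, src, dst, amount):
    """Move `amount` between two accounts in the `balances` dict."""
    # Preconditions: catch bad callers at the point of the call.
    assert amount > 0, f"non-positive amount: {amount!r}"
    assert src != dst, "source and destination are the same account"
    total_before = sum(balances.values())

    if balances[src] < amount:
        raise ValueError(f"insufficient funds in {src!r}")
    balances[src] -= amount
    balances[dst] += amount

    # Invariant: a transfer never creates or destroys money.
    assert sum(balances.values()) == total_before, "balance total changed"
    return balances


balances = transfer({"a": 100, "b": 5}, "a", "b", 30)
print(balances)
```

Note that Python drops `assert` statements when run with `-O`, so checks that must survive in production belong in explicit `if ... raise` form, as the funds check does here.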
3. Re:Heisenbugs... by WayneConrad · 2004-02-24 08:04 · Score: 4, Insightful
  
  Heisenbugs are almost always caused by buffer overflows.
  
  They are also almost always caused by race conditions, the most insidious of which is thread-safe code that turns out only to be safe on a uniprocessor system.
  
  And don't forget the phase of the moon, or for the truly unlucky, intermittently glitchy hardware.
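
The race-condition case can be sketched in Python as follows. This shows only the guarded version, under the assumption that the shared state is a simple counter; `Counter` and `hammer` are made-up names. Without the lock, the read-modify-write in `increment` can interleave between threads and lose updates, and whether it does depends on timing, which is exactly what makes such bugs move when observed.

```python
import threading


class Counter:
    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # The read-modify-write must be atomic. Without the lock, two threads
        # can read the same old value and one of the updates is lost.
        with self._lock:
            self.value += 1


def hammer(counter, n):
    for _ in range(n):
        counter.increment()


counter = Counter()
threads = [threading.Thread(target=hammer, args=(counter, 10_000))
           for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter.value)
```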
4. Re:Heisenbugs... by Anonymous Coward · 2004-02-24 08:04 · Score: 0
  
  Yeah it's weird isn't it? In my operating system class my group's program caused an error at one of the delete[] statements and it disappeared and reappeared depending on whether we ran it in the debug environment or not.
5. Re:Heisenbugs... by kzinti · 2004-02-24 08:09 · Score: 4, Insightful
  
  Heisenbugs are almost always caused by buffer overflows.
  
  In my experience, Heisenbugs are almost always caused by stack problems. That's why they go away when you put print statements in the code - because you're causing the usage of the stack to change.
  
  Buffer overflows (to arrays on the stack) are one good way to munge the stack. Returning the address of an input parameter or automatic variable is another way, because these are declared on the stack and cease to exist when the enclosing block exits. Anybody else using such an address is writing into the stack in an undefined manner, and chaos can result!
6. Re:Heisenbugs... by pclminion · 2004-02-24 08:11 · Score: 3, Interesting
  
  In my operating system class my group's program caused an error at one of the delete[] statements and it disappeared and reappeared depending on whether we ran it in the debug environment or not.
  I'll tell you with 99% certainty that this was caused by a piece of code overrunning the end (or beginning) of a new[]'d buffer, clobbering the memory allocation meta-data. This causes delete[] to crump when it hits a bogus pointer and flies off into never never land.
  By running in the debug environment you changed the memory layout of the allocation in such a way that the problem was masked.
  These kinds of bugs only seem weird the first time you encounter them. They're actually some of the most common types of bugs. With enough experience you'll be finding them in your sleep.
7. Re:Heisenbugs... by morcheeba · 2004-02-24 08:13 · Score: 2, Interesting
  
  that's funny... I just tracked one of these down that existed in our software - the optimized version ran differently than the non-optimized version. It turns out the bounds checker is in the non-optimized version, and a couple of places in the code used x[rand()]=y ... the bounds-checker (implemented as a macro which had side effects) *caused* the heisenbug!
  
  --
  HIV Crosses Species Barrier... into Muppets
8. Re:Heisenbugs... by JWW · 2004-02-24 08:26 · Score: 2, Informative
  
  Wow, I didn't even know there WAS a Fortran 03.
  
  I've only ever used 77, but I knew 90 existed, I just thought it must've died off after that (I haven't used Fortran in almost 15 years).
9. Re:Heisenbugs... by the_archivist · 2004-02-24 08:28 · Score: 0
  
  Not always stack.
  When I was writing C on the Acorn Risc PC a large switch statement failed because the processor could not jump that far (about 10 statements, less if any code was in each statement).
  It was the worst C "compiler" in history IMO.
  I wonder why they went down the toilet.
  
  --
  while(karma less_than enough_karma){karma++}
10. Re:Heisenbugs... by Aardpig · 2004-02-24 08:37 · Score: 2, Informative
  
  Wow, I didn't even know there WAS a Fortran 03.
  
  Strictly speaking, there isn't -- yet. It is currently in draft form, and will be formally released later on this year. Fortran itself is still being used extensively for numerical modelling, since it remains the leader performance-wise for such problems.
  
  --
  Tubal-Cain smokes the white owl.
11. Re:Heisenbugs... by Marvin_OScribbley · 2004-02-24 08:51 · Score: 3, Informative
  
  Heisenbugs are almost always caused by buffer overflows.
  
  In my experience with embedded systems, a Heisenbug is almost always caused by un-initialized data. You wind up assuming a particular value whereas you originally didn't plan on doing that. What value the data actually turns out to be is highly dependent on things like where in memory the code loads, how big the executable is, and so forth. Adding debugging statements will shift all the code after it up in memory and often make the bug go away or behave differently.
  
  Another interesting bug that is unrelated to the Heisenbug is when you port (for example) ANSI C code from one platform to another and code that originally worked starts doing weird things. For example, the C compiler under a BSD would allow modulo 0 and produce a zero result, which was incidentally what was wanted. Moved the code to Linux and started getting core dumps, because modulo 0 was considered dividing by zero. Some problems like this actually turn out to be Heisenbugs, for example due to differences in the way memory is malloc-ed on different systems. For example, suppose you accidentally malloc a pointer rather than its contents. On one OS you wind up allocating more memory than you need, but have no problems because addresses start fairly low in memory. On another OS memory addresses start somewhere else and you start getting weird errors due to lack of memory.
  
  --
  I'm not a journalist, but I play one on slashdot
12. Re:Heisenbugs... by badmammajamma · 2004-02-24 08:57 · Score: 2, Interesting
  
  In my experience they are usually caused by inadequate protection of data that needs to be thread safe. In fact, I've never had one from a buffer overflow.
  
  Of course this brings up the point of don't make assumptions that "this" has to be the problem. The bugs that take the longest time to debug are the ones where we build a false premise in our head of where the problem is or what the state of things are up to the point of the problem.
  
  Assume nothing.
  
  --
  Any man who afflicts the human race with ideas must be prepared to see them misunderstood. -- H. L. Mencken
13. Re:Heisenbugs... by jimsum · 2004-02-24 08:58 · Score: 2, Interesting
  
  Heisenbugs are also caused by floppy disks. We once shipped a program where one bit was wrong on the floppy, which caused a nasty bug. That one was hard to duplicate until we got the customer to ship us their version of the program.
  
  --
  -- Pot is safer than Beer
14. Re:Heisenbugs... by pommiekiwifruit · 2004-02-24 08:58 · Score: 1
  
  I have also had Heisenbugs which are due to code/data alignment issues. For example if two items of data were in the same 256 byte page the code worked but if they were in different pages it didn't. This was affected by the difference between release and debug builds. Also, on the 6502, a JMP indirect instruction would fail if the data straddled a page boundary. And with a compiler I am using now, I am not completely convinced that its code is completely page-safe for calling function pointers (there might be a 1 in 65536 chance of the code being "unfortunate"). In these situations, use the disassembler!
  On fancy-pants PC systems you might also find that threading/synchronisation is an important issue - the system i/o routines might cause a thread yield which prevents certain bugs from showing.
15. Re: Heisenbugs... by pclminion · 2004-02-24 09:01 · Score: 1
  
  Worse even than those are bugs which aren't reproducible at all, where there's no way to determine the conditions that caused them, or be sure you've fixed them.
  That's not so bad. Computers are deterministic. If the same thing happens multiple times, the same result will ensue each time. Hence, the source of the intermittent problem must lie in the nondeterministic influences on the program. It immediately narrows down the possibilities to:
  1. A problem triggered by specific input. Log all input (including places you wouldn't normally expect. Environment variable values count as "input", for example). Find a sequence that bugs out. Replay the input as often as necessary to debug.
  2. A problem triggered by a random variable. If you call rand()-like functions, switch to a fixed seed value. Change the value until you produce the bug. Now go debug the problem.
  3. If neither of the above, it must be a timing issue. Examine your inter- and intra-process synchronization code. The problem lies there.
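  A minimal sketch of point 2, assuming the program's randomness can be routed through one generator whose whole state is the seed. The names `demo_rng`, `demo_fill` and `demo_same_seed_replays` are made up for illustration; the generator is a textbook linear congruential one, not anything from the post.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a program's rand()-like source: a small
 * linear congruential generator whose entire state is the seed. */
typedef struct { uint32_t state; } demo_rng;

void demo_rng_seed(demo_rng *r, uint32_t seed)
{
    assert(r != NULL);
    r->state = seed;
}

uint32_t demo_rng_next(demo_rng *r)
{
    assert(r != NULL);
    r->state = r->state * 1664525u + 1013904223u; /* wraps modulo 2^32 */
    return r->state >> 16;
}

/* Hypothetical code under test: fills out[0..n-1] from the generator. */
void demo_fill(demo_rng *r, uint32_t *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = demo_rng_next(r) % 100u;
}

/* Returns 1 if two runs started from the same seed produce identical output. */
int demo_same_seed_replays(uint32_t seed)
{
    uint32_t a[8], b[8];
    demo_rng r;

    demo_rng_seed(&r, seed);
    demo_fill(&r, a, 8);
    demo_rng_seed(&r, seed);   /* same seed again: the "replay" */
    demo_fill(&r, b, 8);

    for (size_t i = 0; i < 8; i++)
        if (a[i] != b[i])
            return 0;
    return 1;
}
```

  Once a seed that triggers the failure is found, logging it next to the failure report is enough to replay the same sequence later, at least for the part of the behaviour that depends only on this generator.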
16. Re:Heisenbugs... by Anonymous Coward · 2004-02-24 09:03 · Score: 0
  
  Or the personal favorite:
  
  Finally tracking the problem down to the fact that we initially compiled gcc without the thread safe options.
17. Re: Heisenbugs... by Detritus · 2004-02-24 09:10 · Score: 1
  
  Computers are deterministic.
  Only in textbooks. The real world is analog and messy.
  
  --
  Mea navis aericumbens anguillis abundat
18. Re:Heisenbugs... by Rufus88 · 2004-02-24 09:13 · Score: 5, Interesting
  
  In my experience, Heisenbugs are often the result of race conditions between concurrent threads.
  
  This reminds me of a famous hardware "bug":
  > This is a weird but true story (with a moral) ...
  > A complaint was received by the Pontiac Division of General Motors:
  >
  > "This is the second time I have written you, and I don't blame you for not
  > answering me, because I kind of sounded crazy, but it is a fact that we
  > have a tradition in our family of ice cream for dessert after dinner each
  > night.
  >
  > But the kind of ice cream varies so, every night, after we've eaten, the
  > whole family votes on which kind of ice cream we should have and I drive
  > down to the store to get it. It's also a fact that I recently purchased a
  > new Pontiac and since then my trips to the store have created a problem.
  >
  > You see, every time I buy vanilla ice cream, when I start back from the
  > store my car won't start. If I get any other kind of ice cream, the car
  > starts just fine. I want you to know I'm serious about this question, no
  > matter how silly it sounds: 'What is there about a Pontiac that makes it
  > not start when I get vanilla ice cream, and easy to start whenever I get any
  > other kind?'"
  >
  > The Pontiac President was understandably skeptical about the letter, but
  > sent an engineer to check it out anyway. The latter was surprised to be
  > greeted by a successful, obviously well educated man in a fine neighborhood.
  >
  > He had arranged to meet the man just after dinner time, so the two hopped
  > into the car and drove to the ice cream store. It was vanilla ice cream
  > that night and, sure enough, after they came back to the car, it wouldn't
  > start.
  >
  > The engineer returned for three more nights. The first night, the man got
  > chocolate. The car started. The second night, he got strawberry. The car
  > started. The third night he ordered vanilla. The car failed to start.
  >
  > Now the engineer, being a logical man, refused to believe that this man's
  > car was allergic to vanilla ice cream. He arranged, therefore, to continue
  > his visits for as long as it took to solve the problem. And toward this end
  > he began to take notes: he jotted down all sorts of data, time of day, type
  > of gas used, time to drive back and forth, etc.
  >
  > In a short time, he had a clue: the man took less time to buy vanilla than
  > any other flavor. Why? The answer was in the layout of the store.
  >
  > Vanilla, being the most popular flavor, was in a separate case at the front
  > of the store for quick pickup. All the other flavors were kept in the back
  > of the store at a different counter where it took considerably longer to
  > find the flavor and get checked out.
  >
  > Now the question for the engineer was why the car wouldn't start when it
  > took less time. Once time became the problem, not the vanilla ice cream, the
  > engineer quickly came up with the answer: vapor lock. It was happening
  > every night, but the extra time taken to get the other flavors allowed the
  > engine to cool down sufficiently to start. When the man got vanilla, the
  > engine was still too hot for the vapor lock to dissipate.
  >
  > Moral of the story: even insane looking problems are sometimes real.
19. Re:Heisenbugs... by thentil · 2004-02-24 09:22 · Score: 1
  
  or for the truly unlucky, intermittently glitchy hardware.
  
  That reminds me of my college days; I had written a program to collect data from the field, with a 'thumper truck' creating the seismic waves. When we looked at the data, we were missing a good portion of it, seemingly random. We blamed the program, wasted countless hours searching for a bug that we couldn't reproduce in the lab. We blamed the late 80's laptop, and found a new one. We did everything *but* blame the serial cable we (being undergrads on an undergrad budget) had soldered together ourselves. Anyhow, turns out the vibrations in the field were causing a bad connection on this shitty serial cable (probably the *first* place we should have looked), so after a couple-hundred man-hours we invested $9 in a real serial cable -- presto, problem solved. Frustrating as hell, but on the upside - the program was bulletproof after all the modifying we did to catch the error.
20. Re:Heisenbugs... by Anonymous Coward · 2004-02-24 09:34 · Score: 0
  
  Heisenbugs are almost always caused by buffer overflows
  
  Or caching issues on certain consumer videogame systems...
21. Re: Heisenbugs... by Alzheimers · 2004-02-24 09:37 · Score: 2, Funny
  
  Hardware: The part of the computer that you can kick.
  Software: The part of the computer that can kick you.
22. Re:Heisenbugs... by Kazoo+the+Clown · 2004-02-24 09:38 · Score: 0, Flamebait
  
  Heisenbugs are almost always caused by buffer overflows. They can often be prevented (at least in Fortran 77/90/95/03) by enabling array-bounds checking at compile time; but before I knew about this, I had a hell of a time tracking them down.
  
  Fortran?-- who is this guy? Last time I worked in Fortran it was on cards... Sigh... I had a crush on the keypunch operator...
  
  Actually, Heisenbugs are often caused by invalid pointers in C: pointers pointing to memory that has been freed or relocated, pointers pointing to 0 due to an unchecked malloc failure, etc. Enabling bounds checking can help find these, but not always...
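  A rough sketch of two habits that target the pointer problems named above, assuming a fail-fast policy is acceptable. `demo_alloc_doubles` and `demo_free_doubles` are hypothetical helper names, not library functions.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: allocate n zeroed doubles (n must be > 0), or stop
 * loudly instead of handing back a null pointer that is dereferenced later. */
double *demo_alloc_doubles(size_t n)
{
    double *p;

    assert(n > 0);
    p = calloc(n, sizeof *p);
    if (p == NULL) {
        fprintf(stderr, "demo_alloc_doubles: out of memory (%zu items)\n", n);
        abort();
    }
    return p;
}

/* Hypothetical helper: free the block and null the caller's pointer, so a
 * later use fails at once instead of touching freed memory. */
void demo_free_doubles(double **pp)
{
    assert(pp != NULL);
    free(*pp);      /* free(NULL) is a no-op, so a second call is harmless */
    *pp = NULL;
}
```

  This only covers the one pointer passed in; any other copy of the same address still dangles after the free, which is where a memory checker earns its keep.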
23. Re:Heisenbugs... by jlseagull · 2004-02-24 09:48 · Score: 1
  
  How funny we're talking about this... I just this minute fixed one in the firmware I'm working on. Increased the stack size and changed a function call:
  
  where a is a horrendous function:
  
  unsigned long z()
  {
      unsigned long b;
      b = a(1) + a(2) + a(3);
      (...stuff...)
      return b;
  }
  
  This returned different values depending on when you called z(). Figured out this had to do with how full the stack was.
  
  Instead, I tried this:
  
  unsigned long z()
  {
      unsigned long a1, a2, a3, b;
      a1 = a(1);
      a2 = a(2);
      a3 = a(3);
      b = a1 + a2 + a3;
      (...stuff...)
      return b;
  }
  
  I'm a physicist, not a software engineer, so this is probably obvious to all of you - but I felt better.
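  Whatever the stack was doing here, one thing the second form does pin down is call order: C leaves the order in which the operands of + are evaluated unspecified, while separate statements run in sequence. A small sketch, assuming a function with a visible side effect (`demo_a` and `demo_calls` are invented for the illustration and say nothing about the real a()):

```c
#include <assert.h>

/* Records the order of calls as decimal digits, e.g. 123. */
unsigned long demo_calls = 0;

/* Hypothetical stand-in for a(): it changes shared state, so the order in
 * which the calls run is observable. */
unsigned long demo_a(unsigned long n)
{
    demo_calls = demo_calls * 10 + n;
    return n;
}

unsigned long demo_z(void)
{
    unsigned long a1, a2, a3;

    demo_calls = 0;
    a1 = demo_a(1);   /* each full statement completes before the next, */
    a2 = demo_a(2);   /* so the calls run in exactly this order         */
    a3 = demo_a(3);
    assert(demo_calls == 123);  /* would not be guaranteed for a(1)+a(2)+a(3) */
    return a1 + a2 + a3;
}
```

  If a function like this touches shared hardware or global state, the single-expression form can legally call it in any order, so the result may change between compilers or optimisation levels.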
  
  --
  'Be always mindful, even when ditch-digging.' --D. T. Suzuki
24. Re:Heisenbugs... by NumbThumb · 2004-02-24 09:54 · Score: 1
  
  Heisenbugs have another common cause: race conditions in multithreaded apps. Yea, sure, you *always* think of all the possibilities beforehand, create a state diagram and rule out all race conditions... yea, right.
  
  I, for one, have come up with only one way of dealing with those: use fewer threads. 3 is about the maximum number for pretty much any app I've ever written, and that includes the thread dispatching UI events. (and no, 1 is *not* enough).
  
  --
  I have discovered a truly remarkable sig which this 120 chars is too small to contain.
25. Re:Heisenbugs... by TimeZone · 2004-02-24 09:56 · Score: 1
  
  I'd never heard the word "Heisenbug", but instantly understood its meaning. We have them in hardware design too. For instance, sometimes when trying to debug bus signalling errors, attaching a probe to the bus is enough to throw the capacitance / inductance of the bus lines back into spec and make the error go away. Really a pain to deal with.
  TZ
26. Re:Heisenbugs... by composer777 · 2004-02-24 09:59 · Score: 2, Informative
  
  Buffer overflows tend to be less obvious than passing a pointer to locally allocated data outside the scope in which it was created. In fact, I've never seen a bug caused by passing back a pointer to locally allocated data outside the block (or function) in which it was created. In other words, stack-based Heisenbugs seem easy to avoid. I think that this kind of bug indicates that a programmer is completely clueless about how machine code is generated. Buffer overflows, however, can be much less obvious, since the size of the buffer can be determined at run time and can vary, e.g.
  double * d = NULL; /*...later...*/
  d = malloc(i*sizeof(double));
  memset(d, 0x00, i*sizeof(double));
  where i could be anything.
  In this case if i is ten, and j is 11, then the code below could trigger an exception in some cases, but not every case:
  double m = d[j];
  
  The code above is not necessarily incorrect; it all depends on the values of i and j. However, passing back a pointer to any locally declared variable outside the scope of the block within which it is created is always wrong. In fact, you don't need to return it: any method of referencing locally allocated memory from outside its scope is incorrect. The address of a local variable always points into the local data for that particular block of code, which is (usually) kept as an offset to a frame pointer. This block of memory (known as a stack frame) is deallocated after that particular block of code is left. (Note that passing back a pointer to a block of memory that is malloced inside a function is not incorrect. This uses the heap, not the function frame, to keep track of data.)
  
  So for example:
  
  int a = 0;
  int *b = NULL;
  int **c = NULL;
  int *d = NULL;
  {
      int x = 0;
      int *y;
      y = malloc(sizeof(int));
      a = x;  /* this is fine, we're just passing data */
      b = y;  /* this is ok too, y's block of memory is allocated out of the heap */
      c = &y; /* WRONG!! y is allocated locally; remember, we're talking
                 about the address of y, not the address y is pointing to.
                 The address of y, like the address of x, references
                 locally allocated data */
      d = &x; /* this is also wrong, since the address of x points to
                 memory that is pushed on the function stack */
  }
  
  If you've been paying attention, you'll notice that any time you see a '&' before a right side variable that this should get your attention. I would wince right away if I saw that. My first instinct is to figure out where that variable is created.
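  For contrast, a short sketch of two ways to hand data out of a function without letting a local address escape. The function names are hypothetical, and the broken variant is shown only inside a comment.

```c
#include <assert.h>
#include <stdlib.h>

/* WRONG (kept as a comment on purpose): returns the address of a local,
 * which stops being valid as soon as the function returns.
 *
 *   int *demo_bad(void) { int x = 42; return &x; }
 */

/* OK: the storage comes from the heap, so it outlives the call.
 * The caller owns the result and must free() it. May return NULL. */
int *demo_make_int(int value)
{
    int *p = malloc(sizeof *p);
    if (p != NULL)
        *p = value;
    return p;
}

/* Also OK: the caller supplies the storage, so nothing local escapes. */
void demo_fill_int(int *out, int value)
{
    assert(out != NULL);
    *out = value;
}
```

  A call such as `demo_fill_int(&local, 9)` does use '&', but the pointer only travels down the call stack and `local` outlives the callee, so checking where the variable lives settles it quickly.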
  
  To read more about the basics of assembly programming and machine code, which admittedly I'm not an expert on (I do know something about C/C++), you can go here:
  http://www.microsoft.com/msj/0298/hood0298.aspx
27. Re: Heisenbugs... by aled · 2004-02-24 10:41 · Score: 3, Funny
  
  Computers are deterministic
  
  Oh god, a computer calvinist!
  
  Let me tell you about the group of girls I heard telling the professor how they were debugging the timing loop to measure an analog signal, stepping through the loop code in the debugger. Free will kills determinism!
  
  --
  
  "I think this line is mostly filler"
28. Re: Heisenbugs... by Anonymous Coward · 2004-02-24 11:05 · Score: 0
  
  Yeah, but "timing issues" aren't always very controllable, especially when you have to worry about the underlying OS...
  
  Moreover, intermittent hardware failures are sure to give programmers grey hair... :(
  
  *shudder*
29. Re:Heisenbugs... by MurphyZero · 2004-02-24 11:21 · Score: 1
  
  In fact, we use Fortran code in our daily operations. Actually, I think some of the code may be pre-Fortran 77. It's definitely been around since the 70s, if not before.
  
  --
  Our founding fathers removed the guys in charge. Be American. Vote incumbents out.
30. Re:Heisenbugs... by kzinti · 2004-02-24 11:34 · Score: 2, Insightful
  
  In my experience, Heisenbugs are often the result of race conditions between concurrent threads.
  
  Yeah, but thread problems are so slippery, I don't even think of them as Heisenbugs. I think of them as Neutrinobugs.
  
  A stack-related Heisenbug (or really any kind of Heisenbug, for that matter) will always occur in the same place, given the same conditions. Always the same location, always the same stack trace. But when you stick in a print statement, the bug moves, or - worse - it goes away altogether. That'll make you pull your hair out the first couple of times it happens to you, but after a while you learn to spot them pretty quickly.
  
  Race conditions between threads, however, are maddening in their irregularity. They rarely happen in the same place at the same time. (If they do, you're lucky.) They can be random in when they choose to pop up. One time you might run five minutes before you see a crash. Next time, you might run hours before the program falls over and dies. And when you do get a crash, it's never in the same place, and often it's not even "near" the bad code. Two threads write to the same data structure at the same time because it wasn't locked correctly - and the program can continue running WAAAAY past the bad access. I've seen grown men brought to their knees, sobbing like little children, over threading problems. Race conditions keep suicide hotlines in business.
  
  And people wonder why I'm compulsive about putting locks around my data.
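  A minimal sketch of "locks around my data" using POSIX threads, assuming pthreads is available and that one mutex per shared structure is enough. `demo_counter`, `demo_worker` and `demo_run_counter` are illustrative names only.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define DEMO_THREADS 4
#define DEMO_ITERS   100000

/* Hypothetical shared data: the value is only touched with the lock held. */
struct demo_counter {
    pthread_mutex_t lock;
    long value;
};

static void *demo_worker(void *arg)
{
    struct demo_counter *c = arg;

    assert(c != NULL);
    for (int i = 0; i < DEMO_ITERS; i++) {
        pthread_mutex_lock(&c->lock);
        c->value++;               /* read-modify-write happens under the lock */
        pthread_mutex_unlock(&c->lock);
    }
    return NULL;
}

/* Runs DEMO_THREADS workers and returns the final count, or -1 on error. */
long demo_run_counter(void)
{
    struct demo_counter c;
    pthread_t tid[DEMO_THREADS];
    int started = 0;

    c.value = 0;
    if (pthread_mutex_init(&c.lock, NULL) != 0)
        return -1;

    while (started < DEMO_THREADS &&
           pthread_create(&tid[started], NULL, demo_worker, &c) == 0)
        started++;
    for (int i = 0; i < started; i++)
        pthread_join(tid[i], NULL);   /* all workers finish before c goes away */

    pthread_mutex_destroy(&c.lock);
    return started == DEMO_THREADS ? c.value : -1;
}
```

  With the lock in place the total is the same on every run; take the lock/unlock pair out and the increments from different threads can overwrite each other, typically giving a smaller total that changes from run to run, with no crash to point at the cause.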
31. Re: Heisenbugs... by gidds · 2004-02-24 12:06 · Score: 3, Insightful
  
  Well, yes, but that determinism can be arbitrarily complex; causes may be very far removed from their effects. A GUI app can have a *lot* of past input to affect things, for example, especially if it runs for days or weeks. Exactly when asynchronous events happen can be extremely difficult to predict, detect, handle, or test; livelocks and race conditions are notoriously hard to track down. Even exact patterns of memory layout and allocation, file organisation or access, &c can affect subtle bugs. So while strictly true, determinism isn't a lot of help in some cases.
  
  --
  Ceterum censeo subscriptionem esse delendam.
32. Re:Heisenbugs... by abdulla · 2004-02-24 12:44 · Score: 1
  
  That's why I absolutely love valgrind, it does all the memory and thread debugging needed to figure out these bugs that'd otherwise confuse the hell out of you.
33. Re:Heisenbugs... by TelevisioSledgicus · 2004-02-24 12:46 · Score: 1
  
  The truth is Heisenbugs are caused by programmers.
34. Re:Heisenbugs... by pclminion · 2004-02-24 13:01 · Score: 1
  
  Yep, I love valgrind too... It's not perfect, however. It won't catch this bug:
  int main()
  {
  int foo[10];
  
  foo[10] = 1; /* Write beyond end of array */
  return 0;
  }
  This is a very hard bug to catch automatically, because in some circumstances a write to a stack location above the base pointer is a valid operation (e.g., storing into an array local to some function further up the call stack).
35. Re:Heisenbugs... by abdulla · 2004-02-24 14:50 · Score: 1
  
  It would catch that if foo was on the heap. That's one thing I never knew, I thought it did catch things on the stack, but a quick check proved me wrong.
36. Re:Heisenbugs... by ginsu · 2004-02-24 16:20 · Score: 1
  
  My favorite web hosting heisenbug was a user who claimed that while his desktop was turned on, he was getting email just fine -- but when he turned off his computer he started bouncing mail.
  
  Of course we thought he was crazy -- the mail was being delivered to the server and should have been totally decoupled from whether his computer was booted or not... but upon looking into it we found he was totally right! He was using POP and deleting when he downloaded it. However, he had some large files which made his user account be very close to quota. So as soon as he stopped constantly downloading his email (when his computer was on) he'd get 1 or 2 more emails and then would be over quota, and voilà, start bouncing messages.
37. Re: Heisenbugs... by BinxBolling · 2004-02-24 16:31 · Score: 1
  
  Computers may be deterministic, but the number of inputs may be so large that this is not always the most practical way to deal with them.
  There are many more nondeterministic influences on a program than your list acknowledges. For example, if you access an uninitialized variable, the data you get back won't necessarily be consistent. And such a mistake can cause failures that are quite remote from the ultimate cause.
  Good languages and compilers can help eliminate a lot of these sources of nondeterministic influence -- any decent compiler will warn you about accessing an uninitialized variable, and getting rid of pointers or at least putting them on a short leash can eliminate a whole class of programmer mistakes that can lead to apparently random failures.
38. Re:Heisenbugs... by blorf · 2004-02-24 18:04 · Score: 1
  
  The OS sometimes helps you fall flat on your face as well. A textbook cause of hard to find bugs:
  
  http://developers.sun.com/solaris/articles/multithreaded.html
  
  We were bit by this one in a large multithreaded server project on Solaris. Customers with 4+ way SMP boxes were reporting crashes under heavy load. Luckily the behavior indicated heap corruption in a heavily concurrent area of the codebase -- that led us to uncover the one rogue OS call (buried in a support library) that was causing the problem.
  
  The developer responsible for the naughty non-reentrant call was summarily executed. (er, ok, poked with a sharp cluestick.)
39. Re: Heisenbugs... by maysonl · 2004-02-24 19:31 · Score: 1
  
  Computers are deterministic
  You obviously weren't there the time an IC fell out of the backplane while I was debugging a program a few decades ago. That one really had me and the CE scratching our heads.
40. Re: Heisenbugs... by EugeneK · 2004-02-24 20:05 · Score: 1
  
  "Well, yes, but that determinism can be arbitrarily complex; causes may be very far removed from their effects."
  
  I think Roger Penrose said something similar in "The Emperor's New Mind" about the universe - that is, it may be deterministic but that doesn't mean we can tell the future since the determinism is far too complex.
41. Re:Heisenbugs... by Anonymous Coward · 2004-02-25 01:24 · Score: 0
  
  I have a friend who had a car that would get hideously messy inside (not that I have any room to talk) and so every so often we'd go through the thing and clean it out. It would take the better part of a night (maybe 3-6 hours). And fairly reliably, afterwards, we couldn't get the damn thing to start and we'd have to push start it.
  
  It took us entirely too long to figure out that it wasn't that the car didn't like running when it wasn't filled with crap. It was running the dome light for 3-6 hours, combined with an old battery. Oops.
The Slashdot 9 by grub · 2004-02-24 07:48 · Score: 0, Funny

(this IS slashdot after all)

1) Check your registry.
LINUX ain't got no registry crap!

2) Check your FAT32/CXFS filesystems.
LINUX is JOURNALLED and can do that in the background!

3) Verify your drivers are current.
LINUX is stable with drivers written in COBOL back in the 50's!

4) Defrag your disks.
Defrag?! You must be a WINDOZE LOOSER!!

5) Check your connections on the back of the PC.
HAHAHA! LINUX does that AUTOMATICALLY!!! LOOSERSSSSS!!!

6) Are your cards well seated? Power down and reseat.
HAHAHAAHA! LINUX can HOTSWAP EVARYTHING EVEN CPUs, LOOSERS!

7) Is your OS up to date? Perform a Windows Update.
HAHAAHAHA!!! LINUX can update itself automatically cuz of its LEET HEURISTICS and COOLNESS that MS aint got, LOOSERS!!!

8) Start in "SAFE MODE"
HAHAHA! What's the other? UNSAFE MODE!?!?! LINUX is always safe, LOOSERS!

9) Reinstall Windows.
HAHAHAH! LINUX NEVER NEEDS INSTALLING! Pour the blood from a freshly sacrificed penguin on the disk and it installs AUTOMATICALLY THROUGH AIR!!!!! LOOOOOOOSERS!!!!!

--
Trolling is a art,
I'd agree by scatterbrained · 2004-02-24 07:49 · Score: 5, Informative

I've read it and it's a good book, but I would
just borrow it from the library and then print
out the poster to remember the 'rules'.

There's not enough meat to keep it on my
precious shelf space.

--
-- All that's left of me, is slight insanity, whats on the right, I don't know. -- Bob Mould
1. Re:I'd agree by caseydk · 2004-02-24 08:44 · Score: 1
  
  Good call. Someone did an "ask slashdot" about pc diagnostics and I mentioned this one.
  
  The rules are really pretty straightforward and simple... that's why I think they need to be repeated every so often.
I don't need a book... by garethwi · 2004-02-24 07:50 · Score: 4, Funny

...to learn how to debug. I only need my own sloppy code.

--

Find funky gifts
1. Re:I don't need a book... by Dukael_Mikakis · 2004-02-24 08:03 · Score: 2, Funny
  
  Yeah who needs a book?
  
  System.out.println("1");
  ComplexClassInstantiator _cci = new ComplexClassInstantiator((UtilType)ClassGrabber.getObjectFromDefaults(_a, _b, _kl1, _z56), new UtilSocket(_p23876, _p5541), new Runnable() { public void run() { runDataSetAnalysis(_p1, _p2, _paramClass); } });
  System.out.println("2");
  
  Output: 1
  [Error message]
  So obviously the error is in the line between the two print statements.
  
  So, I repeat, who needs a book?
2. Re:I don't need a book... by Anonymous Coward · 2004-02-24 08:09 · Score: 0
  
  "Daddy made him for good, but he's turned out Evil." - Wallace and Gromit? First time I've ever seen a quote from those used on Slashdot... nice one! :D
3. Re:I don't need a book... by Anonymous Coward · 2004-02-24 08:12 · Score: 1, Funny
  
  If that's how you code all the time, you do.
4. Re:I don't need a book... by CuriHP · 2004-02-24 08:24 · Score: 1
  
  Not everything is sequential.
  
  --
  If it's not on fire, it's a software problem.
He forgot regression tests by mark99 · 2004-02-24 07:50 · Score: 5, Insightful

Regression test suites (if possible) should be maintained so that when bugs get fixed, they stay fixed.

Just my 2 cents.
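
A small sketch of what such a regression test can look like in C, assuming plain assert-based checks are good enough for the project. The function and the off-by-one bug it once had are invented for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical function that once had an off-by-one bug: it used to accept
 * index == len and read one element past the end of the array. */
int demo_get_or_default(const int *data, size_t len, size_t index, int fallback)
{
    if (data == NULL || index >= len)   /* the fix: >= rather than > */
        return fallback;
    return data[index];
}

/* Regression test pinned to that bug. It stays in the suite and is rerun on
 * every change, so the fix cannot quietly come undone. */
void demo_test_index_equal_to_len(void)
{
    const int data[3] = { 10, 20, 30 };

    assert(demo_get_or_default(data, 3, 2, -1) == 30);  /* last valid index */
    assert(demo_get_or_default(data, 3, 3, -1) == -1);  /* the old failure  */
    assert(demo_get_or_default(data, 0, 0, -1) == -1);  /* empty input      */
}
```

The useful part is the middle assertion: it encodes the exact input from the bug report, so a later "cleanup" that reintroduces the bug fails the suite straight away.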
Good read by GoMMiX · 2004-02-24 07:50 · Score: 5, Insightful

" If you didn't fix it, it ain't fixed: Check that it's really fixed, check that it's really your fix that fixed it, know that it never just goes away by itself, fix the cause, and fix the process."

I can think of a WHOLE lot of techs and admins who really need to follow number 9 a lot closer.

Especially those Windows admins/techs who think 'restart' is the ultimate fix-all. Though, sadly, I suppose in many cases that's about all you can do with proprietary software. Well, that and beg vendors to fix the problem. (We all know how productive that is....)
1. Re:Good read by Dukael_Mikakis · 2004-02-24 08:08 · Score: 1
  
  Yeah, I can't tell you how many times, at all levels of my company, we are told simply to try it again, and to change the parameters and how the error must be "configurational".
  
  Try telling the client that what they want to do with our software and how they want to use it is a "configurational" problem and they're using our software incorrectly, and 9 times out of 10 the clients (in our case, major banks) will drop our software.
  
  But then again, Microsoft uses the "configuration" argument all the time with its customers, so I guess it works sometimes.
2. Re:Good read by swb · 2004-02-24 08:11 · Score: 5, Insightful
  
  No, it's number *5* that EVERYONE needs to remember to follow. I see way too many people (including myself in a hurry) changing more than one thing at a time and then immediately wondering what fixed it or why it didn't get fixed.
  
  This is especially important when changing a second variable can actually mask the fix of the change of the first variable or cause a second failure that appears to be the same as the initial failure.
  
  I guess they should have added a rule 10: be patient and systematic. Obvious problems usually have non-obvious solutions, and a thorough examination of the situation is time consuming. Don't take short cuts or you might miss the problem.
3. Re:Good read by Ytsejam-03 · 2004-02-24 08:20 · Score: 1
  
  What I find most frustrating is working on a team with someone who does not follow number nine. If they can't reproduce a problem, then they assume it has gone away. Then a customer reports it, it becomes a major catastrophe, and we have to debug it using information from the customer because we can't reproduce it in the lab.
4. Re:Good read by globalar · 2004-02-24 08:35 · Score: 1
  
  Reboot seems to be the tech monkey equivalent of using convenient gotos or calling main() in high-level code. In some cases, it's necessary, but in most cases, there should be a better way. Sometimes it's just a tradeoff for convenience, to which we can all relate.
5. Re:Good read by monique · 2004-02-24 10:06 · Score: 2, Informative
  
  This is a great example of where version control systems can really save your butt. Even if you *have* changed multiple things, at least you have some idea of what changed between when you started hacking around to find the bug and when you found it.
  
  --
  -monique
6. Re:Good read by jafac · 2004-02-24 10:06 · Score: 1
  
  You can't be patient when you're doing the debugging on a production system.
  
  Why are you debugging on a production system?
  
  Because some bean-counter decided that you didn't need an extra test system, or the time to thoroughly debug it BEFORE you put it into production.
  
  So why do we let bean-counters make ENGINEERING decisions?
  
  THIS is why software sucks!
  
  Kill all bean-counters.
  
  Today.
  
  --
  
  These are my friends, See how they glisten. See this one shine, how he smiles in the light.
7. Re:Good read by ocie · 2004-02-24 14:54 · Score: 1
  
  A great old chestnut from my math prof:
  
  "It's the lazy man who does the most work."
  
  --
  JET Program: see Japan, meet intere
8. Re:Good read by HeyLaughingBoy · 2004-02-25 04:31 · Score: 1
  
  Why are you debugging on a production system?
  
  Because some bean-counter decided that you didn't need an extra test system, or the time to thoroughly debug it BEFORE you put it into production.
  
  I agree with you up to a point: I've argued myself blue in the face that we developers should have more systems for testing on (which is expensive, as our machines cost us close to $100k each to build), but sometimes you have to use the production system (but I still think you have to be patient).
  
  Case in point: we received a customer complaint that should have been simple to reproduce. However, none of our in-house test systems showed that problem. No matter how creative we were, the machines simply shut down gracefully as they were designed to in that situation. The issue is compounded by the fact that this is not just an off the shelf PC, but an embedded system and we have multiple, slightly different versions of hardware in the test lab. So we go to the manufacturing dept and get a system from the production floor identical in mechanical hardware, electronics and software to the customer's system. Same result: system works perfectly. Call the customer, have them create the situation on their machine that causes a problem while we are connected to it remotely: instant crash!
  
  We did identify a problem and fixed it, but to this day I've never heard of that problem on any other system. It turned out to be a race condition that required a thread being interrupted within a window measured in single-digit microseconds at most, leading to data corruption. Perhaps there was some tiny timing variation on that system's CPU that caused it to fail only on that system, we'll never know.
  
  Sometimes you don't have a choice but to debug on the only system that exhibits the problem.
but how do you know it's fixed? by sohp · 2004-02-24 07:51 · Score: 4, Insightful

Nothing about writing code for a test case that exercises the bug, then rerunning it every time you make a change you think will fix the bug? Seems like a big oversight. Any program of reasonable size is going to require wasting a significant amount of time restarting and re-running to the point of failure, and with every manual check of the result, there's an increasing probability that a fallible human will make a mistake.

More programmers need to get Test Infected.
1. Re:but how do you know it's fixed? by scatterbrained · 2004-02-24 07:55 · Score: 1
  
  this is covered under the heading of 'make it
  fail', IIRC under one of the subheadings.
  
  --
  -- All that's left of me, is slight insanity, whats on the right, I don't know. -- Bob Mould
Kids these days.... by Anonymous Coward · 2004-02-24 07:52 · Score: 1, Flamebait

the original bug was a hardware problem.
1. Re:Kids these days.... by incubusnb · 2004-02-24 08:01 · Score: 0, Troll
  
  why is this modded funny? it should be informative because it's true
  
  --
  /. is overrun by bed-wetting elitist nerds
  let it be known, for anything other than servers, a *nix OS sucks
2. Re:Kids these days.... by Anonymous Coward · 2004-02-24 08:37 · Score: 0, Offtopic
  
  well, not the original bug, but maybe the first actual one caused by a bug.
  
  http://www.history.navy.mil/photos/pers-us/uspers-h/g-hoppr.htm
My Favorite Debugging Tale by stuffduff · 2004-02-24 07:52 · Score: 2, Interesting

Soul of a New Machine by Tracy Kidder (book teaser) My favorite chapter was The Case Of The Missing NAND Gate.

--
"Can there be a Klein bottle that is an efficient and effective beer pitcher?"
for every cell phone provider out there by NumLk · 2004-02-24 07:52 · Score: 1

know that it never just goes away by itself
I can't tell you the number of times I've heard something along the lines of "Resetting your account will fix the problem." Guess what- it doesn't. Then again, after this, I guess I shouldn't expect much.

--
Children in the backseats don't cause accidents. Accidents in the back seats cause children.
Re:Hardware *Debugging*? by scatterbrained · 2004-02-24 07:52 · Score: 3, Informative

there's a distinction (in real life) and
in the book between troubleshooting something
that's supposed to work (think TV repair) and
debugging something that's never been made
before (hardware design).

Troubleshooting lends itself more to scripted
debugging, and "real debugging" is a bit more
free-form

--
-- All that's left of me, is slight insanity, whats on the right, I don't know. -- Bob Mould
Re:Hardware *Debugging*? by Mick+Ohrberg · 2004-02-24 07:52 · Score: 5, Insightful
My boss has three standard trouble-shooting questions:
1. Is it plugged in?
2. Are you logged in?
3. Is it spelled right?
Works in 9 cases out of 10.
--
Quidquid latine dictum sit, altum sonatur.
The first law of debugging by ToSeek · 2004-02-24 07:52 · Score: 5, Funny

"The most likely source of the current bug is the fix you made to the last one."
1. Re:The first law of debugging by Marvin_OScribbley · 2004-02-24 08:56 · Score: 2, Insightful
  
  "The most likely source of the current bug is the fix you made to the last one."
  
  Actually that's a corollary to the first law, which is:
  "Every bug fix will cause two more."
  
  --
  I'm not a journalist, but I play one on slashdot
Hey...the chicken bones are a valid fix too.... by Dr_Marvin_Monroe · 2004-02-24 07:53 · Score: 4, Funny

These "rules" are great, but nothing beats the mystic power of a little goat blood and chicken bones waved over a misbehaving system.

Without these, the average user might be tempted to try and fix it themselves.... Next thing, my job is being "offshored" to a phone bank in India.

No, the chicken bones and a little incantation will keep my job right here, where it belongs.
1. Re:Hey...the chicken bones are a valid fix too.... by pnatural · 2004-02-24 08:01 · Score: 2
  
  RMS, is that you?
  
  note to moderators
  
  i love what RMS has done for Free Software. the comment above is a joke, take it as such.
2. Re:Hey...the chicken bones are a valid fix too.... by Saeed+al-Sahaf · 2004-02-24 08:08 · Score: 2, Funny
  
  i love what RMS has done for Free Software. the comment above is a joke, take it as such.
  I might consider your request if your comment was... funny?
  
  --
  "Who are in control, they are not in control of anything - they don't even control themselves!" - Glen Beck
3. Re:Hey...the chicken bones are a valid fix too.... by Anonymous Coward · 2004-02-24 08:48 · Score: 0
  
  I might consider your request if your comment was... funny?
  
  Consider it for what? That request is made to moderators, not to you or I. By posting a comment about the request you automatically remove yourself from the group that is supposed to be reading it.
4. Re:Hey...the chicken bones are a valid fix too.... by Gildor · 2004-02-24 09:21 · Score: 1
  
  I prefer a rubber chicken. Less messy, and kinder to animals. :)
And the final solution by aliens · 2004-02-24 07:53 · Score: 4, Funny

10) Hammer.

if 10 fails

11) Shotgun.

Congrats problem solved, human destressed.

--
-- taking over the world, we are.
1. Re:And the final solution by Anonymous Coward · 2004-02-24 08:13 · Score: 0
  
  I assume you mean, shotgun to the face?
2. Re:And the final solution by Rahga · 2004-02-24 08:19 · Score: 2, Funny
  
  Most solutions only go up to ten.... These go to eleven.
Negative. by Anonymous Coward · 2004-02-24 07:53 · Score: 1, Insightful

Chips have bugs, why do you think there are re-spins? We are talking from a design point here, not a "techie-fix-this-shit" point. Different ballgame.
Time by quarkoid · 2004-02-24 07:53 · Score: 5, Insightful

One thing's clear from looking at that list - spend more time on testing your code.

Unfortunately, speaking as an ex-programmer, time is one luxury that PHBs don't afford their minions. A project needs to be completed and knocked out of the door as soon as possible. The less time spent on unnecessary work, the better.

It is also unfortunate that PC users have been brought up expecting to have buggy software in front of them and expecting to have to reboot/reinstall. What motivation is there to produce bug free code when the users will accept buggy code?

Ho well, at least I run my own company now - master of my own wallet - and can concentrate on quality solutions.
1. Re:Time by Dukael_Mikakis · 2004-02-24 08:17 · Score: 2, Interesting
  
  Yeah, the sad truth seems to be that when prioritizing, general and regression testing seem to rank low on the list because they don't actually create new product (testing is of course necessary, but we aren't selling our testing, we're selling our new code).
  
  With marketers and product managers and sales people all pushing our product and making wild promises about delivery dates and patch dates it becomes a fruitless effort to keep on top of the regression testing, and I've found that with the software at my company, it's sort of ramped up until it'll reach a breaking point where we'll just need to scrap big portions of our system and release a whole new build, likely using "Buzzwords" or cryptic acronyms that are supposed to indicate progress.
  
  ... and it doesn't help that a big chunk of our source code was recently leaked.
2. Re:Time by ocie · 2004-02-24 14:59 · Score: 1
  
  Bugs are a natural consequence of programming. Eliminate them as you can of course. Designing software assuming a piece of code is bug free is like designing a bridge assuming beams don't bend.
  
  That's my philosophy anyway.
  
  --
  JET Program: see Japan, meet intere
Sounds interesting by pcraven · 2004-02-24 07:53 · Score: 4, Interesting

Teaching people how to debug isn't that easy. It requires some experience before they get the hang of it.

I'm a stickler for labeling code often, and tracking changes released to production. Because of this, I often seem to be a stick in the mud when it comes to refactoring.

Heavy refactoring makes your code nicer. But when you have to do a lot of debugging on something that worked before refactoring, you can start to appreciate that keeping the change set manageable is a 'good thing'. (I do financial apps, so this may not work for everyone.)

The thing I see people fail at most is the ability to 'bracket' the problem. Go between code that works and code that doesn't, filtering the problem down to something simple.
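Bracketing can be mechanized when the changes are labeled and ordered. What follows is only a minimal sketch in Python, assuming the releases run oldest to newest, the newest one is known bad, and the bug stays once it appears; `first_bad`, `is_bad` and the `r1`..`r10` labels are made-up names for illustration, not anything from the book.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def first_bad(changes: Sequence[T], is_bad: Callable[[T], bool]) -> T:
    """Return the earliest change for which is_bad() is true.

    Assumes changes are ordered oldest to newest, the last one is known
    bad, and the bug never disappears again (good, good, ..., bad, bad).
    """
    lo, hi = 0, len(changes) - 1  # hi always points at a known-bad change
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(changes[mid]):
            hi = mid        # bug already present: look earlier
        else:
            lo = mid + 1    # still good: the bug came in later
    return changes[lo]


# Hypothetical labeled releases; pretend the bug arrived in r7.
releases = [f"r{n}" for n in range(1, 11)]
print(first_bad(releases, lambda r: int(r[1:]) >= 7))  # prints r7
```

Each probe halves the bracket, so ten releases need about four builds rather than ten. It only works if every release can still be rebuilt, which is one argument for the labeling habit.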

The second thing is the inability of some people to go 'deep' in their debugging. Decompile the java/C#/whatever code, trace through the library calls, whatever.

It's nice to see another good book on the market that seems to cover these topics.
1. Re:Sounds interesting by Jotaigna · 2004-02-24 08:01 · Score: 0
  
  There is another book, this time a novel, by Ellen Ullman. It's called "The Bug" and it deals with a software developer fighting this kind of critter.
  There is some insight into the character's mind and his obsession with the bug. I saw this in Spectrum Magazine; I don't know if it has been reviewed here.
  
  --
  "The quality of life is inversely proportional to the number of keys on your keyring."
Rule 0 by Anonymous Coward · 2004-02-24 07:54 · Score: 5, Funny

0. If you're a software guy blame it on hardware, if you're a hardware guy blame it on software.

0.1. Blame it on the user.

0.2. Blame it on your colleague.

0.3. Blame it on your manager.

0.4. Yell at the computer and tell it to work dammit!

0.5. Put head on keyboard and sob.

0.6. Read Slashdot.

0.7. Post on Slashdot.

0.8. Call it a feature not a bug.
1. Re:Rule 0 by Patrik_AKA_RedX · 2004-02-24 08:18 · Score: 1
  
  0.9. Get an hexorcist
  
  0.A. Light candles in the form of a polygon around monitor. Pray to the Holy Electron (Or the Holy Proton if you don't believe in Leptons)
  
  0.B. Burn several Windows CD's (with fire that is)
  
  0.C. Pull the plug, claim a power blackout and call it a day.
2. Re:Rule 0 by cant_get_a_good_nick · 2004-02-24 12:04 · Score: 1
  
  0. If you're a software guy blame it on hardware, if you're a hardware guy blame it on software.
  Old joke...
  The hardware guy says "it's a software problem."
  The software guy says "it's a hardware problem."
  The liberal arts major says "umm, do you want fries with that?"
  
  Sadly, a lot more hardware and software guys are wearing paper hats and supersizing it.
  
  <FLAMEBAIT>But even though they're making 1/3 of what they're used to, they're happy because thanks to Bush NewSpeak, they're in manufacturing jobs.</FLAMEBAIT>
Remain focused. Don't let others' WAGs get to you by PornMaster · 2004-02-24 07:54 · Score: 1

I find that when troubleshooting systems with which other people have worked longer, I have had better luck just asking them simple facts and troubleshooting myself rather than listening to their wild-ass guesses and having to shoot them down.

--
500GB of disk, 5TB of transfer, $5.95/mo
You can read a sample chapter in PDF format by TheCrayfish · 2004-02-24 07:54 · Score: 5, Informative

You can read a sample chapter from the Debugging Rules book in PDF format by going here. (Requires the free Adobe reader.)

--

The Big News Page
1. Re:You can read a sample chapter in PDF format by ajbajb · 2004-02-24 10:48 · Score: 1
  
  If this chapter is representative of the entire book, I would not rush out to buy it. The stories are somewhat entertaining, but the advice is repetitive and not too insightful. How many different ways do you need to say 'read the manual' in one chapter?
Rule #10 by UncleBiggims · 2004-02-24 07:54 · Score: 0

Market the bug as a "random feature".

Are you Corn Fed?
1. Re:Rule #10 by Dukael_Mikakis · 2004-02-24 08:19 · Score: 2, Funny
  
  ... either that or add a comment right before the pertinent code:
  
  /* Code used with permission: Microsoft Corporation */
  
  (Not that your clients would have your source code to look at, but ...)
Re:Hardware *Debugging*? by pclminion · 2004-02-24 07:54 · Score: 5, Insightful

I think the term you want is TROUBLESHOOTING.
Troubleshooting is what you do to fix your mom's ethernet card. "Oooh, it's on the bottom PCI slot, has no interrupt line. I'll just move it up one slot..."
Debugging is what you do with an oscilloscope to figure out why a particular circuit design isn't working as anticipated. You don't "troubleshoot" a circuit design. You debug it.
Or, to put it another way, "troubleshooting" is what a tech support monkey does. "Debugging" is what an engineer does.
Re:Hardware *Debugging*? by wondafucka · 2004-02-24 07:55 · Score: 3, Insightful

Get off it. I can't think of a single reason why someone can't "debug" hardware or anything else for that matter. The origin of the word comes from a troubleshooting situation anyways. Why should someone be able to debug a relational database but not a relationship?

--
postmodernsideshow.com
Top 10 Rules of Debugging by ackthpt · 2004-02-24 07:55 · Score: 5, Funny

10. Code is _always_ Beta. It's never done until it's no longer in use or support no longer exists.
9. The better the SDK, the more sophisticated the bugs.
8. There's always more bugs in the other guy's (girl's) code.
7. Declaring code bug-free is asking for it to fail at the worst possible time with the greatest visibility.
6. A good design is as likely to have bugs as a bad one. Bugs are equal opportunity.
5. Debugging time is inversely proportional to coding time.
4. If it works the first time, there's a bug, but you won't find it until you roll it out.
3. Debugging is fun. Really! It's when you run out of bugs that you should wonder if you got them all, that's not fun.
2. The most difficult bugs to find are in the most straightforward looking code.
1. That's not a bug, that's a feature.

--

A feeling of having made the same mistake before: Deja Foobar
1. Re:Top 10 Rules of Debugging by kooso · 2004-02-24 08:08 · Score: 3, Interesting
  
  10. Code is _always_ Beta. It's never done until it's no longer in use or support no longer exists.
  
  What about the opposite? Is anyone against versioning? I tried and failed to find an "Against versioning" campaign on Google. I mean, somebody must be out there who only wants version 1.0 for all software.
  
  I guess the issue is in the meaning we attach to version numbers. What about a program as a well-specified function that, once it is implemented (at least for a fixed platform), needs no "enhancements"?
  
  (E.g. Don Knuth adds a digit to each version of TeX, implying that he doesn't plan to add anything substantial, or else he'll be running into very long version numbers).
2. Re:Top 10 Rules of Debugging by ackthpt · 2004-02-24 08:10 · Score: 1
  
  I guess the issue is in the meaning we attach to version numbers. What about a program as a well-specified function that, once it is implemented (at least for a fixed platform), needs no "enhancements"?
  Then it becomes Bug Ver. 1.1
  
  --
  
  A feeling of having made the same mistake before: Deja Foobar
3. Re:Top 10 Rules of Debugging by Zangief · 2004-02-24 09:16 · Score: 2, Interesting
  
  Rule 0. If you are programming in C, or similar, start counting from zero.
4. Re:Top 10 Rules of Debugging by asr_man · 2004-02-24 09:35 · Score: 1
  
  # 5. Debugging time is inversely proportional to coding time.
  
  It is inversely proportional to design time.
  
  It is exponentially proportional to coding time. :-)
5. Re:Top 10 Rules of Debugging by hburch · 2004-02-24 09:44 · Score: 2, Funny
  
  If it works the first time, there's a bug, but you won't find it until you roll it out.
  There truly is nothing more scary than a program that works the first time. You know there's a bug, but you cannot find it. It's sitting there, silently laughing at your vain attempts to disturb it from its hiding place with your testing.
  The bug plots against you, with an evil grin on its face, biding its time until you finally decide that maybe, somehow, your code actually was correct the first time. Then, just as you stop holding your breath every time you hear about that code piece being executed, BAM! Three critical, blocking bugs are filed against that code.
  As you open the code, the bug sits there like a blinking red light of impossibility. You clearly did something wrong. It cannot possibly work, much less ever have worked. Somehow, all those tests that your program passed before fail miserably. You must rewrite the entire program from scratch because of your flawed logic in the design.
  Fear, programmers of the world! Fear the lurking bug that has learned patience!
6. Re:Top 10 Rules of Debugging by ackthpt · 2004-02-24 10:03 · Score: 1
  
  It is exponentially proportional to coding time. :-)
  No, it's exponentially proportional to the number of programmers working on it. B=n^(number of coders) where n=number of modules or number of weeks less than reasonable number of weeks, etc.
  This, of course, explains the high bug count in Windows, as Microsoft has legions of coders at work.
  And this isn't theory, it's fact.
  
  --
  
  A feeling of having made the same mistake before: Deja Foobar
7. Re:Top 10 Rules of Debugging by Anonymous Coward · 2004-02-24 10:22 · Score: 0
  
  > Don Knuth adds a digit to each version of TeX, implying that
  > he doesn't plan to add anything substantial, or else he'll be
  > running into very long version numbers
  
  IIRC he does this because he likes his version numbers to converge, indicating that his programs aren't perfect, but at least getting closer to the ideal all the time. Also, he doesn't just add a digit, he adds a digit of the decimal expansion of pi (or is that e, the base of the natural logarithm?)
8. Re:Top 10 Rules of Debugging by chgros · 2004-02-24 20:30 · Score: 1
  
  he adds a digit of the decimal expansion of pi (or is that e, the base of the natural logarithm?)
  > tex
  This is TeX, Version 3.14159 (Web2C 7.4.5)
Finally, grub posts something truly funny by Anonymous Coward · 2004-02-24 07:55 · Score: 0

because it's truly true.
Re:Hardware *Debugging*? by SamiousHaze · 2004-02-24 07:56 · Score: 1, Informative

Actually,
the first computer "bug" was a hardware bug, as it was a moth that flew into a relay and jammed it. Removing the bug physically was debugging. http://www.maxmon.com/1945ad.htm is a reference.

Besides, when you are building a machine and dealing with Logic Gates - it's the same type of debugging as with software logic.
How Lame. by BigChigger · 2004-02-24 07:56 · Score: 1

Maybe if you spend a few weekends dealing with MS "bugs" you would have a new appreciation.

BC
1. Re:How Lame. by Anonymous Coward · 2004-02-24 08:01 · Score: 0
  
  Maybe if you spend a few weekends dealing with MS "bugs" you would have a new appreciation.
  
  And realize that you are better off starting at step 9.
Re:Hardware *Debugging*? by Anonymous Coward · 2004-02-24 07:57 · Score: 0

Considering that, as of now, no less than six people have pointed out the wrongness of this statement, can somebody please mod him back down to reality? Thanks.
MOD PARENT UP!!! by Anonymous Coward · 2004-02-24 07:57 · Score: 0

OP and the mods that modded OP up are morons and obviously not anywhere near the hardware design industry.
Number one by Jooly+Rodney · 2004-02-24 07:58 · Score: 2, Interesting

Okay, haven't read the book, and I guess dwheeler is distilling the rules down to a sound bite, but isn't #1 the most important and difficult part of debugging? I mean, if I knew system Foo ver. Bar had such-and-such an idiosyncrasy, I could code around it, but Googling for hours to find the one message board post that lets you Understand The System can be aneurysm-inducing. It's not even always the idiosyncrasies of a system -- the sheer volume of stuff you have to learn about I/O conventions, operating systems, etc., in order to write a useful program in a non-toy language boggles the mind. I'm surprised people are able to write programs in the first place.
1. Re:Number one by burris · 2004-02-24 09:11 · Score: 1
  
  Have fun "working" with your "non-toy" language. Personally, I'll stick to writing useful programs by "playing" with my "toys."
  
  burris
Sonuvabitch! by Anonymous Coward · 2004-02-24 07:59 · Score: 3, Interesting

Like 15 years ago in my intro CSE class my first Fortran program, which found "edges" in a text file filled with numbers, did this. Everything looked good. It would compile. But it wouldn't print out its little thing. So I insert statements to print out the status of where it is, and it works! I take out the statements and it doesn't. In/out, in/out. So I go ask the TA for help. He says it's one of the damnedest things he's seen, sorry, Fortran isn't something he's really an expert at.

I have hated fortran for years, having written a single program in it, based on this.
1. Re:Sonuvabitch! by Aardpig · 2004-02-24 08:07 · Score: 3, Informative
  
  I have hated fortran for years, having written a single program in it, based on this.
  
  Fortunately, things have changed a lot since then. With the introduction of modules and array arithmetic in Fortran 90/95, situations where routines are called with the wrong arguments, or arrays are subscripted incorrectly, are much less frequent. I haven't been bitten by a Heisenbug for a couple of years now; and when I am, switching on checking at compile and run time usually reveals the problem pretty quickly.
  
  --
  Tubal-Cain smokes the white owl.
2. Re:Sonuvabitch! by Aardpig · 2004-02-24 17:56 · Score: 1
  
  Racism against Indians, or general disgust with dangerous religious fundamentalists?
  
  Against Indians, whether they be Hindu, Muslim, Buddhist, Jain, Christian or Atheist. I'm not Indian myself, but am getting pretty revolted with the high levels of anti-Indian sentiment displayed on /., principally regarding the outsourcing of jobs to India.
  
  --
  Tubal-Cain smokes the white owl.
3. Re:Sonuvabitch! by WNight · 2004-02-25 03:55 · Score: 1
  
  I had initially thought you meant Arabs, and were just lumping the whole area together. Racism towards Indians, as in, from India? Weird. If anything I see Slashdot as being divided between hating the Arabs/Muslims (not Indians) and the USA/Israel.
  
  And all of the anti-outsourcing anger I've seen has been directed at the companies, not the recipients. Some people imply the work is of lower quality, but that's (imho) because support people lack cultural context and programmers aren't available for discussion, etc. Same reasons that outsourcing locally fails.
4. Re:Sonuvabitch! by Lil'wombat · 2004-02-25 05:47 · Score: 1
  
  Best FORTRAN bug I've ever seen. On one machine it would run and converge on to a solution. The same compiled program on a different machine would crash with division by zero.
  
  I eventually found the bug. A team member cut and pasted twice into a routine.
  Do 10 i=1 ... .. ..
  10 Continue .. ..
  Do 10 i=1 .. .. ..
  10 Continue
  (it was a cheesy version of Fortran for the Macintosh)
  
  Sometimes the code made the correct jump, sometimes it didn't.
  
  Nothing like a quality compiler for academic use.
  
  --
  
  Truth: If it's not one thing, it's another
Re:debug this by stuffduff · 2004-02-24 08:00 · Score: 1

Interesting and fairly well written.

--
"Can there be a Klein bottle that is an efficient and effective beer pitcher?"
Race Conditions? by Speare · 2004-02-24 08:03 · Score: 4, Insightful

Make It Fail is pretty hard to do when it comes to race conditions. This has got to be the most frustrating kind of bug. Others are referring to the Heisenbug which comes in a variety of flavors.
Sometimes you don't KNOW when there's multiple threads or processes, or when there are other factors involved.
Have you noticed that a new thread is spawned on behalf of your process when you open a Win32 common file dialog? Have you noticed that MSVC++ likes to initialize your memory to values like 0xCDCDCDCD after operator new, but before the constructor is called? It also overwrites memory with 0xDDDDDDDD after the destructors are called. And that it ONLY does these things when using the DEBUG variant build process? Did you know that .obj and .lib can be incompatible if one expects DEBUG and the other expects non-DEBUG memory management?
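Those fill values are worth recognizing on sight in a debugger or crash dump. A small Python lookup, as a sketch only: the two patterns and their meanings are the ones named above for MSVC++ DEBUG builds, other toolchains use different values, and `classify_word` is a made-up helper name.

```python
# Fill values as described in the comment above (MSVC++ DEBUG builds only).
# Other runtimes and build types use different patterns or none at all.
DEBUG_FILLS = {
    0xCDCDCDCD: "allocated by operator new but never written (uninitialized)",
    0xDDDDDDDD: "already destroyed/freed (dangling pointer?)",
}


def classify_word(value: int) -> str:
    """Describe a 32-bit value seen in a debugger or crash dump."""
    return DEBUG_FILLS.get(value & 0xFFFFFFFF, "no known fill pattern")


print(classify_word(0xCDCDCDCD))
print(classify_word(0x00000042))
```

A pointer or counter that reads as one of these values usually says more about object lifetime than about the arithmetic around it.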
Someone on perlmonks.org was just asking about a Heisenbug where just the timing of the debugger threw off his network queries. Add the debugger, it works. Take away the debugger, it fails. I've got a serial-port device which comes with proprietary drivers that seem to have the same sort of race condition.
The top 9 rules mentioned here look great. But you could write a whole book on just debugging common race conditions for the modern multi-threaded soup that passes for operating systems, these days.
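For the simplest kind of race, the lost update on shared state, the usual remedy looks like this. It is a minimal Python sketch with made-up names (`Counter`, `run`), assuming the only shared state is a single counter; real race conditions are rarely this tidy.

```python
import threading


class Counter:
    def __init__(self) -> None:
        self.value = 0
        self._lock = threading.Lock()

    def increment(self) -> None:
        # Read-modify-write must be one indivisible step. Without the lock,
        # two threads can read the same old value and one update is lost.
        with self._lock:
            self.value += 1


def run(num_threads: int = 8, per_thread: int = 10_000) -> int:
    counter = Counter()

    def worker() -> None:
        for _ in range(per_thread):
            counter.increment()

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value


print(run())  # prints 80000
```

With the lock the total is always threads times increments. Remove the `with self._lock:` line and the result may or may not come up short on any given run, which is exactly what makes "Make It Fail" so hard here.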

--
[ .sig file not found ]
1. Re:Race Conditions? by Anonymous Coward · 2004-02-24 08:16 · Score: 0
  
  And this is why application programs should avoid threads at all costs.
2. Re:Race Conditions? by Alan+Livingston · 2004-02-24 09:12 · Score: 1
  
  You know... It's people like you that gave us C# and Java crap (I can never write code that properly uses pointers, so pointers must be A BAD THING! Let's create a language that doesn't use pointers!)
  
  Learn about threads and synchronization and stop writing code with race conditions!
3. Re:Race Conditions? by Ben+Hutchings · 2004-02-24 10:23 · Score: 2, Insightful
  
  The people who gave us Java are way too fond of threads, actually. (Want non-blocking sockets? Sorry, you'll have to add one thread per socket.) Most programmers still don't understand how and why to do synchronisation because they don't understand how weak modern memory models are (and have to be if processors are to continue accelerating). So while threads should not be banned they should also not be used without careful consideration of the consequences (more complex code and possibly reduced performance due to synchronisation) and the alternatives (multiple processes, asynchronous I/O, maybe no concurrency at all).
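One of the alternatives mentioned, waiting on several sockets from a single thread, can be sketched with Python's standard `selectors` module. This is Python rather than Java, and the local socket pairs merely stand in for real connections; `read_all_ready` is a made-up helper name.

```python
import selectors
import socket


def read_all_ready(socks, timeout=1.0):
    """Wait once on all sockets; return {index: bytes} for the readable ones."""
    sel = selectors.DefaultSelector()
    for i, s in enumerate(socks):
        s.setblocking(False)
        sel.register(s, selectors.EVENT_READ, data=i)
    ready = {}
    for key, _events in sel.select(timeout):
        ready[key.data] = key.fileobj.recv(4096)
    sel.close()
    return ready


# Two local socket pairs stand in for real network connections.
a_send, a_recv = socket.socketpair()
b_send, b_recv = socket.socketpair()
a_send.sendall(b"hello")  # only connection "a" has data waiting
result = read_all_ready([a_recv, b_recv])
print(result)
for s in (a_send, a_recv, b_send, b_recv):
    s.close()
```

One thread services both connections and there is no shared mutable state to synchronise, at the price of restructuring the code around readiness events.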
4. Re:Race Conditions? by Ben+Hutchings · 2004-02-24 10:25 · Score: 1
  
  Add to those consequences additional training and debugging costs.
Re:Hardware *Debugging*? by Anonymous Coward · 2004-02-24 08:03 · Score: 0

If it's Windows, add:

4. Try rebooting.

That fixes almost half of all Windows problems for me.
Effective Technique by Rick+the+Red · 2004-02-24 08:04 · Score: 5, Funny

I find the best way to uncover bugs is to do a demo for your boss's boss.

--
If all this should have a reason, we would be the last to know.
1. Re:Effective Technique by _Sharp'r_ · 2004-02-24 10:32 · Score: 1
  
  This is usually most effective if you can include someone who has a solution that competes with yours in the demonstration with your boss's boss.
  
  --
  The party of stupid and the party of evil get together and do something both stupid and evil, then call it bipartisan.
2. Re:Effective Technique by isomeme · 2004-02-24 10:34 · Score: 5, Funny
  
  I knew I'd really become a software manager when I gained the ability to cause code to fail by standing behind the person trying to demo it.
  
  --
  When all you have is a hammer, everything looks like a skull.
3. Re:Effective Technique by MurphyZero · 2004-02-24 10:58 · Score: 2, Funny
  
  When preparing a demo, remember never to say the following: "Have a look at this. I (1) finally got all the bugs out or (2) got it to work now." Either comment is a surefire way for it to fail immediately.
  
  --
  Our founding fathers removed the guys in charge. Be American. Vote incumbents out.
I really liked the book, but I would have... by mykepredko · 2004-02-24 08:05 · Score: 4, Insightful

probably added a step stating that the problem symptoms and causes should be articulated clearly (probably between #3 and #4) before trying to fix anything. I've seen too many engineers/programmers/technicians list symptoms and attack them individually, only to discover that they were related.

On the surface, this flies in the face of "divide and conquer" - but what I'm really saying here is make sure you have the problem bounded before you attack it.

Also, with Step 9, I would have liked to see more emphasis on ensuring that nothing else is affected by the "fix". Making changes to code to fix a problem is often a one step forward and two steps backwards when you don't completely understand the function of the code that was being changed.

All in all, an excellent book in a little understood area.

myke

--
Mimetics Inc. Twitter
1. Re:I really liked the book, but I would have... by the_archivist · 2004-02-24 08:38 · Score: 0
  
  The symptoms should tell you how to divide and conquer.
  
  --
  while(karma less_than enough_karma){karma++}
2. Re:I really liked the book, but I would have... by Anonymous Coward · 2004-02-24 09:24 · Score: 0
  
  This was my nitpick with: "Quit thinking and look"
  
  The very next rule should be: "Quit looking, and think"
  
  Based on Rule #1, Understand the System, you can often figure out what is going on by thinking things out. One of the things I hate about "helping" people debug is watching them poke at things that have nothing to do with the problem, because they don't understand/haven't thought about what might be the cause.
  
  That said, sometimes something totally unrelated is actually causing the problem. It's a lower probability, though.
Anecdote by ackthpt · 2004-02-24 08:06 · Score: 2, Funny

A dishonest computer repairman dies and finds himself in Hades. The Devil smiles and says, "we've been waiting for you and have your place for eternity all ready." The repairman shudders, but follows the Devil as he is led down a tunnel. They pass several doors along the way and the repairman peers through portals to see the other condemned up to their necks in feces, languishing in pools of acid and being prodded by lesser demons with red hot pokers. The devil finally comes to a door and rubs his hands together. "Here you are, your eternal damnation." The repairman cringes as the door is flung open, but sees only a vast cavern filled with PC's, Mac's, Sun SparcStations, etc. "What? That's it?", he enquires, "I shall spend eternity fixing these then?" "Oh, yes", says the Devil. "Well that's not so bad," the repairman cracks his knuckles and strides into the cavern. "Just one thing", says the Devil as he closes the door, "they've all got intermittent problems."
"AAAHHHHHHHHHHHHH!!!!!!"

--

A feeling of having made the same mistake before: Deja Foobar
1. Re:Anecdote by Anonymous Coward · 2004-02-24 08:20 · Score: 0
  
  you suck at joke telling. Please die. kthxbye.
Missed one: explain it to someone by deanj · 2004-02-24 08:06 · Score: 5, Insightful

They missed a good one: explain the bug to someone.

If you start explaining the bug to someone, there's a good chance in mid-explanation you'll realize a solution to the problem.

Some school (can't remember which) had a Teddy Bear in their programming consulting office... There was a sign. "Explain it to the bear first, before you talk to a human". Silly as it sounds, people would do it, and a large portion of the time they'd never actually have to consult the staff... by explaining it to the bear, they solved the problem.

Weird, but true.
1. Re:Missed one: explain it to someone by MythMoth · 2004-02-24 08:22 · Score: 1
  
  Oops. I just posted almost exactly the same point in a rather more wordy way. We never had the bear, but we did joke about the "cardboard cutout" we'd use instead of an actual person. Presumably someone at the company had heard of the teddy bear in question and adapted it to our circumstances.
  
  Regardless, this is easily the single most important debugging tip and I heartily agree.
  
  Dave.
  
  --
  --- These are not words: wierd, genious, rediculous
2. Re:Missed one: explain it to someone by og_sh0x · 2004-02-24 08:33 · Score: 1
  
  Is that why jdbgmgr.exe has a teddy bear icon, by any chance? I always wondered...
3. Re:Missed one: explain it to someone by Speare · 2004-02-24 09:22 · Score: 5, Interesting
  
  No, that's a funny thing. I drew that bear icon over ten years ago when I was on the Win3.1 shell team. I didn't even know it still shipped in any MSFT product.
  The teddy bear is named Bear, and was the cuddly companion of one of the Windows 3.1 / Windows 95 shell team developers. He'd carry it *EVERYWHERE*. There are quite a few internal APIs called BunnyThis() or BearThat(), usually with generic numbers, because giving it a name would entice application writers to try to call it. (They're useless three-line internal helpers, but that didn't stop conspiratorial book-writers from trying to document them anyway.)
  Bear also appears in the Win3.1 credits, where I made portraits of spectacled Bill, bald Steve, and large-schnozzed Brad Silverberg.
  Now I don't have any Microsoft products at my house, anymore, except one outdated off-net machine which runs edutainment CD-ROMs for my daughter.
  
  --
  [ .sig file not found ]
4. Re:Missed one: explain it to someone by Anonymous Coward · 2004-02-24 09:58 · Score: 0
  
  This might be what he was talking about :-)
5. Re:Missed one: explain it to someone by JurgenThor · 2004-02-24 10:12 · Score: 0
  
  I had a 'Wilson'. He kept me sane whilst I was stranded on the island. Now I use him when I'm debugging.
  
  --
  GENERAL PUBLIC SIGNATURE (GPS) Any replies (derivatives) of this post must also use the GPS
6. Re:Missed one: explain it to someone by ChefBork · 2004-02-24 11:02 · Score: 1
  
  We called ours "Bugga Bear"...
The Three R's of Windows Debugging by iguana · 2004-02-24 08:06 · Score: 1, Funny

Retry
Reboot
Reinstall

And that's why I love having source code!
1. Re:The Three R's of Windows Debugging by Anonymous Coward · 2004-02-24 08:14 · Score: 0
  
  I always thought it was:
  Reboot (Windows)
  Reinstall (Your app)
  Reformat (Your hard drive)
2. Re:The Three R's of Windows Debugging by Anonymous Coward · 2004-02-24 08:19 · Score: 0
  
  ROR ROFL LMAO WinDOZE!!! Micro$oft!!! I'm so fucking 31337 cos I use Linux.
  
  MS VC++ has some of the best debugging tools around period. You can take your open source gdb and ddd and stick it up your Linux Zealot asshole.
  
  When tools like purify, quantify and clearcase are available for free on Linux, then we can talk. Until then, STFU you don't know what you are talking about.
3. Re:The Three R's of Windows Debugging by Petronius · 2004-02-24 08:23 · Score: 1
  
  Reformat
  rpm -U ...
  
  --
  there's no place like ~
Missing rule by timdaly · 2004-02-24 08:06 · Score: 3, Insightful

He missed a rule: Explain the bug to someone else.
The second pair of eyes often finds the problem
even if they don't have a clue what you are talking
about.
1. Re:Missing rule by boster · 2004-02-24 08:50 · Score: 1
  
  Absolutely. Let me add that often simply having to explain the problem to someone else will make you realize something you'd been overlooking (more often than not, a bogus assumption).
  
  --
  Madness takes its toll. Exact change please.
2. Re:Missing rule by lrucker · 2004-02-24 11:38 · Score: 1
  
  Be careful who you pick, though. I had a manager who thought that when I talked through a bug with her, it meant that I didn't have a clue what I was doing, and that I was asking her for help.
  Actually she was just the nearest available person when I needed to talk out the problem (Lois McMaster Bujold nailed it when she had one character say that doing that makes you slow down your thought processes til you can actually see the problem), and the last person on the team I'd go to for actual help.
3. Re:Missing rule by josephgrossberg · 2004-02-25 05:14 · Score: 1
  
  That's true.
  
  Most of the time I ask coworkers for help, they don't even have to respond.
  
  Their ears solve the problem as often as their brain.
  
  --
  
  Joe
  http://www.joegrossberg.com
Re:Hardware *Debugging*? by Anonymous Coward · 2004-02-24 08:06 · Score: 2, Interesting

Besides being highly apocryphal, that was the first use of the word bug in the context of computing. It is not the first hardware bug by a long shot. Actually, you would have known that if you had actually read the page you linked to.
A missing rule by Tired+and+Emotional · 2004-02-24 08:10 · Score: 5, Insightful

One rule he's missed is very important: Before making a measurement (like printing the value of a variable or changing something about the code) work out what answer you expect to see. Note well - do this before you look at the result. When you see something different, either it's a symptom of the bug, or a symptom of you not yet understanding the system. Resolving this will either improve your understanding or turn up the problem.

--
Squirrel!
1. Re:A missing rule by sohp · 2004-02-24 08:22 · Score: 1
  
  Yes, work out the expected answer, and then write code to test for it. Not only will it prevent you from making a dumb mistake in doing the manual comparison, you then commit the test code and you've created a regression test in case that bug ever pops up again.
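A minimal sketch of that workflow in C: write the prediction down first, let code do the comparison, and keep the checks around as a regression test. The function `clamp_percent` is a hypothetical stand-in for whatever code is under suspicion, and `check_expected` and `run_regression_tests` are made-up helper names, not anything from the book or the thread.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical code under investigation: clamps a value to the range 0..100. */
int clamp_percent(int value)
{
    if (value < 0)
        return 0;
    if (value > 100)
        return 100;
    return value;
}

/* Compares a prediction made before running the code with the observed
 * result. Returns 1 on a match and 0 on a surprise, which is then reported. */
int check_expected(const char *label, int expected, int actual)
{
    assert(label != NULL);
    if (expected != actual) {
        fprintf(stderr, "%s: expected %d, got %d\n", label, expected, actual);
        return 0;
    }
    return 1;
}

/* Each prediction lives in code, so it can be committed and re-run later.
 * Returns the number of surprises. */
int run_regression_tests(void)
{
    int failures = 0;
    failures += !check_expected("below range", 0, clamp_percent(-5));
    failures += !check_expected("in range", 42, clamp_percent(42));
    failures += !check_expected("above range", 100, clamp_percent(250));
    return failures;
}
```

A surprise here means either the code is wrong or the prediction was, and both are worth knowing before moving on.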
2. Re:A missing rule by Rupert · 2004-02-24 16:26 · Score: 1
  
  This brings up a pet peeve of mine. "Watch" windows in modern debuggers have the friendly feature of being able to display the return value of functions. The following are true stories.
  
  Colleague #1 has a problem in a for() loop. To keep track of where he is in the loop, he puts i++ in the watch window. Then he gets really confused when i goes from 0 to 2 to 4 and so on.
  
  Colleague #2 is using C# to read something from a configuration file. The call to Read() is not returning any data, but the call to Read() in the watch window is showing the correct number of bytes.
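The first story can be imitated in plain C. This is only a simulation, assuming the debugger re-evaluates the watch expression once per loop iteration (say, at a breakpoint in the loop body); `iterations_with_watch` is a made-up name, and no real debugger API is involved.

```c
#include <assert.h>

/* Counts how many loop iterations actually run when a "watch" expression
 * is evaluated once per iteration. With watch_has_side_effect set, the
 * watch is "i++", so every evaluation also advances the loop counter. */
int iterations_with_watch(int limit, int watch_has_side_effect)
{
    assert(limit >= 0);
    int iterations = 0;
    for (int i = 0; i < limit; i++) {
        int shown;
        if (watch_has_side_effect)
            shown = i++;    /* watching "i++": displays i, then changes it */
        else
            shown = i;      /* watching "i": only reads the value */
        (void)shown;        /* the value the watch window would display */
        iterations++;
    }
    return iterations;
}
```

With a limit of 10, the harmless watch sees all 10 iterations, while the `i++` watch sees 0, 2, 4, 6, 8 and the loop body runs only 5 times. The same reasoning applies to the second story: a watched Read() call consumes the data before the program gets to it.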
  
  --
  E_NOSIG
my review... by chmod_localhost · 2004-02-24 08:13 · Score: 2, Informative

Mr. Agans' book presents real-life experiences (or, as he calls them, war stories) along with humor-filled comments and anecdotes.

I found myself chuckling and giggling along while reading this book. Some of what he says brought back my own memories of debugging my own software bug(s), or other people's bug(s) that I have somehow 'inherited' because they left the company or are too busy on other projects to debug their own code. I like the metaphors that he uses to explain ideas or concepts that seem a bit too complicated to understand.

Mr. Agans makes this very clear at the beginning of his book: it is not a cover-it-all book; it is a general-concept book on how to isolate, find, and debug something that has gone wrong. The principles presented by Mr. Agans can be applied to situations in everyday life. He presents examples such as a well pump and a light bulb.

More experienced software/hardware engineers or more experienced problem solvers who read this book might find it covering bases that they already know, but the humor makes it worthwhile.
One Rule For 90% of Bugs by BinBoy · 2004-02-24 08:13 · Score: 5, Informative

4. Divide and conquer: Narrow the search with successive approximation, get the range, determine which side of the bug you're on, use easy-to-spot test patterns, start with the bad, fix the bugs you know about, and fix the noise first.

That's a very useful rule. In nearly 20 years of programming I haven't found any tool or technique that works better than printf / std::cout / MessageBox and logging.

Logging is especially important if your users aren't conveniently in the same building as you. When a customer has a problem I've never seen before, I usually tell them to run the program with the -log switch and send me the log. Nearly always this leads to the problem and I can fix the bug within minutes.

Add logging to your app and you'll increase the number of hours you can sleep.
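A rough sketch of what such a switch might look like in C. Only the `-log` flag comes from the post; the function names and the idea of appending to a caller-supplied file are assumptions made for illustration.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

static FILE *log_file = NULL;   /* NULL means logging is switched off */

/* Returns 1 if "-log" appears among the command-line arguments. */
int wants_logging(int argc, char **argv)
{
    for (int i = 1; i < argc; i++) {
        if (strcmp(argv[i], "-log") == 0)
            return 1;
    }
    return 0;
}

/* Opens the log file for appending. Returns 0 on success, -1 on failure. */
int log_open(const char *path)
{
    assert(path != NULL);
    log_file = fopen(path, "a");
    return log_file != NULL ? 0 : -1;
}

/* printf-style logging. Does nothing (and returns 0) unless log_open()
 * succeeded; otherwise returns what vfprintf returned. */
int log_printf(const char *fmt, ...)
{
    assert(fmt != NULL);
    if (log_file == NULL)
        return 0;
    va_list args;
    va_start(args, fmt);
    int written = vfprintf(log_file, fmt, args);
    va_end(args);
    fflush(log_file);   /* flush each message so the log survives a crash */
    return written;
}
```

In `main`, something like `if (wants_logging(argc, argv)) log_open("app.log");` would enable it, with `app.log` being a placeholder name. Calls to `log_printf` can then stay in the shipped program at almost no cost when the switch is absent.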
1. Re:One Rule For 90% of Bugs by Anonymous Coward · 2004-02-24 10:05 · Score: 1, Funny
  
  Damn, you're pretty lucky if you can tell your customers to run the program with the -log switch, and they have the slightest idea what you're talking about.
  
  I, on the other hand, have to tell my customers "OK, click on the start button... the button in the bottom left, that has start written on it... no, it's not missing, it's right there ('Ohhh, START!' they say). OK, now find out where the hell Microsoft decided to put the command prompt shortcut this week...." (because a helpful sysadmin disabled the run command, but every user can still run CMD...). This process can take up to an hour, and I'm still not even sure if they've typed -log or _log.
2. Re:One Rule For 90% of Bugs by JurgenThor · 2004-02-24 10:45 · Score: 0
  
  Step one: Try getting them to press and hold the Windows key (on the keyboard), then R, then type cmd (or, on older OSes, command).
  
  --
  GENERAL PUBLIC SIGNATURE (GPS) Any replies (derivatives) of this post must also use the GPS
3. Re:One Rule For 90% of Bugs by soft_guy · 2004-02-24 11:10 · Score: 1
  
  In some cases where printf isn't possible (for example, code that is very time-dependent, event-handling code, code that is triggered when something on the screen changes, etc.), I have found that using brief sounds can help me to understand what is going on. You can use different pitches to mean different things. This is obviously a case where you want to rip out the debug code when you are done, but it sure helps you in cases where printf can't.
  
  --
  Avoid Missing Ball for High Score
4. Re:One Rule For 90% of Bugs by Anonymous Coward · 2004-02-24 11:29 · Score: 0
  
  Step zero: Read the part about the disabled run command. It's called half-assed security - set a policy disallowing the "run any program" function in the start menu, but leave the command prompt shortcut right where it is.
5. Re:One Rule For 90% of Bugs by g0_p · 2004-02-24 13:24 · Score: 1
  
  While coding in Java, things have been much better since I started using log4j. I can catch a lot of the obvious bugs by reading through the log. When I am writing code, I generously put logging statements all over the place. I use the config file to switch debugging information on and off if I am being flooded with too many log messages, and all without recompiling. Sure, the performance takes a small hit, but before moving the code into production I can always run a script to comment out the logging code.
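The underlying idea (a threshold chosen at run time, so that verbose messages can be silenced without recompiling) is not specific to Java. The sketch below shows it in C to match the other code in this thread. It is not log4j's API: the level names, `parse_level`, `set_threshold` and `log_at` are all invented for illustration, and the level name is assumed to have been read from a config file already.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum log_level { LOG_DEBUG = 0, LOG_INFO, LOG_WARN, LOG_ERROR, LOG_OFF };

static enum log_level threshold = LOG_INFO;

/* Maps a level name (as it might appear in a config file) to a level.
 * Unknown or missing names fall back to LOG_INFO. */
enum log_level parse_level(const char *name)
{
    static const char *const names[] = { "DEBUG", "INFO", "WARN", "ERROR", "OFF" };
    for (int i = 0; i <= LOG_OFF; i++) {
        if (name != NULL && strcmp(name, names[i]) == 0)
            return (enum log_level)i;
    }
    return LOG_INFO;
}

/* Changes the threshold at run time; no recompilation involved. */
void set_threshold(enum log_level level)
{
    threshold = level;
}

/* Emits the message if its level is at or above the threshold.
 * Returns 1 if the message was written, 0 if it was filtered out. */
int log_at(enum log_level level, const char *message)
{
    assert(message != NULL);
    if (level < threshold || level >= LOG_OFF)
        return 0;
    fprintf(stderr, "[%d] %s\n", (int)level, message);
    return 1;
}
```

Setting the threshold to the parsed value of "WARN" drops debug chatter while errors still get through; setting it to "DEBUG" brings everything back.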
  
  The issue I have with command-line debuggers is that very often I get too distracted by using the debugging tool rather than tracking the code flow, and if you screw up, you have to start all over again. GUI debuggers are good in that sense, but they still can't be used in all situations: debugging on a remote machine, say, or debugging JNI code.
6. Re:One Rule For 90% of Bugs by David+Kennedy · 2004-02-24 13:53 · Score: 1
  
  [nitpick]
  
  Java debuggers can't be used on remote systems? News to me. I really must stop doing that every single day at work...
  
  If you do like logging, it's fun to start work in 24/7 environments like telecomms and finance; in my experience, no-one ever seems to consider just how big that log gets, or when it gets purged, or archived until it's on a client site or in final testing!
7. Re:One Rule For 90% of Bugs by g0_p · 2004-02-24 18:38 · Score: 1
  
  The last paragraph wasn't specific to Java debuggers; I was referring to command-line and GUI debuggers in general.
  
  About logging, I was referring to log4j used as a debugging tool, rather than as a logger. Refer to the grandparent post, which talks about using printfs to debug errors. I wanted to point out that log4j provides a nice and convenient way of doing it in Java. [Will save you a Google search: log4j]
  
  If you do want to use log4j as a logger in the production version, it is powerful enough that you can drastically reduce or eliminate logging by changing parameters in the config file. You can also have logger output sent to rolling log files, to a Syslog daemon or even write your own class to deal with it.
  
  In my experience, managing log files is not a big problem just so long as you account for it.
8. Re: One Rule For 90% of Bugs by gidds · 2004-02-25 16:57 · Score: 1
  
  Oh yes! I'm so glad it's not just me. People are always trying to get me to use their special-purpose methods: debuggers, and the like. But putting in display statements is by far the most universal, most flexible, most available, and most useful technique I've ever seen. It's available on everything from shell and scripting languages to assembly, from COBOL to C++ and Java, from character-based terminals to GUI workstations, on all OSs, from PDAs to mainframes, from one-file programs to multi-million-line systems.
  I find the exercise of working out exactly what displays to put in, and where, helps to narrow down the cause quicker than any debugger, and often gives you more insight into what the app is doing generally. Of course, it depends how easy it is to edit/recompile/re-run, but if that's not too slow, then it's a very productive method.
  Some of my most useful bits of general-purpose coding have been display-based debugging aids: code to display the important features of an object or network of objects tersely but exactly; code to display certain types of asynchronous event; code to highlight and display the objects involved in a GUI; code to dump state or whatever. That sort of thing comes in useful at all sorts of odd moments.
  
  --
  Ceterum censeo subscriptionem esse delendam.
Now That It's Written Down by severoon · 2004-02-24 08:13 · Score: 5, Interesting

Well, even though I think most people 'round these parts would agree with me that the book covers the fairly obvious, I will say this: it's absolutely necessary to have an "expert" write these things down because all too often, we developers try to proceed and get blocked by management. At my last job, we had a big problem with WebLogic transaction management: some bizarre confluence of events was causing a HeuristicMixedException to be thrown by the platform--by the way, WebLogic people, thanks a lot for naming this exception this way and taking the time to make sure it gets thrown in no less than six totally unrelated (as far as I can tell) circumstances. I love it when exceptions originate ambiguously, from several sources, and no one part of the platform has authority over the problem.

This was a big enough problem that we had to set up a separate, isolated environment to figure out what was going on. 4 out of the 5 architects involved on the project (no it wasn't a huge project--you can see HME wasn't the only problem here) had cemented ideas about what was going wrong...none of them agreed of course...and we had no less than 3 managers with theories based on the idea that the Earth sweeps through an aether against which all things can be measured.

The biggest issue with this testing environment was keeping everyone's mitts off of it, especially those people who didn't have to ask for permissions to the system (the architects, managers...in other words everyone). And the managers didn't agree that it was particularly important to record every step methodically, or limit the number of people making changes to the system to 1 at a time. Instead, they set up a war room and engaged in what I like to call: Fix By Chaotic Typing. (It's chaotic in the sense that, there are definitely patterns to the activity taking place, but you have to be Stephen Wolfram to find and understand them.)

Needless to say, that didn't work. If I'd had access to this book, an authority willing to put the obvious in print might have bolstered my argument that we needed to take resources OFF this issue, not add more. Alas, it was not to be. The bigwigs decided that, since the current manpower wasn't able to track down this bug, it was time to bring in the high-priced WebLogic consultants. We got some 1st generation WebLogic people, 3 of them eventually, and they came in and immediately set themselves to the task of learning our business, telecommunications. And at a mere $150/hour, why not? (Management decided the bug was non-deterministic at some point and this assembly of people was given the informal team moniker: the Heuristics team. I preferred "the Histrionics team".)

So I eventually teamed up with the lead architect on the project and we solved the problem by subterfuge. We had to intentionally set these people working in a direction--everyone, employees and WebLogic consultants alike--that was so off-the-track they actually didn't interfere with any part of the system likely containing the error. This gave us a reasonable amount of time and space to track down the bug in 3 days' time. At only the loss of 6 weeks and several thousand dollars in expenses alone for the WL consultants.

sev

--
but have you considered the following argument: shut up.
1. Re:Now That It's Written Down by Parity · 2004-02-24 08:35 · Score: 2, Insightful
  
  Actually, the 'too many cooks' problem has already been covered pretty thoroughly in The Mythical Man Month, but it does sound like this book might get a place beside MMM and be equally useful for steering managers.
  
  --Parity
  
  --
  --Parity
  'Card carrying' member of the EFF.
The first rule of debugging by XiChimos · 2004-02-24 08:14 · Score: 1, Funny

No,

The first rule of bugs is, you do not talk about bugs. The second rule of bugs is, YOU DO NOT TALK ABOUT BUGS!

-From a memo to Microsoft's new employees
printed out and stuck to my wall by tanguyr · 2004-02-24 08:15 · Score: 1

debugging is a state of mind /t

--
#!/usr/bin/english
riiiight by Anonymous Coward · 2004-02-24 08:16 · Score: 1, Funny

"9. If you didn't fix it, it ain't fixed: Check that it's really fixed, check that it's really your fix that fixed it, know that it never just goes away by itself , fix the cause, and fix the process."
Obviously, the author has never used C++Builder.
An extra rule by MythMoth · 2004-02-24 08:17 · Score: 4, Insightful

"Describe the problem to someone else."

This is so effective that it doesn't require the person to whom you're explaining it to pay attention, or even understand. A manager will do ;-) Even when the person to whom you're explaining it is smart, alert, and interested, it's almost never them that fixes the bug.

The process of describing the behaviour of the program as it ought to be versus the behaviour it is exhibiting forces you to step back and consider only the facts. This in turn is often enough to give you an insight into the disconnect between what's really happening and what you know should be happening.

If you catch yourself saying "that's impossible" when debugging some particularly freaky bit of behaviour, it's definitely time to try this.

The input of the other party is so irrelevant in this process that we used to joke about keeping a cardboard cut-out programmer to save wear and tear on the real ones...

--
--- These are not words: wierd, genious, rediculous
1. Re:An extra rule by cant_get_a_good_nick · 2004-02-24 11:48 · Score: 1
  
  The input of the other party is so irrelevant in this process that we used to joke about keeping a cardboard cut-out programmer to save wear and tear on the real ones...
  How about a cardboard cutout dog?
  
  "Debugging is always harder than coding. If you write the code as cleverly as you possibly can, then you are - by definition - not clever enough to debug it."
  --- Attr to Kernighan and Pike
Common sense strikes again by ValentineMSmith · 2004-02-24 08:20 · Score: 3, Funny
Was it just me or did anyone else get to the bottom of that bullet list and feel let down? Here I was expecting some sort of earth-shattering revelation, and all that the list shows are common sense rules. At the risk of sounding elitist, maybe this was epiphany for someone else. Lord knows it would be a revelation in our QA department, where the list consists of exactly one rule:
1. -Grab a programmer
But still... Someone made money off of that? Heck, look for my new book next week, "Walking to Peoria in 3,976,554 steps", with each step being "Place your rear foot 1.5 feet in front of your front foot."
--
Karma: Chameleon - mostly influenced by bad '80s New Wave music
1. Re:Common sense strikes again by dkirchge · 2004-02-24 12:18 · Score: 1
  
  Just remember: common sense isn't [common].
Unit Test? by MooseByte · 2004-02-24 08:23 · Score: 4, Funny

What is this "unit test" you refer to? If we consider our customer base to be a "unit", does that count?

Yours Truly (All Belongs To Me),

Bill
Fresh view: visit next lower level of abstraction by Flexagon · 2004-02-24 08:27 · Score: 4, Insightful

A good list. As part of rule 8, it's often extremely helpful to look at the problem from a different level of abstraction than one normally would (e.g., different than you coded, or that you best understand it). This often exposes false assumptions that may be blocking a proper analysis.

Successful debugging is a lot like any hard science, particularly if you are not, and cannot become, familiar with the entire system first. Your "universe" is the failing system. You develop hypotheses (failure modes and potential fixes) and run experiments (test them). You have solved the problem only if you completely close the loop (your fix worked, it worked in the way you expected, your hypothesis completely explains the circumstances, and peer review concurs).

A big part of the "art" is cultivating an attitude of how systems are stressed, and how they may fail under those stresses.
Re:Remain focused. Don't let others' WAGs get to y by RobinH · 2004-02-24 08:30 · Score: 4, Interesting

I find that when troubleshooting systems with which other people have worked longer, I have had better luck just asking them simple facts and troubleshooting myself rather than listening to their wild-ass guesses and having to shoot them down.

Yes, but within their guesses are sometimes tidbits of information. Last week we had a complaint from a user that every time they clicked this one button on a form, it set off a certain process that wasn't supposed to happen right then, but we knew that there was no connection between that click event and the process. However, I knew he wasn't imagining it.

After investigating, I found that when he opened the form that the button was on, it loaded a timer object that started ticking away, and 5 seconds later initiated the process. It just so happens that it takes about 5 seconds from opening the form to clicking the button.

Of course, if I'd written the software... well, whatever.

--
"I have never let my schooling interfere with my education." - Mark Twain
casting the runes has worked! (Jargon file) by dwheeler · 2004-02-24 08:39 · Score: 4, Interesting

Indeed, casting the runes has been a successful debugging technique before.
Here's the story from the Jargon File (under "casting the runes"): "A correspondent from England tells us that one of ICL's most talented systems designers used to be called out occasionally to service machines which the field circus had given up on. Since he knew the design inside out, he could often find faults simply by listening to a quick outline of the symptoms. He used to play on this by going to some site where the field circus had just spent the last two weeks solid trying to find a fault, and spreading a diagram of the system out on a table top. He'd then shake some chicken bones and cast them over the diagram, peer at the bones intently for a minute, and then tell them that a certain module needed replacing. The system would start working again immediately upon the replacement."

--
- David A. Wheeler (see my Secure Programming HOWTO)
I'm an expert at debugging... by DrCode · 2004-02-24 08:40 · Score: 1

...because I get so much practice.
Actually, the book has that one by dwheeler · 2004-02-24 08:43 · Score: 1

I agree, explaining it to someone else is a really good way to debug things. It's discussed in the book, it's just not identified as a "rule" per se. You're probably right, it should've been elevated to a rule or subrule. The book does have a funny story that one group used a mannequin and made everyone explain their inexplicable problems to the mannequin (same idea as the Teddy Bear). And yes, it worked for them, too!

--
- David A. Wheeler (see my Secure Programming HOWTO)
BOfH by Archangel+Michael · 2004-02-24 08:45 · Score: 1

Sounds like rules from Bastard Operator from Hell.

--
Agent K: A *person* is smart. People are dumb, stupid, panicky animals, and you know it.
That's only partly true by melted · 2004-02-24 08:49 · Score: 1

It's funny how much time a typical tester spends testing the code that's OK already (has been debugged/fixed) while not touching the bugs outside the scope of the regression test suite. I would mandate ad-hoc testing with periodic (two to three times during the product cycle, basically at the last two milestones and in the final version) regression testing. Not the "we've changed this from int to uint, execute your 10000 test cases" kind of stuff I'm seeing way too often. Testers get bored and become dumb if you make them do it two dozen times, thus they miss the higher-level bugs.

Another thing that's totally wrong with testing is breaking down the application by areas in any kind of strict fashion, especially with things like security or performance. This fucks up the most important thing in the app - namely, how well the pieces work together. Individual features may be OK, but there's a lot of unexplored garbage at the seams, and sooner or later seams start to break revealing fundamental problems and total lack of testing in feature interaction.
1. Re:That's only partly true by Ben+Hutchings · 2004-02-24 10:09 · Score: 1
  
  Testers get bored and become dumb if you make them do it two dozen times, thus they miss the higher-level bugs.
  
  Why do you use humans to do what computers should be doing? Tests should be automated as far as possible, so that mindlessly repetitive testing is cheap and accurate. Obviously humans are still useful for creative prodding of the UI, but any problems they find that way should get automated tests.
  
  Another thing that's totally wrong with testing is breaking down the application by areas in any kind of strict fashion, especially with things like security or performance. This fucks up the most important thing in the app - namely, how well the pieces work together. Individual features may be OK, but there's a lot of unexplored garbage at the seams, and sooner or later seams start to break revealing fundamental problems and total lack of testing in feature interaction.
  
  That's why you do system testing too - unless what you're saying is that system testing still tends to test one area at a time.
2. Re:That's only partly true by mark99 · 2004-02-24 10:32 · Score: 2, Interesting
  
  Lots of points there to answer...
  
  Tests of the 10000 test case sort should be automated, then boredom is not an issue.
  
  I've written and maintained quite a few largeish (>10000 line) programs in my life, and those with extensive automated test suites (all right, there were only two of those :), were much nicer.
  
  By the time a program gets large (and successful ones usually do (it is amazing how many unsuccessful ones do too)), the way the "pieces work together", and "optimal architectures" are no longer an issue. I like Fowler's statement "Architecture is about decisions that are hard to change later".
  
  Brand new shiny programs based on brand new shiny ideas are always more interesting than the old ones, just like everything else in life. But by the time they have a "complete feature set", they are old and wrinkly.
  
  Admittedly Regression Testing is often not an option. There usually seem to be tests that are hard to automate, and they just don't get automated. And bugs occur there (maybe mostly there). I haven't seen much in the way of Automated Regression Testing, except in the NUnit project.
Re:Hardware *Debugging*? by Anonymous Coward · 2004-02-24 08:55 · Score: 0

I daresay this is the first "-1, Informative" post I've ever seen :-)
Re:Windows *Debugging* by marklark · 2004-02-24 08:57 · Score: 3, Funny

Windows Debugging Steps:

1) Re-boot.
2) Re-install.
3) Re-format, Re-boot, Re-install.
4) Re-peat. ;^/
"Thats a feature" by peter303 · 2004-02-24 09:08 · Score: 2, Insightful

I am not surprised at the number of so-called bugs that turn out to be holes in the specifications or tests. Then I tell the complainant "that's the design specification". Then they say "no, that's not" and give me the updated specification.

In fact, popular bug-tracking databases like Scopus usually merge bugs and enhancement requests together, due to this ambiguity.
See "Diagnosing Java Code" by Anonymous Coward · 2004-02-24 09:16 · Score: 0

See Eric Allen's "Diagnosing Java Code" at IBM's developerWorks for lots of debugging info (& not just for Java).
favorite quote: discovery of debugging by Anonymous Coward · 2004-02-24 09:21 · Score: 5, Informative

My favorite quote on the subject of debugging:

As soon as we started programming, we found to our surprise that it wasn't as easy to get programs right as we had thought. Debugging had to be discovered. I can remember the exact instant when I realized that a large part of my life from then on was going to be spent in finding mistakes in my own programs.

-- Maurice Wilkes, 1949

55 years later, programmers are still spending a large part of their lives finding bugs and fixing them...
1. Re:favorite quote: discovery of debugging by JPZ · 2004-02-24 11:42 · Score: 4, Funny
  
  As soon as we started programming, we found to our surprise that it wasn't as easy to get programs right as we had thought. Debugging had to be discovered. I can remember the exact instant when I realized that a large part of my life from then on was going to be spent in finding mistakes in my own programs.
  
  -- Maurice Wilkes, 1949
  
  55 years later, programmers are still spending a large part of their lives finding bugs and fixing them...
  Even worse, they're probably still spending time finding and fixing bugs this Wilkes guy introduced...
2. Re:favorite quote: discovery of debugging by Dumbush · 2004-02-24 17:05 · Score: 1
  
  Wait a minute, "debugging" existed in 1949?
  
  Wasn't the word debugging invented back around the 50s or 60s, when "technicians" were required to clean bugs out of vacuum tubes?
Best way to debug - *ever* by Anonymous Coward · 2004-02-24 09:47 · Score: 2, Informative

Use Purify (or an equivalent) before you deliver your code.
At about $1200 a seat for WinXX, that's about a single day's worth of productivity from a coder. (I'm counting all overhead here).
How many times have you spent days or even weeks looking for that one elusive memory overwrite causing your Heisenbug? Memory-checking tools like Purify find those bugs before they cause failures!!!
1. Re:Best way to debug - *ever* by Anonymous Coward · 2004-02-24 09:49 · Score: 0
  
  And no, I don't work for IBM.
2. Re:Best way to debug - *ever* by soft_guy · 2004-02-24 11:13 · Score: 1
  
  I'd have to agree. Purify and Pure Coverage are both really great tools. I highly recommend them.
  
  --
  Avoid Missing Ball for High Score
3. Re:Best way to debug - *ever* by Anonymous Coward · 2004-02-24 11:43 · Score: 0
  
  I'd love to. I really needed to run them today, but our QA department, which is supposed to run the tests weekly, had trouble with the Purify builds failing and never told us. And of course the guy who set up the builds originally has been laid off for a couple of months now.
Phase of the Moon by cpeterso · 2004-02-24 09:51 · Score: 3, Interesting

There really was a bug based on the phase of the moon. See the Jargon Dictionary for more info: phase of the moon:

phase of the moon n. Used humorously as a random parameter on which something is said to depend. Sometimes implies unreliability of whatever is dependent, or that reliability seems to be dependent on conditions nobody has been able to determine. "This feature depends on having the channel open in mumble mode, having the foo switch set, and on the phase of the moon." See also heisenbug.

True story: Once upon a time there was a program bug that really did depend on the phase of the moon. There was a little subroutine that had traditionally been used in various programs at MIT to calculate an approximation to the moon's true phase. GLS incorporated this routine into a LISP program that, when it wrote out a file, would print a timestamp line almost 80 characters long. Very occasionally the first line of the message would be too long and would overflow onto the next line, and when the file was later read back in the program would barf. The length of the first line depended on both the precise date and time and the length of the phase specification when the timestamp was printed, and so the bug literally depended on the phase of the moon! The first paper edition of the Jargon File (Steele-1983) included an example of one of the timestamp lines that exhibited this bug, but the typesetter `corrected' it. This has since been described as the phase-of-the-moon-bug bug.

However, beware of assumptions. A few years ago, engineers of CERN (European Center for Nuclear Research) were baffled by some errors in experiments conducted with the LEP particle accelerator. As the formidable amount of data generated by such devices is heavily processed by computers before being seen by humans, many people suggested the software was somehow sensitive to the phase of the moon. A few desperate engineers discovered the truth; the error turned out to be the result of a tiny change in the geometry of the 27km circumference ring, physically caused by the deformation of the Earth by the passage of the Moon! This story has entered physics folklore as a Newtonian vengeance on particle physics and as an example of the relevance of the simplest and oldest physical laws to the most modern science.

--
cpeterso
1. Re:Phase of the Moon by QuestionsNotAnswers · 2004-02-24 22:51 · Score: 1
  
  A phone line in a small coastal town went dead every month. The engineer finally clicked that it was based on the phase of the moon: the tide is slightly higher for the full moon, and seawater was seeping into a junction box with a faulty seal on the cable connection. I would have loved to see the aha moment on his face!
  
  --
  Happy moony
Can't be any good... by Anonymous Coward · 2004-02-24 09:54 · Score: 0

it's only 192 pages and cost < $20USD :)
Re:Heisenbugs...and the implications of malloc() by Anonymous Coward · 2004-02-24 09:55 · Score: 0

Always remember - malloc() (and all its kin) "returns a pointer to a block of at least size bytes suitably aligned for any use"
That "suitably aligned for any use" is critically important - it means that malloc has to return memory in blocks of 8 bytes (on most systems). That means that when you ask for 1 byte, you really get 8 and have 7 bytes of slop to overwrite before you really have a problem.
But if you ask for 8 bytes you have no room for any error...
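The arithmetic behind that can be sketched as follows, assuming (as the comment does) an allocator that hands out memory in 8-byte granules. The function names are made up for illustration. Real allocators add bookkeeping and often use larger granules, and writing past the size you asked for is undefined behaviour in C even when slop happens to be there.

```c
#include <assert.h>
#include <stddef.h>

/* Rounds a request up to the next multiple of the allocator's granule.
 * This models the allocator; it does not query a real one. */
size_t rounded_request(size_t size, size_t granule)
{
    assert(granule > 0);
    return (size + granule - 1) / granule * granule;
}

/* Slop: bytes beyond the requested size that still fall inside the
 * granule-rounded block, where a small overrun may go unnoticed. */
size_t slop_bytes(size_t size, size_t granule)
{
    return rounded_request(size, granule) - size;
}
```

Under that assumption `slop_bytes(1, 8)` is 7 and `slop_bytes(8, 8)` is 0, which is why the same off-by-one overrun can be silent for one request size and fatal for another, and why such bugs seem to come and go as buffer sizes change.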
Quick Correction by composer777 · 2004-02-24 10:10 · Score: 1

For whatever reason, Slashdot's code eliminated the '&' that I put in front of x and y in the 3rd and 4th cases. With the '&' restored in front of y where I assign c = &y, and in front of x where I assign d = &x, the example reads:
int a = 0;
int *b = NULL;
int **c = NULL;
int *d = NULL;
{
int x = 0;
int *y;
y = malloc(sizeof(double));
a = x; //this is fine, we're just passing data
b = y; /*this is ok too, y's block of memory is
allocated out of the heap*/
c = &y;/*WRONG!! y is allocated
locally; remember, we're talking
about the address of y, not the
address y is pointing to. In this
case, the address of y, like the address
of x, is referencing locally allocated
data*/
d = &x;/* this is also wrong, since the address
of x points to memory that is pushed
on the function stack*/
}
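
For contrast, here is a minimal sketch of two ways to hand an int out of a scope without keeping the address of a local. The function names `make_heap_int` and `fill_int` are hypothetical, chosen for this example only.

```c
#include <assert.h>
#include <stdlib.h>

/* Safe: the int lives on the heap, so the pointer outlives this call.
   The caller owns the block and must free() it. Returns NULL on failure. */
static int *make_heap_int(int value)
{
    int *p = malloc(sizeof *p);
    if (p != NULL)
        *p = value;
    return p;
}

/* Safe: the caller supplies the storage, so no local address escapes. */
static void fill_int(int *out, int value)
{
    assert(out != NULL);
    *out = value;
}
```

Either way, the lifetime of the storage is decided by the caller rather than by a block that is about to end.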
1. Re:Quick Correction by Anonymous Coward · 2004-02-26 12:05 · Score: 0
  
  If you posted in HTML mode, you should have used "&amp;" (no quotes, obviously).
Automated testing is overrated by melted · 2004-02-24 10:15 · Score: 0

If it takes more time to write automated test than to execute the test case 5 or 6 times - fuck the automation. Again the main problem with it is that it will test the code that's already "good" without going to unexplored code paths. This gives a tester (and a developer) a false sense of security. Automation suite passes, so there are no more bugs, right? WRONG!

I'm not saying automation is useless. There are certain tests that are nearly impossible to do without automation, I'm just saying that most of the time cost outweighs the benefits.
1. Re:Automated testing is overrated by Ben+Hutchings · 2004-02-24 13:59 · Score: 1
  
  If it takes more time to write automated test than to execute the test case 5 or 6 times - fuck the automation.
  
  Maybe we're talking at cross-purposes here. I don't see how it can take that long to write a test case, but then my main experience is with unit tests rather than systems tests. If you're talking about testing systems with complicated UIs here then I see that that may be difficult to automate. Nevertheless there are tools that can help with this.
  
  Again the main problem with it is that it will test the code that's already "good" without going to unexplored code paths.
  
  I don't think that's a problem; it frees up the human beings to work on better coverage.
  
  Automation suite passes, so there are no more bugs, right? WRONG!
  
  No amount of testing can prove the absence of bugs. So this attitude is always wrong.
  
  I'm not saying automation is useless. There are certain tests that are nearly impossible to do without automation, I'm just saying that most of the time cost outweighs the benefits.
  
  To me the cost seems pretty minor. They let you catch bugs earlier and make it easier to identify the source.
2. Re:Automated testing is overrated by melted · 2004-02-24 14:22 · Score: 1
  
  >> No amount of testing can prove the absence of bugs.
  >> So this attitude is always wrong.
  
  I wish I could tell this to our PHBs. Basically what they do when it's time to ship the product is they throw the remaining bugs over the fence and say that the product has "zero bugs". How's that?
10. Compare with one that works. by Anonymous Coward · 2004-02-24 10:27 · Score: 0

I can't believe he didn't add that one.

I cannot tell you how many times I've found problems simply by comparing with something similar which works, and changing the thing that works to be more like the thing that doesn't work one step at a time until it quits working. E.g., back up to a previous working version in CVS, and start applying changes until the problem reappears. Sometimes this is easier than actually debugging the problem, especially if lots of people are working on something and the code is stuff you aren't that familiar with. You can often narrow the problem down to a small diff, and the problem will just be plainly visible, or if it isn't plainly visible, at least you have a good clue that might suggest other areas to look at.

Also works for debugging network hardware problems. Or finicky jet ski engines.

It's related to "change one thing at a time" I suppose, but not quite the same.
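
The post describes stepping forward one change at a time. The sketch below swaps in a binary search over the same history, which needs far fewer test runs when the list of changes is long. It assumes the revisions are numbered, that `good` works and `bad` fails, and that every revision after the first bad one also fails. The `is_bad_fn` hook and the stand-in predicate `broke_at_42` are hypothetical; in practice the hook would check out the revision, build it, and run the failing case.

```c
#include <assert.h>

/* Hypothetical test hook: returns nonzero if revision `rev` shows the bug. */
typedef int (*is_bad_fn)(int rev);

/* Stand-in predicate for illustration: pretend the bug arrived in rev 42. */
static int broke_at_42(int rev)
{
    return rev >= 42;
}

/* Find the first bad revision in (good, bad], assuming `good` works,
   `bad` fails, and every revision after the first bad one also fails. */
static int first_bad_revision(int good, int bad, is_bad_fn is_bad)
{
    assert(good < bad);
    while (bad - good > 1) {
        int mid = good + (bad - good) / 2;
        if (is_bad(mid))
            bad = mid;   /* bug already present: look earlier */
        else
            good = mid;  /* still works: look later */
    }
    return bad;
}
```

Calling `first_bad_revision(1, 100, broke_at_42)` narrows 99 candidate changes down to one in about seven test runs; the diff between that revision and the one before it is the "small diff" the post talks about.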
oops.. by bbowers · 2004-02-24 10:37 · Score: 1

"63,000 bugs in the code, 63,000 bugs,
Ya get 1 whacked with a service pack,
Now there's 63,005 bugs in the code!"
-- from a Slashdot post

hehe...how does one solve this problem?

--
Even a stopped clock gives the right time twice a day.
Buffer overflows, bad pointers, stack problems... by jtheory · 2004-02-24 10:49 · Score: 1

Wow, I'm glad I'm coding in Java.

Even when garbage collection and VM-managed pointers meant significantly slower performance (not anymore in most situations), it was worth it.

And I sat down years ago and learned Java threading inside and out, and I always manage thread communication very carefully (which is easy - define your thread boundaries well, and there's not much code at risk)... and voila, no Heisenbugs.

I actually like debugging nowadays. It's like being a storybook detective -- you always get your man.

--
There are only 10 types of people: those who understand decimal, those who don't, and, uh, 8 other types I forget.
Wouldn't you take the car to a mechanic? by khasim · 2004-02-24 10:58 · Score: 1

You: (insert long story about ice cream and driving)

Mechanic: (translation mode ON) So, you drive to the store and when you get back in your car, sometimes it won't start?

You: Yes (insert long story about ice cream)

Mechanic: (translation mode ON) Sounds like vapor lock. Let me take a look at the carb.

Moral of this story: Good techs can extract the relevant facts from the fluff and quickly diagnose the real problem.
1. Re:Wouldn't you take the car to a mechanic? by kzinti · 2004-02-24 11:17 · Score: 1
  
  Too bad you omitted the story - it's a good one. This is a Car Talk story, isn't it? The car would start when the guy bought chocolate ice cream, but it wouldn't start if he bought vanilla. Turns out the two are in different parts of the store, one much further away. If he had to walk further for the vanilla, the car wouldn't start because it had sat long enough to develop vapor lock.
2. Re:Wouldn't you take the car to a mechanic? by Big+Smirk · 2004-02-24 15:07 · Score: 1
  
  Unfortunately the average mechanic isn't so bright.
  
  A car needs a few things to run: good mechanical condition (compression), fuel, air, and spark. If the car won't run (assuming the battery is good and the motor turns on the starter), one is guaranteed to be missing (or way out of spec). Unless the car magically fixes itself, you can pretty much rule out mechanical or air. That leaves spark or fuel. Spark can be measured pretty easily, and on old cars a quick kick of the throttle would squirt gas from the accelerator pump (for those who remember a carburetor).
  
  Any tech/mechanic who assumes vapor lock is an idiot and will end up costing you extra money while they replace parts in vain trying to make it work.
  
  --
  TODO: create/find/steal funny sig.
Speaking of Debugging... by jason777 · 2004-02-24 11:00 · Score: 0

Slashdot needs to do some.
If I click the Read More link, no forum items appear after the text. If I change my threshold, I still see no items. I have to go back to the Slashdot front page, and then click the link beside the Read More link in order to see the responses.
Two of my favourite rules... by rumblin'rabbit · 2004-02-24 11:01 · Score: 3, Insightful

Find the simplest possible run that replicates the error. My favourite strategy. It's really worth while doing this. Related to rule 4, perhaps, but not the same thing.
Examine the input data. Often it isn't a bug. Often the program is doing an entirely reasonable thing given the input data. Or perhaps the program mishandled bad input data (in which case it is a bug, but now you know what to look for).
how about this bug in the msvc debug libraries by ZINGYWINGY · 2004-02-24 11:10 · Score: 0

I don't remember the details exactly but we were testing a server for several weeks that was built on some version of the msvc++ debug mfc libraries. They put in some kind of code to track memory allocations numerically. So, let's say you wanted to tell the debugger to stop on the 3238th "new" -- somehow you would set this up using some precompiler defines and it would count up all the memory allocations until #3238 and it would break into the debugger right then. Well, folks -- guess what happened when that counter overflowed (after a few weeks of my program running)? You guessed it:

This program has performed an illegal operation and will be shut down.

With me saying "wtf" and having no idea what's going on except that my program just wants to die after it's been running for too long.

since /. is the place to do it, and I'm anon, I'll cast aspersions -- what do y'all think, was their convenient forgetting to check for overflow on this integer a deliberate mistake, since you're not supposed to (because of license) ship a product (especially a server) linked to their debug libraries? Kinda gets in the way of testing though...
1. Re:how about this bug in the msvc debug libraries by Tony-A · 2004-02-24 13:02 · Score: 1
  
  guess what happened when that counter overflowed (after a few weeks of my program running)?
  This program has performed an illegal operation and will be shut down. ... was their convenient forgetting to check for overflow on this integer a deliberate mistake,
  
  OK, I'll take a few stabs at it.
  It's another version of the Y2K phenomenon. The basic answer to Y2K is 99+1=00 with no excitement.
  Assuming the obvious x86 binary integers, incrementing the counter is a non-problem. The problem comes when something decides it needs to get excited because adding one to a positive number created a negative number (or adding one to a positive number made a zero). What makes it hilarious is that it is the checking for overflow that causes it to bomb.
It's not that hard... by Gorimek · 2004-02-24 12:04 · Score: 1

Here's what I've always done.

1. Figure out what the code should do.
2. Figure out what it does.
3. Figure out where 1 and 2 diverge.

If you have a reproducible error, you really don't need any great skills or brilliant thinking to find and fix a bug this way.
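
Step 3 can be made mechanical when both the expected and the observed behavior can be written down as a sequence of values, for example intermediate results logged at each stage. A minimal sketch, assuming both traces are plain int arrays of the same length (the function name `first_divergence` is invented for this example):

```c
#include <assert.h>
#include <stddef.h>

/* Return the index of the first step where the expected and observed
   traces differ, or n if all n steps agree. */
static size_t first_divergence(const int *expected, const int *actual, size_t n)
{
    assert(expected != NULL && actual != NULL);
    for (size_t i = 0; i < n; i++) {
        if (expected[i] != actual[i])
            return i;
    }
    return n;
}
```

The index it returns only says where to start reading code; working out why the values differ there is still up to the person debugging.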
1. Re:It's not that hard... by BigBadBri · 2004-02-24 12:23 · Score: 1
  
  But here, as in many areas, it is the approach that is the skill - finding the right way to look at a problem, so that you can see where the problem comes from, is often the key to fixing anything, be it software, hardware or even a car engine.
  You're right - there's no voodoo involved, but I bet that if you analysed your approach more deeply, you'd see that you use some of the methods outlined, because they are all commonsense.
  Given the number of impractical idiots out there, a book that teaches commonsense methods is worthwhile, even if you or I won't need it.
  
  --
  oh brave new world, that has such people in it!
Why is this moderated FUNNY? by NotQuiteReal · 2004-02-24 12:10 · Score: 1

Those exact rules are in my employee handbook!
This is Informative!

--
This issue is a bit more complicated than you think.
Missing Step 10 by 680x0 · 2004-02-24 12:29 · Score: 1

10. Make sure no-one else has unplugged the piece of equipment you'd been using earlier.
I usually test new code from my desk, but the equipment is in the lab where it's (usually) connected via the LAN.
GAAAAAAHHHHH!!!!
Sequential errors by CactusCritter · 2004-02-24 12:59 · Score: 1

One of the most confusing bugs I ever encountered was when I determined the obvious fix and then got the same conditions as before.

The fix looked good and showed as expected in a dump. I finally determined, by careful code checking, that there was another error downstream which produced exactly the same results. The second fix resolved the matter.
1. Re:Sequential errors by statusbar · 2004-02-24 15:59 · Score: 1
  
  Or even better - The sequential errors that cancel each other out.
  
  Browsing through code you see an incorrect assumption. You fix the problem. New bugs appear because other code was also making the same incorrect assumption.
  
  --jeff++
  
  --
  ipv6 is my vpn
A computer wipe out by CactusCritter · 2004-02-24 13:11 · Score: 2, Interesting

This is a very ancient experience which occurred on an IBM 704, noticeably before there was any such thing as an operating system.

I had written an application which, when run, wiped 32K of RAM words clean with the same image in every word. I had to get after-hours computer time and proceeded to insert a print statement followed by an abort in the source code (Fortran) in order to track the problem down.

Turned out to be an input error. The input utility I was employing used the character in column one of the data cards to indicate the type of data.

I finally determined that my assistant, who had prepared the input data for my test case had put a 3 instead of a 4 in the first column of a data card resulting in what would have been a quite reasonable integer value becoming a floating point number in the machine.

That resulted in a very long-running loop which stored the same number in every word of the machine's RAM.

Thereafter, every application that I wrote had an input data check for every integer value to ensure that its value did not exceed the source code dimension of the array into which data were to be written.

This was Fortran around 1960. The Ur days!
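
The same input check, sketched in C rather than the original Fortran. `TABLE_SIZE` and `store_checked` are hypothetical names for this example; the point is only that an index read from input is compared against the declared dimension before anything is written.

```c
#include <assert.h>
#include <stddef.h>

#define TABLE_SIZE 100  /* hypothetical declared dimension of the array */

/* Store `value` at `index` only if the index read from input fits the
   array. Returns 1 on success, 0 if the input would write out of bounds. */
static int store_checked(double table[TABLE_SIZE], long index, double value)
{
    assert(table != NULL);
    if (index < 0 || index >= TABLE_SIZE)
        return 0;
    table[index] = value;
    return 1;
}
```

A caller that gets 0 back can report the offending card (or line) instead of letting a bad value drive a loop across memory.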
1. Re:A computer wipe out by meburke · 2004-02-24 16:21 · Score: 0
  
  Ahh, those old systems.
  
  I used to do cryptology on an IBM 1401 with 8k of memory. I signed up on a list of techs available for help on IBM systems. I got a call one night from a local business service bureau, saying that their 1440 was simply shutting down. When I arrived, they showed me how they would run the bootstrap and load decks, then the patches and then data decks, but after running for a couple of minutes the system would simply stop working and all the console lights would go off even though the system still had power.
  
  I found the problem in the first step, where I tried to understand what was going on. Autocoder (the IBM assembly language) was supposed to clear a value to a "word mark" located at a specific place in memory, but the data clerk had mistyped the card in a patch and the word mark wasn't in the correct place, so it kept clearing memory until it went full circle and killed the system. I found it by accident, not logic, but I still got paid almost 5 times my monthly salary as a PFC for about 45 minutes' worth of work.
  
  Looking at the steps, I see how the rules would have made it easier for me to hunt down the problem, but it probably would have taken longer.
  
  I'm a firm believer in a troubleshooting/debugging system. I learned my best troubleshooting techniques from a programmed instruction course by Phillips Electronics in Viet Nam in 1967. I have loads of troubleshooting and problem-solving books on my shelf, including some great books on debugging. I have a great book by James Martin on "provably correct" code, but I've never gotten the hang of it. (I need a book on provably correct typing) These days, I'm usually troubleshooting bigger processes, and I've never found a better troubleshooting system than the Kepner-Tregoe process ("The Rational Manager"), although the Goldratt Institute ("It's Not Luck", by Eli Goldratt) has a good one. Since most people don't want to study the process of problem-solving to any great depth, I recommend "The Complete Problem Solver" by Hayes. (There is a book with a similar title by Arnold, but they are on slightly different levels.)
  
  Mike
  
  --
  "The mind works quicker than you think!"
Well, there's a scientific explanation for that... by DrMorpheus · 2004-02-24 13:20 · Score: 1

It's because you, and I, radiate Murphons. Or to take the classical physical approach, we radiate a Murphonic field.
Murphons are failure particles that radiate from certain individuals. They were named after Captain Murphy of "Murphy's Law" fame.

--
Debunking the "59 Deceits"
Give me a break by rumblin'rabbit · 2004-02-24 13:27 · Score: 2, Interesting

Yeah, right. Perhaps if you are debugging a small application you wrote yourself then it may be that easy. Debugging a major application written by someone else, who used perhaps a less than optimum style and methodology? Forget it. The above won't work because you won't be able to figure out #1 and you won't be able to figure out #2 in less time than it takes to rewrite the application from scratch.
Don't know what planet you're programming on.
Rule 9 by VirtuaKnight · 2004-02-24 13:46 · Score: 1

Sometimes things can manage to "fix themselves" simply because of problems that started outside of your scope of control. Not too long ago, I was writing some code for my own spin on phpBB, called OmniBB. One night, I was getting all sorts of error pages from the server (Apache ran on Windows). I finally gave up and went to sleep, with visions of computer bugs dancing in my head. The next morning, I tried it again, and lo and behold, it worked! The problem might not be with your code, but a tool that your code relies on. In cases like this, it wouldn't help me to fix the tool (Apache) or get a different one, since it wouldn't be very useful if it didn't work on Apache servers, would it?
Because debugging should be avoided. by Chemisor · 2004-02-24 14:20 · Score: 1

> beginning programmers spend more time debugging code than they do writing code,
> so why isn't that activity stressed?

Because you should know what your code is doing. If you can not figure out which piece of code is failing you either have a bad design, where the division of labour is inadequate, or an inadequate error reporting mechanism. When I have a bug and I find the cause, I always try to figure out how the error reporting could be improved. Throw those exceptions, put in those asserts, and print clarification messages for errors whenever you can (probably with an #ifndef NDEBUG around the last). The goal is to be able to diagnose any bug simply by looking at the error message, which I can already do in most cases thanks to the aforementioned practices.
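
A minimal sketch of that combination in C: an error return for bad input, an assert for an internal invariant, and a clarification message wrapped in #ifndef NDEBUG. The `DEBUG_NOTE` macro and the `parse_digit` function are invented for this example and are not taken from the post.

```c
#include <assert.h>
#include <stdio.h>

/* Debug-only clarification message, compiled out when NDEBUG is defined. */
#ifndef NDEBUG
#define DEBUG_NOTE(msg) fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__, (msg))
#else
#define DEBUG_NOTE(msg) ((void)0)
#endif

/* Hypothetical example: parse one decimal digit, reporting why it failed.
   Returns the digit value 0..9, or -1 for anything else. */
static int parse_digit(char c)
{
    if (c < '0' || c > '9') {
        DEBUG_NOTE("parse_digit: input is not a decimal digit");
        return -1;
    }
    int value = c - '0';
    assert(value >= 0 && value <= 9);  /* internal invariant */
    return value;
}
```

In a debug build the message names the file and line where the bad input was rejected, which is usually enough to start from without a debugger.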
But restart is the fix-all by girgit · 2004-02-24 14:30 · Score: 1

Especially those Windows admins/techs who think 'restart' is the ultimate fix-all.

Didn't you know? A restart cleanses the soul of the computer and therefore the problems go away.
Same topic, slightly different rules by nlper · 2004-02-24 14:35 · Score: 1

Several posters have pointed out useful rules that didn't make the book author's list, such as comparison with a known good example or explaining the problem to someone else.

It's also worth noting that some of this terrain has also been codified into the "Universal Troubleshooting Process" here.

Tyler
1. Re:Same topic, slightly different rules by raygunz · 2004-02-25 05:18 · Score: 1
  
  Both of the above useful rules are in the book, just organized under other rules (Change One Thing at a Time and Get a Fresh View)
  
  --
  "Debugging" by Dave Agans - the perfect gift for your favorite imperfect engineer.
AIX crash/hang by Anonymous Coward · 2004-02-24 14:36 · Score: 0

Create semaphore lock/unlock routines that ALSO create/destroy the semaphores, write a for() that loops a million times on these locks/unlocks, and then run 3 of these processes.

No one likes the idea of having to hit the big red switch to bring their IBM AIX system back. I should seriously send this code to IBM, but do you really think they care very much about their SCO-licensed AIX OS right now? :)
Take advice from Feynman by girgit · 2004-02-24 14:39 · Score: 1
Try The Feynman Problem Solving Algorithm
1. Write down the problem.
2. Think very hard.
3. Write down the solution.
It has worked for me many many times
Re:Buffer overflows, bad pointers, stack problems. by abdulla · 2004-02-24 14:39 · Score: 1

You can always use a garbage collector in C/C++, or use smart pointers in C++. I guess that still doesn't solve misuse of allocated data. I don't know Java, but I'm guessing it's possible to overrun an array in it as well; correct me if I'm wrong.
Squish! by Anonymous Coward · 2004-02-24 14:45 · Score: 0

There's a bug!

55 years later, programmers are still spending a large part of their lives finding bugs and fixing them...
Try "65 years later..."

the off-by-10 bug strikes again!
1. Re:Squish! by Anonymous Coward · 2004-02-24 19:02 · Score: 0
  
  2004 - 1949 = 55
Re:Hardware *Debugging*? by Bob9113 · 2004-02-24 14:46 · Score: 1

You don't "troubleshoot" a circuit design. You debug it.

Actually, I just started designing analog circuits about a month ago. I can tell you with total confidence that my technique is still distinctly troubleshooting: hmm... I wonder if changing these two capacitors from tantalum to polyester film will eliminate the buzz... hack hack hack.

--
Stop-Prism.org: Opt Out of Surveillance
Heh by jtheory · 2004-02-24 14:49 · Score: 1

You seem to have screwed up your own argument. Here, let me help out.

You might have discussed "fools who thank their technology for limiting their control over the computer" or something along those lines. Then you might have had a leg to stand on, because of course there are trade-offs when you give up explicit control over your pointers, for example (a point that I touched on).

But I'll eat my keyboard if you can find me one programmer who scorns a technology purely because it limits their errors.

--
There are only 10 types of people: those who understand decimal, those who don't, and, uh, 8 other types I forget.
you you you! by Anonymous Coward · 2004-02-24 14:58 · Score: 0

YFI!
Missed one by dmiller · 2004-02-24 15:13 · Score: 2, Insightful

There is one that appears to be left out (from the summary, perhaps not from the book - I haven't read it): fix it everywhere.

Once you have found a bug, search the rest of your tree for similar bugs. Chances are that you will find and fix several. This is especially true of bugs caused by bad assumptions.

FYI: This is one of the central audit methodologies of the OpenBSD project. It works much better for the BSDs as they keep the entire system in one CVS tree, rather than scattering it around FTP servers in the forms of tarballs. The whole system is readily available to search for entire classes of bugs.
Re:oops.. variant by Anonymous Coward · 2004-02-24 15:21 · Score: 0

"65,535 bugs in the code, 65,535 bugs,
Ya get 1 whacked with a service pack,
Now there's 2 bugs in the code!"
Awesome Review by rossy · 2004-02-24 15:35 · Score: 1

Thanks... I am in the middle of debugging a problem with some hardware, trying to track down a lost 400ps of time. Been working on this for the last 4 days... The nine rules apply to software & hardware systems IMHO. Was pondering theories earlier yesterday... should have been reporting symptoms! I'd add one personal favorite. Usually, if you blame the system or the compiler, or some other "monolithic" entity, it's because it's your fault; it will just take you longer to find it, after you dig out from under your embarrassment for blaming the "system". So... I always assume I personally hosed up until I can prove otherwise. That way I'm pleasantly surprised if it's someone else's bug.

--
Ross Youngblood
NUMBER 5 IS *ALIVE*!!! by Anonymous Coward · 2004-02-24 17:07 · Score: 0

and objects to being debugged.
No break for you! by Gorimek · 2004-02-24 17:10 · Score: 1

I find your statement a bit baffling.

To even know that there is a bug, you need to know what the code is supposed to do. How else are you going to know it doesn't do it right? And you also need to know what it actually does. How else are you going to know that what it does isn't right?

How you would attempt to rewrite an application from scratch that you don't know what it's supposed to do is also something I find hard to fully buy in to.

Not that I've invented anything special or anything. It's just common good problem solving practice. In terms of the book rules, I focus on 3, 4, 5, 9, and occasionally 7.

Sure, it can take a lot of time and work sometimes. Days or even weeks. Some things are complicated and take time to do. But I fix my bugs for real. And that, sadly, makes me a fairly small minority among today's programmers.

I've been programming professionally on planet earth for 20 years on both big and small applications, good, bad and awful designwise.
Re:Heisenbugs...Hardware hell. by Anonymous Coward · 2004-02-24 17:59 · Score: 0

"And don't forget the phase of the moon, or for the truly unlucky, intermittently glitchy hardware."

Tell me about it. I purchased a high-end SVHS VCR about 13 years ago. When it worked it was a wonder to behold, "when" being the operative word. When I had it at home it would intermittently spin its motors, click a few times, then raise the loading table and buzz loudly. This was with no tape in the machine. It also would occasionally damage tapes. Repeated trips to the shop, and even to the manufacturer, didn't resolve the issue. I last took it to a shop that "claimed" that the heads were shot, and they wanted a lot of money to fix it. I had had it at this point, so I put it in my attic and bought a new one. My current one is about seven years old and, beyond the usual maintenance, has worked well. I think I'll get one of those DVD recorders next.
Forgot Rule #0! by maysonl · 2004-02-24 19:24 · Score: 1

Don't insert the bugs in the first place; you won't then be forced to spend vast amounts of time taking them out.
Corollary 1: Use a garbage-collected, type-safe language (there go a huge mass of bugs), whenever possible.
Corollary 2: Code defensively - if your routine complains the first time it's called with garbage input, you won't have to look at the output of 20 or 30 runs before somebody notices that something's fishy.
Corollary 3: Write modular code, with clean and minimal interfaces, i.e. K.I.S.S.
The most important thing to prevent bugs by igomaniac · 2004-02-24 21:03 · Score: 2, Insightful

I have a lot of experience in finding and fixing difficult bugs. In my experience, the most important thing you can do is when you find a bug, stop and think how you could have caught this bug automatically. If you practice this policy, you end up with very solid code. Basically, in the debug build, no function should ever crash the program no matter what garbage you put in the parameters - it should report an error and stop.

I think writing solid code is all in the attitude of the programmers - I had one guy who had a memory overwrite bug that was corrupting some characters in his string table when he called a certain function. Do you know how he fixed it? He wrote some code that put the right characters back over the corrupted ones after the call to this function!!! If you have that attitude, things WILL blow up in your face...
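
One way to catch that kind of overwrite automatically, rather than patching the characters back, is to checksum the table before the suspect call and verify it afterwards. This is a sketch of the general idea only; the poster does not say what check was eventually used, and `checksum` and `table_intact` are names invented here.

```c
#include <assert.h>
#include <stddef.h>

/* Simple multiply-and-add checksum over a buffer; enough to notice stray
   writes, not a cryptographic hash. */
static unsigned long checksum(const char *buf, size_t len)
{
    assert(buf != NULL);
    unsigned long sum = 0;
    for (size_t i = 0; i < len; i++)
        sum = sum * 31u + (unsigned char)buf[i];
    return sum;
}

/* Returns 1 if the table still matches the checksum taken earlier. */
static int table_intact(const char *table, size_t len, unsigned long expected)
{
    return checksum(table, len) == expected;
}
```

In a debug build, asserting `table_intact(...)` right after each suspect call points at the call that did the damage, which is the starting point for finding the real bug.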

--

The interactive way to Go -- http://www.playgo.to/iwtg/en/
Re:Windows *Debugging* by maxwell+demon · 2004-02-24 21:14 · Score: 1

You forgot:

5) ???
6) Profit!

--
The Tao of math: The numbers you can count are not the real numbers.
Re:Buffer overflows, bad pointers, stack problems. by oops · 2004-02-24 22:42 · Score: 1

If you overrun or underrun an array, an ArrayIndexOutOfBoundsException is thrown and the read/write doesn't occur, so no corruption occurs (your operation just fails).
The hammer is your friend by QuestionsNotAnswers · 2004-02-24 22:55 · Score: 1

There is something sublime about knowing when to use a big hammer to fix something. I fixed the oven door by hitting it hard enough near the hinge to fix the problem of a slightly worn part - oh so satisfying! Of course I wouldn't admit to any times I might have used a hammer and completely screwed something up!

--
Happy moony
Grab the Brass Monkey by zero_offset · 2004-02-24 22:56 · Score: 1

What the hell does "grab the brass bar" mean, anyway???

--
Slashdot quality declines as the number of hot grits posts decreases. - Provolt's Law, Apr-09-2005
Re:Windows *Debugging* by Anonymous Coward · 2004-02-25 00:46 · Score: 0

And if none of those work, try
5) Re-dhat
how to learn to debug any code by dario_moreno · 2004-02-25 02:29 · Score: 1

just teach programming courses for a few years! Then you will learn to think of every possible cause for a bug. Otherwise you are only used to looking for bugs that come from mistakes you usually make (say, loops going from 0 to n on an n-sized array), not from mistakes you assume you would never make... and that you actually do make. When you look at someone else's code (or electronic circuit, or any kind of experiment for that matter), you realize that you need this particularly open state of mind to find bugs, especially under the pressure that students are very good at applying (like: OK, I program like an idiot, forgot 90% of what you explained, but I expect you to find bugs in less than one second in this 1000-line kludge because I need to pass the class). Of course, a good knowledge of all the classic mistakes, debugging tools and switches (array bounds checking as pointed out above, post-mortem core dump analysis, stack printouts after a segv, tests with known results, checks with various optimization flags, since compilers can be buggy) also helps.

--
Google passes Turing test : see my journal
Dear baffled... by rumblin'rabbit · 2004-02-25 03:44 · Score: 1
The trouble with your advice is that it's useless. Before you begin "Figuring out what the code is supposed to do" you have to figure out what code. In a 25,000 line application, where do you begin? What classes or functions are causing the problem? What lines of code?
And I didn't say you should rewrite the application from scratch. I said it would take someone that long if they used your "approach".
Here's how to win the Master's. For each swing...
- Figure out where you want the ball to end up.
- Hit it there.
I'm sure Mike Weir will be grateful for this advice. Then again, maybe not.
OT, but ... by josephgrossberg · 2004-02-25 05:10 · Score: 1

Love the .sig!

It's like anti-Japanese protectionism vs. anti-Mexican protectionism.

--

Joe
http://www.joegrossberg.com
1. Re:OT, but ... by josephgrossberg · 2004-02-25 05:16 · Score: 1
  
  oops
  
  that should be "meets" not "vs."
  
  --
  
  Joe
  http://www.joegrossberg.com
Debugging by email by gidds · 2004-02-25 16:25 · Score: 1

It works! And not just for debugging, too -- I find a similar technique works well for design decisions and other knotty problems.
I tend to explain things in an email (it helps if it's to a real person, though you don't need to send it when you're done). Sometimes I'll end up rewriting the entire mail in the process, so that all that's left is the solution; but even if not, it usually clarifies and limits the problem.
My other debugging technique, when I'm really stumped, is to pace up and down the office muttering to myself. Really! I get strange looks, but there's nothing like leg movement and a steady rhythm (careful, now!) for getting you thinking, and each time you turn around it helps you look at the problem slightly differently.

--
Ceterum censeo subscriptionem esse delendam.