Debugging Asynchronous Applications?

Posted by Cliff on Wednesday February 8, 2006 @01:20PM from the mind-your-breakpoints dept.

duncan bayne asks: "I'm attempting to debug a complicated telephony application, written in C#, that's almost entirely event driven. This is the first time I've debugged a large asynchronous application that isn't a GUI, and I'm curious to know what advice the Slashdot crowd has to share - have you any recommended tools, best practices, or common pitfalls to avoid?"

Logfiles by starling · 2006-02-08 13:30 · Score: 4, Interesting

I've found and fixed more bugs in large systems by analysing logfiles than by running tests in a debugger. Log everything, and make sure each line in the logfiles has an accurate time stamp.

The trick is to learn how to correlate information between different logfiles to build up a picture of how all the components (processes or threads) behave together. The classic Unix utilities like find, grep, awk, cut and less are your friends.
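
As a rough illustration of correlating logfiles, here is a minimal Python sketch (not C#, and not anything the poster described using) that merges time-stamped lines from several components into one timeline. The line format, the component names "switch" and "billing", and the helpers parse_line, parsed and merge_logs are all assumptions made up for the example.

```python
import heapq
from datetime import datetime

def parse_line(line, source):
    """Split 'YYYY-MM-DD HH:MM:SS.ffffff message' into (timestamp, source, message)."""
    stamp, message = line[:26], line[27:].rstrip("\n")
    return datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S.%f"), source, message

def parsed(source, lines):
    """Lazily parse one component's log, tagging each entry with its source."""
    for line in lines:
        yield parse_line(line, source)

def merge_logs(logs):
    """Merge several logs, each already in time order, into one timeline.

    `logs` maps a (hypothetical) component name to an iterable of log lines.
    """
    streams = [parsed(name, lines) for name, lines in logs.items()]
    return list(heapq.merge(*streams))

# Made-up sample data for two components.
switch_log = [
    "2006-02-08 13:30:00.100000 call 17 offered",
    "2006-02-08 13:30:00.450000 call 17 connected",
]
billing_log = [
    "2006-02-08 13:30:00.300000 call 17 rating started",
]

timeline = merge_logs({"switch": switch_log, "billing": billing_log})
for stamp, source, message in timeline:
    print(stamp.isoformat(), source, message)
```

The merged view only means something if the clocks behind the time stamps agree, which is why accurate time stamps matter so much.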

examine code assumptions by Stranger+Than+Fictio · 2006-02-08 15:09 · Score: 2, Interesting

I've done a bit of asynchronous debugging, principally troubleshooting interaction between a speech recognition system and Emacs. Most of the bugs I found tended to be due to errors in assumptions in the code. For example,

(a) when a "change text" event handler in the editor is invoked, the editor will always be done reporting the result of the previous change.

(b) event z will always be preceded by event w

If you know the assumptions for each event handler, cases where they break down may become obvious. If not, you can add assertions to check those assumptions.
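
A minimal Python sketch of checking assumption (b) with an assertion might look like the following. The OrderChecker class and its handler names are hypothetical, and "preceded by" is read here as "w was the most recent event", which is only one possible interpretation.

```python
class OrderChecker:
    """Records the events seen so far and asserts an ordering assumption."""

    def __init__(self):
        self.seen = []

    def on_w(self):
        self.seen.append("w")

    def on_z(self):
        # Assumption (b): event z is always preceded by event w.
        # Interpreted here as: the most recent event must be w.
        assert self.seen and self.seen[-1] == "w", (
            f"z arrived without a preceding w; history={self.seen}"
        )
        self.seen.append("z")

# The expected order passes silently.
checker = OrderChecker()
checker.on_w()
checker.on_z()

# A z with no preceding w trips the assertion and reports the history.
try:
    OrderChecker().on_z()
except AssertionError as exc:
    print("assumption violated:", exc)
```

Note that Python drops assert statements when run with -O, so a check like this belongs in debug and test runs.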

Beyond that, strategies really depend on what sort of feedback loops you have. When you are debugging interaction between two objects or programs, try to simplify one so that you don't waste a lot of time trying to figure out which side is buggy. For example, I wrote a simple brain-dead editor which I was confident I understood and connected it to the speech recognition system. That way, when I found a problem, I was confident that I could quickly find any bugs on the editor side, so if I didn't find the bug there quickly, it had to be in the speech reco system.
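
The "simple brain-dead editor" idea could be sketched in Python roughly as below. StubEditor and dictate are invented for illustration and are not the poster's actual code. The stub keeps one string buffer and records every call, so any surprising behaviour can be traced to what the other side sent.

```python
class StubEditor:
    """A deliberately trivial editor: one string buffer, every call recorded."""

    def __init__(self):
        self.text = ""
        self.calls = []

    def insert(self, position, new_text):
        self.calls.append(("insert", position, new_text))
        self.text = self.text[:position] + new_text + self.text[position:]

    def delete(self, start, end):
        self.calls.append(("delete", start, end))
        self.text = self.text[:start] + self.text[end:]

def dictate(editor, words):
    """Hypothetical stand-in for the recognizer side: append each word in turn."""
    for word in words:
        prefix = " " if editor.text else ""
        editor.insert(len(editor.text), prefix + word)

editor = StubEditor()
dictate(editor, ["hello", "world"])
print(editor.text)    # hello world
print(editor.calls)   # the exact sequence of requests the editor received
```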

Recommendations by CommandLineGuy · 2006-02-08 15:38 · Score: 4, Interesting

For what it's worth:

1 - be cautious about testing in debug mode - there's an awful lot the compiler tosses in to enable debugging which may impact how the code actually executes.
2 - use logging extensively. I'd recommend using log4net or something like that.
3 - use an integration model for your unit testing. Start with the smallest unit tests and build upwards. This will allow you to gradually build "correct" code and focus on the messages/events between components.
4 - build a simulator (someone mentioned that before), they are truly invaluable. Keep it as simple as possible.
5 - check, double check, and triple check variable access. It's easy to run into a race condition between reads and writes. Study and understand lock(...), reader-writer-locks, semaphores, and mutexes.
6 - when testing, don't forget to test expected conditions, unexpected conditions, boundary conditions (null objects, empty strings, negative values, 0's, positive values, and overflows), errors (like zombie conditions where a response is _never_ generated, dropped connections, garbage results), etc.
7 - learn Debug.Assert and check your pre- and post-conditions
8 - if you use strings, make sure you understand how strings and StringBuilder operate - they can differ dramatically in efficiency, memory utilization, and GC behavior.
9 - events can be static, and don't forget to encapsulate your event accessors (they look like properties, but instead of get/set, they're add/remove)
10 - if you plan on using compiler optimization switches, use them last during testing - after you can prove the app works correctly. Optimization switches can dramatically reorder things which is definitely not good if you're trying to determine correctness.
11 - set the compiler to give the maximum warning level. Your app should generate no warnings or errors while compiling.
12 - walk through the code with someone to double check your logic and field access. If you can convey it to someone else either through comments, design, etc., and can justify all field accesses as well as access control, you'll be in good shape. Yes, this is peer review. It's even useful to haul in a project manager that knows nothing about coding and nods like a chicken. Listen to yourself as you talk and if you stumble or have a hard time explaining something, that's a hint that a redesign might be in order.
13 - you might want to put in some profiling counters so you can capture metrics on it. This way, as you change the code over time, you can almost quantifiably determine if the code is truly improving with respect to throughput, responses, etc. or not.
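
Points 5 and 13 can be illustrated together with a small Python sketch (Python rather than C#; the `with self._lock:` block plays roughly the role that lock(...) plays in C#). The Counters class and the counter name "events_handled" are assumptions invented for the example.

```python
import threading

class Counters:
    """Named counters guarded by a single lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._counts = {}

    def increment(self, name, amount=1):
        # The read-modify-write below is the classic race; the lock makes it atomic.
        with self._lock:
            self._counts[name] = self._counts.get(name, 0) + amount

    def snapshot(self):
        # Copy under the lock so callers never see a half-updated dict.
        with self._lock:
            return dict(self._counts)

counters = Counters()

def handle_events(n):
    """Stand-in for an event handler that bumps a metric once per event."""
    for _ in range(n):
        counters.increment("events_handled")

threads = [threading.Thread(target=handle_events, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counters.snapshot())   # {'events_handled': 40000}
```

Taking snapshots at intervals and logging them gives a crude throughput measure that can be compared across code changes.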

That's all I can think of off the top of my head.

Good luck, it's a fun journey.

--
[Of course it's client-server; it runs on a LAN]

Wow... Short answer, "Don't ask Slashdot". by pla · 2006-02-08 22:32 · Score: 3, Interesting

50 comments, and not one good answer (though I saw three posts of good advice vaguely applicable to your needs).

First of all - Debugging takes hard work. Sorry folks, no matter how easy Microsoft tries to make it, no matter how tightly they integrate Java-killer-P into app-Q, you still need the ability to follow the flow of bits from point A to Z, and more importantly, figure out what B through Y need to do.

How to debug asynchronous events... Since you mentioned C#, I will presume you have a REALLY coarse granularity here when you say "async".

So... First step, force non-reentrancy and non-overlapped event handling. Does your problem go away? Find the global data you clobbered.
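
One way to force non-overlapped handling for a diagnostic run is to push every event through a single queue drained by one worker thread. The Python sketch below is an assumption about how that might look, not a description of any .NET facility. SerialDispatcher, on_call_started and produce are hypothetical names.

```python
import queue
import threading

class SerialDispatcher:
    """Runs every handler on one worker thread, one event at a time."""

    def __init__(self):
        self._events = queue.Queue()
        threading.Thread(target=self._run, daemon=True).start()

    def post(self, handler, *args):
        """Callable from any thread; only enqueues, never runs the handler."""
        self._events.put((handler, args))

    def _run(self):
        while True:
            handler, args = self._events.get()
            try:
                handler(*args)
            except Exception as exc:
                print("handler failed:", exc)
            finally:
                self._events.task_done()

    def drain(self):
        """Block until every event posted so far has been handled."""
        self._events.join()

state = {"active_calls": 0}
handler_threads = set()

def on_call_started():
    handler_threads.add(threading.current_thread().name)
    state["active_calls"] += 1   # deliberately unsynchronized shared state

def produce(dispatcher, count):
    for _ in range(count):
        dispatcher.post(on_call_started)

dispatcher = SerialDispatcher()
producers = [threading.Thread(target=produce, args=(dispatcher, 1000)) for _ in range(4)]
for p in producers:
    p.start()
for p in producers:
    p.join()
dispatcher.drain()

print(state["active_calls"], len(handler_threads))   # 4000 1
```

A handler that posts a new event from inside itself only enqueues it, so handlers never re-enter. If the bug disappears in this mode, shared state touched by overlapping handlers is the prime suspect.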

Step 2 (if #1 fails) - Run both ends of your app on the same machine. Does the problem go away? Don't trust .NET to gracefully handle network errors. Don't trust your process to have basically-uninterrupted control of the CPU. Don't trust try/catch to save you from "real" problems.

Step 3 - Okay, you have a "real" bug, in your code. But on the bright side, if you got here, you can probably reproduce it, so, piece of cake. Load up your trusty debugger and dump a COMPLETE stack trace up to the error. Don't trust the last line to have caused the error, it just failed to deal with whatever broken crap the actual problem threw its way.
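
A loose Python analogue of "don't trust the last line" is sketched below, with hypothetical function names and a made-up event. The exception surfaces in parse_digits, but the trace shows the bad value was handed down from further up the chain.

```python
import traceback

def parse_digits(raw):
    return int(raw)                      # the failure surfaces here...

def route_call(event):
    return parse_digits(event["digits"])

def on_event(event):
    try:
        return route_call(event)
    except Exception:
        # Capture every frame between this handler and the failing line.
        trace = traceback.format_exc()
        print(trace)
        return trace

# ...but the real problem is whoever built an event with no digits.
trace = on_event({"digits": None})
```

Note that traceback.format_exc only covers the frames between the except block and the raise; a debugger's call stack window would also show the callers above on_event.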

Step 4 - trace through your code, on both sides, one line at a time. Sound tedious? Yup. You might spend a week on a single run. But you'll sure as hell know the flow of your code by the time it finishes.

Step 5 - No step 5 exists. Step 4 WILL let you find your problem, as long as it resides in your code and not in the aether between the two sides of the connection (which steps 1 and 2 should have eliminated as a problem).