Debugging Asynchronous Applications?

Posted by Cliff on Wednesday February 8, 2006 @01:20PM from the mind-your-breakpoints dept.

duncan bayne asks: "I'm attempting to debug a complicated telephony application, written in C#, that's almost entirely event driven. This is the first time I've debugged a large asynchronous application that isn't a GUI, and I'm curious to know what advice the Slashdot crowd has to share - have you any recommended tools, best practices, or common pitfalls to avoid?"

6 of 78 comments

VS2005 by SIGALRM · 2006-02-08 13:21 · Score: 5, Informative

"This is the first time I've debugged a large asynchronous application that isn't a GUI, and I'm curious to know what advice the Slashdot crowd has to share"
I'm not sure if you are on .NET 2.0 yet, but if you are, start by taking a look at VS2005. In the debug department, enhancements include better JIT debugging, stepping into XML/Web services from a client, and state-driven object inspection. Object Test Bench (OTB) is a simple object-level test utility. You create instances of your objects, invoke methods, and evaluate results... to shorten the task of coding, debugging and re-coding. I'm not sure about telephony specifically, but WSE/WS* SOAP layers can be hard to maneuver through in a debugger, yet VS2005 does it quite nicely via WSDL.

One other suggestion... "event bus" apps like you describe are good candidates for capturing as much runtime data as possible, so make sure you adjust your build parameters and do as much of that as possible, especially in problem assemblies. Oh, and don't forget to build NUnit tests. Sounds like you're walking into some prewritten code, but the effort might be worthwhile.

--
Sigs cause cancer.

Don't use printf by BadAnalogyGuy · 2006-02-08 13:29 · Score: 5, Informative

The buffering nature of printf, combined with the asynchronous execution of code, can lead to out-of-order debug printfs.

I had this one project. It was to build a model car, not related to programming at all. I started out doing well, following the instructions and generally getting along fine. But then I lost patience with the tedium and left to get a beer to relax. When I finally got back to working on the model, I found that the dog had chewed it up and the wife had thrown it out (the trashed model, not the dog, but she'd love to throw out the dog too). I left it in the garage where I thought it would have been safe, but I guess you can't expect things to stay the same if you leave it sitting there for a year and a half.

The moral of the story is that if I look to see where things went wrong, it was the point where I lost patience and decided to do something different than what I should have been focused on. This is like how many people try to put breakpoints all over their code rather than where they should put them. Don't debug willy-nilly and expect to make any good progress. But also don't try to throw in some seemingly helpful actions (like printf) because they may end up changing the whole state of the program.
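
One less intrusive alternative to scattering printf calls is to record events in a bounded in-memory buffer and dump it only after the interesting moment has passed. The sketch below is in Python for brevity and is only an illustration: `TraceBuffer`, `record` and `dump` are hypothetical names, not part of any library. Each entry gets a sequence number under a lock, so the order can be reconstructed even when several threads are writing.

```python
import itertools
import threading
import time
from collections import deque


class TraceBuffer:
    """Bounded in-memory trace (hypothetical helper, not a library API)."""

    def __init__(self, capacity=1024):
        self._events = deque(maxlen=capacity)  # oldest entries fall off the front
        self._lock = threading.Lock()
        self._seq = itertools.count()

    def record(self, message):
        # The sequence number is taken and the entry appended under one lock,
        # so stored order matches the order in which calls acquired the lock.
        with self._lock:
            self._events.append(
                (next(self._seq), time.monotonic(),
                 threading.current_thread().name, message)
            )

    def dump(self):
        # Copy under the lock; do the slow string formatting afterwards.
        with self._lock:
            snapshot = list(self._events)
        return [f"{seq:06d} {stamp:.6f} [{name}] {msg}"
                for seq, stamp, name, msg in snapshot]


if __name__ == "__main__":
    trace = TraceBuffer(capacity=100)

    def worker():
        for i in range(5):
            trace.record(f"event {i}")

    threads = [threading.Thread(target=worker, name=f"w{k}") for k in range(3)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print("\n".join(trace.dump()))
```

Taking the lock still perturbs timing a little, so this reduces the observer effect rather than removing it.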

Reference by adamy · 2006-02-08 13:30 · Score: 4, Informative

A great reference for design in these types of applications is Enterprise Integration Patterns by Gregor Hohpe and Bobby Woolf. Some of the patterns it contains provide methods that you can use to debug and trace such applications. Typically, these apps deal with messages as the primary unit of transport. You have a kind of Heisenberg uncertainty principle in effect: you can tell where a given message is in the system, or what flows through a given point of the system, but it is hard to tell both. And the act of monitoring a system can often affect timing and change behavior.

One thing you can do with messages is put proxies in place that allow you to log every message delivered to a given component. Another trick is to add routing slips to messages that indicate which components they have passed through, but this might not be possible to introduce into an existing system. Network tracking tools are invaluable: netstat and tcpdump are your friends. Or whatever tools work with your particular network stack.
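
To make the proxy and routing-slip ideas concrete, here is a minimal sketch in Python. Every name in it (`Message`, `Stage`, `LoggingProxy`, `run_pipeline`) is made up for illustration and is not taken from the book or from any messaging library. The proxy sits in front of a component, logs each message delivered to it, and stamps the component's name on the message's routing slip.

```python
import time
from dataclasses import dataclass, field


@dataclass
class Message:
    body: str
    routing_slip: list = field(default_factory=list)  # components visited so far


class Stage:
    """Stand-in component: anything with a name and a handle(message) method."""

    def __init__(self, name, transform):
        self.name = name
        self._transform = transform

    def handle(self, message):
        # Carry the slip forward so the history survives each hop.
        return Message(self._transform(message.body), message.routing_slip)


class LoggingProxy:
    """Wraps a component, logs every delivery, and stamps the routing slip."""

    def __init__(self, target, log):
        self.name = target.name
        self._target = target
        self._log = log

    def handle(self, message):
        message.routing_slip.append(self._target.name)
        self._log.append((time.time(), self._target.name, message.body))
        return self._target.handle(message)


def run_pipeline(stages, message):
    for stage in stages:
        message = stage.handle(message)
    return message


if __name__ == "__main__":
    log = []
    stages = [
        LoggingProxy(Stage("normalize", str.strip), log),
        LoggingProxy(Stage("route", str.upper), log),
    ]
    result = run_pipeline(stages, Message("  call 555-0100 "))
    print(result.body, result.routing_slip)
    for stamp, component, body in log:
        print(f"{stamp:.3f} {component} <- {body!r}")
```

Because the proxy exposes the same `handle` method as the component it wraps, it can be added or removed without the neighbours noticing, which is what makes it usable on an existing system where changing the message format is not an option.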

--
Open Source Identity Management: FreeIPA.org

Logfiles by starling · 2006-02-08 13:30 · Score: 4, Interesting

I've found and fixed more bugs in large systems by analysing logfiles than by running tests in a debugger. Log everything, and make sure each line in the logfiles has an accurate time stamp.

The trick is to learn how to correlate information between different logfiles to build up a picture of how all the components (processes or threads) behave together. The classic Unix utilities like find, grep, awk, cut and less are your friends.
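
As a rough illustration of that correlation step, the Python sketch below interleaves several logs into one timeline. It assumes every line starts with a timestamp of the form `2006-02-08 13:30:01.250` and that each log is already in time order; both are assumptions about the log format, and `merge_logs` is a hypothetical helper name.

```python
import heapq
from datetime import datetime

# Assumed timestamp layout: date, space, time with fractional seconds.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def parse_line(source, line):
    # First two whitespace-separated fields are the timestamp, the rest is text.
    date, clock, text = line.split(None, 2)
    stamp = datetime.strptime(f"{date} {clock}", TIMESTAMP_FORMAT)
    return stamp, source, text.rstrip()


def parsed(source, lines):
    for line in lines:
        if line.strip():  # skip blank lines
            yield parse_line(source, line)


def merge_logs(logs):
    """logs maps a source name to an iterable of lines already in time order."""
    streams = [parsed(source, lines) for source, lines in logs.items()]
    # heapq.merge lazily interleaves sorted inputs, so large files are fine.
    return heapq.merge(*streams)


if __name__ == "__main__":
    logs = {
        "switch": [
            "2006-02-08 13:30:01.250 call setup",
            "2006-02-08 13:30:03.000 call teardown",
        ],
        "billing": ["2006-02-08 13:30:02.100 record opened"],
    }
    for stamp, source, text in merge_logs(logs):
        print(stamp.isoformat(sep=" "), f"[{source}]", text)
```

This only works as well as the clocks agree, which is one more reason to insist on accurate time stamps.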

Load Test by plsuh · 2006-02-08 15:25 · Score: 5, Informative

I have mod points, but I don't see anyone chiming in here about realistic load testing.

For this kind of application, you must, *must*, MUST create a heavy load on a production system. I've done work with big, complex, multi-threaded web apps that have similar characteristics -- event-driven (when an HTTP request comes in) and server-only (no GUI). There are many bugs that don't show up until you put the system under load, as in dozens or hundreds of transactions per second. For instance, under light load a queue will never fill up, but under heavy load bizarro, difficult-to-trace bugs will crop up that you can't reproduce on your development system. Even under the same load, your development system may run into a different constraint (e.g. CPU-bound so that it can't fill the queue fast enough and thus never hits the bug).

To have any hope of catching these bugs, you need to instrument your application heavily, with logging calls that you can turn on and off easily with some sort of switch (kill signal, special dialing code, etc.). Running with a debugger attached will likely be next to impossible on your production or staging systems.
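
A minimal sketch of such a switch, in Python and using only the standard `logging` and `signal` modules. The logger name `telephony` and the functions `toggle_debug_logging`, `install_toggle` and `handle_call` are hypothetical. `SIGUSR1` exists only on Unix-like systems, so on other platforms a different trigger (a control message, a special dialing code) would have to call the same toggle function.

```python
import logging
import signal

log = logging.getLogger("telephony")  # hypothetical logger name


def toggle_debug_logging(signum=None, frame=None):
    """Flip between quiet (WARNING) and verbose (DEBUG) logging."""
    verbose = log.getEffectiveLevel() > logging.DEBUG
    log.setLevel(logging.DEBUG if verbose else logging.WARNING)
    return log.level


def install_toggle():
    # SIGUSR1 is not available on every platform; report whether it was wired up.
    if hasattr(signal, "SIGUSR1"):
        signal.signal(signal.SIGUSR1, toggle_debug_logging)
        return True
    return False


def handle_call(call_id):
    # The guard keeps the cost near zero while verbose logging is switched off.
    if log.isEnabledFor(logging.DEBUG):
        log.debug("call %s: state dump %r", call_id, {"leg": "A"})
    return call_id


if __name__ == "__main__":
    logging.basicConfig(format="%(asctime)s %(levelname)s %(message)s")
    install_toggle()
    handle_call("c-1")        # silent at the default WARNING level
    toggle_debug_logging()    # same effect as sending SIGUSR1 to the process
    handle_call("c-2")        # now emits the debug line
```

With the handler installed, `kill -USR1 <pid>` turns the detailed logging on for a load run and a second signal turns it back off, with no restart and no debugger attached.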

Lastly, definitely invest in an automated test environment. You will need to do these kinds of debugging runs hundreds of times in the course of developing your app, and it just isn't feasible to have everyone in the company drop what they're doing and call into your app a dozen times a day. While there are plenty of load test tools for web apps, I'm not familiar with any for telephony apps, although some must exist. You may end up rolling your own from a bunch of old modems.

Good luck, as the bugs in these systems are notoriously difficult to hunt down.

--Paul

Recommendations by CommandLineGuy · 2006-02-08 15:38 · Score: 4, Interesting

For what it's worth:

1 - be cautious about testing in debug mode - there's an awful lot the compiler tosses in to enable debugging which may impact how the code actually executes.
2 - use logging extensively. I'd recommend using log4net or something like that.
3 - use an integration model for your unit testing. Start with the smallest unit tests and build upwards. This will allow you to gradually build "correct" code and focus on the messages/events between components.
4 - build a simulator (someone mentioned that before), they are truly invaluable. Keep it as simple as possible.
5 - check, double check, and triple check variable access. It's easy to run into a race condition between reads and writes. Study and understand lock(...), reader-writer-locks, semaphores, and mutexes.
6 - when testing, don't forget to test expected conditions, unexpected conditions, boundary conditions (null objects, empty strings, negative values, 0's, positive values, and overflows), errors (like zombie conditions where a response is _never_ generated, dropped connections, garbage results), etc.
7 - learn Debug.Assert and check your pre- and post-conditions
8 - if you use strings, make sure you understand how strings and StringBuilder operate - they can have dramatic differences in efficiency, memory utilization, and GC.
9 - events can be static, and don't forget to encapsulate your event accessors (they look like properties, but instead of get/set, they're add/remove)
10 - if you plan on using compiler optimization switches, use them last during testing - after you can prove the app works correctly. Optimization switches can dramatically reorder things which is definitely not good if you're trying to determine correctness.
11 - set the compiler to give the maximum warning level. Your app should generate no warnings or errors while compiling.
12 - walk through the code with someone to double check your logic and field access. If you can convey it to someone else either through comments, design, etc., and can justify all field accesses as well as access control, you'll be in good shape. Yes, this is peer review. It's even useful to haul in a project manager that knows nothing about coding and nods like a chicken. Listen to yourself as you talk and if you stumble or have a hard time explaining something, that's a hint that a redesign might be in order.
13 - you might want to put in some profiling counters so you can capture metrics on it. This way, as you change the code over time, you can almost quantifiably determine if the code is truly improving with respect to throughput, responses, etc. or not.
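
Points 5, 7 and 13 can be illustrated together with a small, hedged sketch. It is written in Python rather than C#, and `CallCounter` and `run` are hypothetical names. The `with self._lock:` block plays the role that a `lock (...)` statement plays in C#, and the `assert` statements stand in for `Debug.Assert` checks on a precondition and a postcondition.

```python
import threading


class CallCounter:
    """Metrics counter that is safe to update from several threads."""

    def __init__(self):
        self._lock = threading.Lock()
        self._count = 0

    def increment(self, amount=1):
        assert amount > 0, "precondition: amount must be positive"
        with self._lock:
            # Read-modify-write happens as one critical section; without the
            # lock two threads could read the same value and lose an update.
            self._count += amount

    @property
    def value(self):
        with self._lock:
            return self._count


def run(workers=8, per_worker=10_000):
    counter = CallCounter()

    def work():
        for _ in range(per_worker):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # Postcondition: no increment was lost.
    assert counter.value == workers * per_worker
    return counter.value


if __name__ == "__main__":
    print("calls counted:", run())
```

Removing the lock will not necessarily make the postcondition fail on a development machine, which is exactly why this class of bug tends to surface only under load.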

That's all I can think of off the top of my head.

Good luck, it's a fun journey.

--
[Of course it's client-server; it runs on a LAN]