Debugging Asynchronous Applications?
duncan bayne asks: "I'm attempting to debug a complicated telephony application, written in C#, that's almost entirely event-driven. This is the first time I've debugged a large asynchronous application that isn't a GUI, and I'm curious to know what advice the Slashdot crowd has to share: have you any recommended tools, best practices, or common pitfalls to avoid?"
One other suggestion: "event bus" apps like the one you describe are good candidates for capturing as much runtime data as possible, so adjust your build settings to capture as much of it as you can (for example, keep debug symbols and tracing enabled), especially in the problem assemblies. Oh, and don't forget to write NUnit tests. It sounds like you're walking into existing code, so retrofitting tests will take some effort, but it may well be worthwhile.
As a basic first testing phase, run it all on one computer. This rules out the network as a source of errors. I'm assuming the system is meant to be large, so maybe the bugs you mention come from multiple machines fouling each other up. Either way, let's talk debugging strategies!
Unfortunately, Nornir is not OSS yet, but their papers may be of use to you. If you're having problems with packets on either end, use my good friend Ethereal (since renamed Wireshark).
Also, as I recall from my days of drudgery at college, create tons of output.
So, as a preliminary requirement, I suggest you create a decent logging system (if you haven't done so already). I haven't written much C#, so I'm going to talk abstractly; hopefully the rest of Slashdot can help with the C# specifics. What I mean is that you should create a class that writes to a log file you can read later. I don't mean a message for every packet sent, but a message for each stream or connection opened wouldn't hurt. It will help to generate a random ID for each call and to log the source and destination IP:port. This is especially useful on a server. It also helps to keep your println-style debug output and send it through the logger (redirect standard output to the logger).
Now use this on every machine in the system. If a machine starts giving you problems (garbled or interleaved log lines, for instance), put a mutual exclusion lock on the log, i.e. make every log write a critical region. In Java you can use object locks or the synchronized keyword; the C# equivalent is the lock statement (built on System.Threading.Monitor). Just because it's not a GUI doesn't mean you can't record output.
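The logger described in the last two paragraphs could look roughly like the C# sketch below. It assumes one TextWriter destination per machine; CallLogger, NewCallId and the line format are hypothetical names, not an existing library. The lock statement plays the role of Java's synchronized.

```csharp
using System;
using System.IO;
using System.Net;

// Minimal thread-safe call logger (sketch). Class name, method names and the
// line format are hypothetical, not an existing library API.
public sealed class CallLogger
{
    private readonly TextWriter _writer;
    private readonly object _sync = new object();   // guards every write: the critical region
    private readonly Random _random = new Random();  // System.Random is not thread-safe, so it is also used under _sync

    public CallLogger(TextWriter writer)
    {
        _writer = writer ?? throw new ArgumentNullException(nameof(writer));
    }

    // Random ID used to tag every log line that belongs to one call.
    public string NewCallId()
    {
        lock (_sync)
        {
            return _random.Next().ToString("x8");
        }
    }

    // One line per connection/stream event, not per packet.
    public void Log(string callId, IPEndPoint local, IPEndPoint remote, string message)
    {
        string line = string.Format("{0:O} [{1}] {2} -> {3}: {4}",
            DateTime.UtcNow, callId, local, remote, message);
        lock (_sync)
        {
            _writer.WriteLine(line);
            _writer.Flush(); // flush so lines are not lost if the process dies
        }
    }
}
```

To capture existing Console.WriteLine output as well, Console.SetOut can point standard output at a writer; if that writer is shared with the logger, wrapping it with TextWriter.Synchronized and passing the same wrapper to the logger keeps the two paths from interleaving mid-line.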
Just a friendly warning: timestamps from different machines are usually worthless for ordering events unless you have a logical clock scheme (e.g. Lamport clocks) set up, which usually requires lots of time on one's hands. You could shoot for an NTP server, but I wouldn't trust its cross-machine accuracy below about 500 ms. If you absolutely need a clock scheme, I recommend having one machine on the network broadcast an increasing tick number that every machine writes into its logs. Make the time between ticks adjustable; this way you'll be able to place events roughly relative to these ticks (assuming the ticks take about the same time to reach each machine).
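For comparison with the central-ticker idea, a Lamport clock is small enough to sketch. This minimal C# version assumes each process keeps one instance, writes its value into every log line, and attaches it to every outgoing message; LamportClock and its methods are illustrative names. It only orders causally related events (if a happened before b, a's stamp is smaller than b's); it says nothing about wall-clock time, and the converse does not hold.

```csharp
using System;

// Minimal Lamport logical clock (sketch): one instance per process.
public sealed class LamportClock
{
    private long _time;
    private readonly object _sync = new object();

    public long Current
    {
        get { lock (_sync) return _time; }
    }

    // Local event or message send: advance the clock and return the new stamp.
    public long Tick()
    {
        lock (_sync) { return ++_time; }
    }

    // Message receive: jump past the sender's stamp so receive > send.
    public long Receive(long senderStamp)
    {
        lock (_sync)
        {
            _time = Math.Max(_time, senderStamp) + 1;
            return _time;
        }
    }
}
```

Sorting the merged logs from all machines by stamp (ties broken by machine name) then gives an order consistent with every send/receive pair.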
In the end, your best tool is your brain. Design tests, and double-check the logs on each machine to confirm that the relative order of events is correct. Logic will be your only friend on this journey. Don't be afraid to kick off more threads on the client side if they don't need to share resources. If you have a server side, be careful about how many threads you have, and make sure you understand what memory each of them can reach (what is shared and what is local to the thread).
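One way to act on the advice about server-side thread counts is to cap how many calls are handled at once instead of spawning a thread per event. A minimal C# sketch using SemaphoreSlim; BoundedCallRunner and the limit are hypothetical, and the Peak counter exists only so you can check that the cap is respected. Keeping per-call state in locals of the handler, rather than in shared fields, also keeps its memory scope obvious.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Caps how many call handlers run concurrently on the server side (sketch).
public sealed class BoundedCallRunner
{
    private readonly SemaphoreSlim _slots;
    private int _active; // handlers currently running
    private int _peak;   // highest concurrency observed, for diagnostics

    public BoundedCallRunner(int maxConcurrent)
    {
        _slots = new SemaphoreSlim(maxConcurrent, maxConcurrent);
    }

    public int Peak
    {
        get { return Volatile.Read(ref _peak); }
    }

    public async Task RunAsync(Func<Task> handleCall)
    {
        await _slots.WaitAsync().ConfigureAwait(false); // wait for a free slot
        try
        {
            int now = Interlocked.Increment(ref _active);
            // Lock-free "max": retry until _peak >= now.
            int seen;
            while (now > (seen = Volatile.Read(ref _peak)) &&
                   Interlocked.CompareExchange(ref _peak, now, seen) != seen) { }
            await handleCall().ConfigureAwait(false);
        }
        finally
        {
            Interlocked.Decrement(ref _active);
            _slots.Release(); // always give the slot back, even on failure
        }
    }
}
```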
For the love of god, if you open ports, don't forget to close them when you're done with them!
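In C#, the easiest way not to leak ports is to tie the socket's lifetime to a using block, so Dispose closes it even if the code inside throws. A sketch, assuming a plain TCP listener on the loopback interface; PortDemo is a hypothetical helper.

```csharp
using System.Net;
using System.Net.Sockets;

public static class PortDemo
{
    // Opens a listener on an OS-chosen free port and always closes it on exit.
    public static int ListenBriefly()
    {
        using (var socket = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp))
        {
            socket.Bind(new IPEndPoint(IPAddress.Loopback, 0)); // port 0: let the OS pick
            socket.Listen(10);
            // ... accept and handle connections here ...
            return ((IPEndPoint)socket.LocalEndPoint).Port;
        } // Dispose() closes the socket and frees the port here, even on exceptions
    }
}
```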
Unfortunately, Nornir is not OSS
Good luck! Happy debugging!
Because printf output is buffered and the code runs asynchronously, debug printfs can show up out of order.
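A common workaround is to stamp each debug message with a sequence number at the moment the event happens, then sort on that number when reading the output, so buffering and scheduling can no longer reorder what you see. A minimal C# sketch; SeqTrace is a hypothetical name. The stamps record the order in which threads reached Interlocked.Increment, which is close to, but not exactly, the order of the surrounding operations, and the extra work can still nudge timing.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading;

// Stamps each message with a process-wide sequence number when the event occurs,
// so output can be re-sorted however late or out of order it is written.
public static class SeqTrace
{
    private static long _seq;
    private static readonly ConcurrentQueue<KeyValuePair<long, string>> _entries =
        new ConcurrentQueue<KeyValuePair<long, string>>();

    public static void Write(string message)
    {
        long n = Interlocked.Increment(ref _seq); // stamp taken at the event itself
        _entries.Enqueue(new KeyValuePair<long, string>(n, message));
    }

    // Returns messages in stamp order, e.g. for dumping to a file afterwards.
    public static List<string> Dump()
    {
        return _entries.OrderBy(e => e.Key)
                       .Select(e => e.Key + ": " + e.Value)
                       .ToList();
    }
}
```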
I had this one project. It was to build a model car, not related to programming at all. I started out doing well, following the instructions and generally getting along fine. But then I lost patience with the tedium and left to get a beer to relax. When I finally got back to working on the model, I found that the dog had chewed it up and the wife had thrown it out (the trashed model, not the dog, but she'd love to throw out the dog too). I left it in the garage where I thought it would be safe, but I guess you can't expect things to stay the same if you leave them sitting there for a year and a half.
The moral of the story: looking back at where things went wrong, it was the point where I lost patience and switched to something other than what I should have been focused on. Many people debug the same way, scattering breakpoints all over their code instead of placing them where they belong. Don't debug willy-nilly and expect to make good progress. But also be careful about throwing in seemingly harmless actions (like printf), because they can change the program's timing and, with it, the program's behavior.
A great reference for design in these types of applications is Enterprise Integration Patterns by Gregor Hohpe and Bobby Woolf. Some of the patterns it contains provide methods you can use to debug and trace such applications. Typically, these apps deal with messages as the primary unit of transport. You have a kind of Heisenberg uncertainty principle in effect: you can tell where a given message is in the system, or what flows through a given point of the system, but it is hard to tell both. And the act of monitoring a system can often affect timing and change behavior. One thing you can do with messages is put proxies in place that let you log every message delivered to a given component. Another trick is to attach to each message a record of the components it has passed through (the book calls this a Message History; its Routing Slip instead lists the steps a message should take), but this might not be possible to introduce into an existing system. Network monitoring tools are invaluable: netstat and tcpdump are your friends, or whatever tools work with your particular network stack.
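The proxy and the per-message history described above can be combined in a few lines of C#. In this sketch, Message, IMessageHandler, Stage and LoggingProxy are hypothetical types, not part of any framework; the proxy wraps a component, appends the component's name to the message's history, logs the delivery, and then forwards the message.

```csharp
using System;
using System.Collections.Generic;

public sealed class Message
{
    public string Body { get; private set; }
    // Message history: names of the components this message has passed through.
    public List<string> History { get; private set; }

    public Message(string body)
    {
        Body = body;
        History = new List<string>();
    }
}

public interface IMessageHandler
{
    string Name { get; }
    void Handle(Message message);
}

// Trivial component for illustration: passes the message to the next stage, if any.
public sealed class Stage : IMessageHandler
{
    private readonly IMessageHandler _next;
    public Stage(string name, IMessageHandler next) { Name = name; _next = next; }
    public string Name { get; private set; }
    public void Handle(Message message) { if (_next != null) _next.Handle(message); }
}

// Proxy in front of a real component: records history, logs, then forwards.
public sealed class LoggingProxy : IMessageHandler
{
    private readonly IMessageHandler _inner;
    private readonly Action<string> _log;

    public LoggingProxy(IMessageHandler inner, Action<string> log)
    {
        _inner = inner;
        _log = log;
    }

    public string Name { get { return _inner.Name; } }

    public void Handle(Message message)
    {
        message.History.Add(_inner.Name);
        _log(string.Format("{0} <- '{1}' via [{2}]",
            _inner.Name, message.Body, string.Join(" > ", message.History)));
        _inner.Handle(message);
    }
}
```

Because the proxy implements the same interface as the component, it can be slipped in wherever components are wired together without touching their code.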
I have mod points, but I don't see anyone chiming in here about realistic load testing.
For this kind of application, you must, *must*, MUST put a heavy load on a production system. I've done work with big, complex, multi-threaded web apps that have similar characteristics: event-driven (an event fires when an HTTP request comes in) and server-only (no GUI). Many bugs don't show up until you put the system under load, as in dozens or hundreds of transactions per second. For instance, under light load a queue will never fill up, but under heavy load bizarre, hard-to-trace bugs will crop up that you can't reproduce on your development system. Even under the same load, your development system may run into a different constraint (e.g. it is CPU-bound, so it can't fill the queue fast enough and thus never hits the bug).
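The queue example is easy to reproduce in miniature. This C# sketch uses a bounded BlockingCollection and, as a simplifying assumption, models a consumer that cannot keep up by not draining the queue at all: a small burst fits, a large one overflows. QueueLoadDemo is a hypothetical name.

```csharp
using System.Collections.Concurrent;

public static class QueueLoadDemo
{
    // Pushes 'burst' events into a queue of the given capacity with no consumer
    // draining it, and returns how many were rejected. A caller that ignored
    // TryAdd's false result would silently lose exactly these events.
    public static int CountDropped(int capacity, int burst)
    {
        using (var queue = new BlockingCollection<int>(capacity))
        {
            int dropped = 0;
            for (int i = 0; i < burst; i++)
            {
                if (!queue.TryAdd(i)) dropped++; // TryAdd returns false at once when full
            }
            return dropped;
        }
    }
}
```

Under light load (a burst of 10 into a capacity of 100) nothing is rejected; under a burst of 1000, 900 events are.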
To have any hope of catching these bugs, you need to instrument your application heavily, with logging calls that you can turn on and off easily with some sort of switch (kill signal, special dialing code, etc.). Running with a debugger attached will likely be next to impossible on your production or staging systems.
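A runtime switch can be as simple as a shared verbosity level read and written with Volatile. This sketch leaves the trigger (signal, special dialing code, etc.) to you; LogSwitch and Verbosity are hypothetical names. Taking the message as a Func<string> means nothing is formatted while logging is off, which matters on a hot path.

```csharp
using System;
using System.Threading;

public enum Verbosity { Off = 0, Error = 1, Info = 2, Verbose = 3 }

// Process-wide log switch that can be flipped while the system is running.
public static class LogSwitch
{
    private static int _level = (int)Verbosity.Error;

    public static Verbosity Level
    {
        get { return (Verbosity)Volatile.Read(ref _level); }
        set { Volatile.Write(ref _level, (int)value); } // e.g. called from your trigger handler
    }

    // Cheap check first, so disabled log calls cost almost nothing.
    public static bool IsOn(Verbosity level)
    {
        return level != Verbosity.Off && Volatile.Read(ref _level) >= (int)level;
    }

    public static void Write(Verbosity level, Func<string> message, Action<string> sink)
    {
        if (IsOn(level)) sink(message()); // message is only built when enabled
    }
}
```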
Lastly, definitely invest in an automated test environment. You will need to do these kinds of debugging runs hundreds of times in the course of developing your app, and it just isn't feasible to have everyone in the company drop what they're doing and call into your app a dozen times a day. While there are plenty of load test tools for web apps, I'm not familiar with any for telephony apps, although some must exist. You may end up rolling your own from a bunch of old modems.
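Whatever actually places the calls (old modems or a test hook in your own code), the harness around it can be small. A C# sketch, assuming a hypothetical placeCall delegate that completes when a simulated call finishes and throws when it fails; CallLoadTest and LoadResult are illustrative names.

```csharp
using System;
using System.Collections.Concurrent;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

public sealed class LoadResult
{
    public int Succeeded;
    public int Failed;
    public double MaxLatencyMs;
}

// Fires many simulated calls concurrently and collects failures and latencies.
public static class CallLoadTest
{
    public static async Task<LoadResult> RunAsync(int calls, Func<int, Task> placeCall)
    {
        var latencies = new ConcurrentBag<double>();
        var errors = new ConcurrentBag<Exception>();

        var tasks = Enumerable.Range(0, calls).Select(async id =>
        {
            var sw = Stopwatch.StartNew();
            try
            {
                await placeCall(id).ConfigureAwait(false);
                latencies.Add(sw.Elapsed.TotalMilliseconds);
            }
            catch (Exception ex)
            {
                errors.Add(ex); // keep the exception for later inspection
            }
        }).ToList(); // ToList starts all calls before we wait on them

        await Task.WhenAll(tasks).ConfigureAwait(false);

        return new LoadResult
        {
            Succeeded = latencies.Count,
            Failed = errors.Count,
            MaxLatencyMs = latencies.IsEmpty ? 0 : latencies.Max()
        };
    }
}
```

Run it from a scheduled job against a staging system and compare failure counts and maximum latency between builds.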
Good luck, as the bugs in these systems are notoriously difficult to hunt down.
--Paul
Async application? Do you mean you have two parts of a client-server relationship, such as an SMTP client and server, or a more direct (and less formal) communication channel like shared memory?
By event-driven, do you mean that you have events requiring immediate attention, or do you have events you can buffer with water marks?
Is it a direct producer-consumer with only 1-way communication, or do you need bi-directional communication?
If you give someone a poorly worded problem, they cannot be expected to solve it.
I'm unfortunately not too familiar with C#, so I can't comment on its logging facilities (or lack thereof) beyond the .NET EventLog class.
There is a project on SourceForge called C# Logger that is supposedly similar to log4j in Java, but it seems to be stuck in alpha and isn't particularly active.
Just my two cents. Hopefully it helps. :)
-- null