Slashdot Mirror


Debugging Asynchronous Applications?

duncan bayne asks: "I'm attempting to debug a complicated telephony application, written in C#, that's almost entirely event driven. This is the first time I've debugged a large asynchronous application that isn't a GUI, and I'm curious to know what advice the Slashdot crowd has to share - have you any recommended tools, best practices, or common pitfalls to avoid?"

78 comments

  1. VS2005 by SIGALRM · · Score: 5, Informative
    This is the first time I've debugged a large asynchronous application that isn't a GUI, and I'm curious to know what advice the Slashdot crowd has to share
    I'm not sure if you are on .NET 2.0 yet, but if you are, start by taking a look at VS2005. In the debug department, enhancements include better JIT debugging, stepping into XML/Web services from a client, and state-driven object inspection. Object Test Bench (OTB) is a simple object-level test utility. You create instances of your objects, invoke methods, and evaluate results... to shorten the task of coding, debugging and re-coding. I'm not sure about telephony specifically, but WSE/WS* SOAP layers can be hard to maneuver through in a debugger, yet VS2005 does it quite nicely via WSDL.

    One other suggestion... "event bus" apps like you describe are good candidates for capturing as much runtime data as possible, so make sure you adjust your build parameters and do as much of that as possible, especially in problem assemblies. Oh, and don't forget to write NUnit tests. Sounds like you're walking into some prewritten code, but the effort might be worthwhile.
    --
    Sigs cause cancer.
    1. Re:VS2005 by Retric · · Score: 1

      Step 1: log everything (timestamps, function name, data, what you think should be happening, etc.)

      First get interfaces set up properly so that you know the function is getting the right data.
      Then work on full functionality (disabling locking if need be. It only needs to work the first time; it's OK if you have to reboot everything every time before it will work again.)
      Then work on timing / locking issues. (Logging each lock is a good idea.)
      Then work on cleaning up, aka releasing resources.
      Then start disabling logging and work on speed / reliability issues. It's a good idea to leave a reasonable level of logging, but it's easy to log a lot of junk, which makes it harder to follow what's going on in the log. (It's not a bad idea to log every 600th time something happens if it tends to happen 10 times a second. You don't need to know everything, just that things are still working. And it's a good idea to limit the log files to a fixed size, say 10MB.)

      FYI: If it sometimes works then you probably have a timing issue. It's not a bad idea to set up a test so that you can always send the exact same data and see if it works once. Once the first test always works after a clean boot you can focus on specific failures vs "well it works 95% of the time I wonder what's wrong..."

      PS: When it really does seem totally random, try a new test machine. It's easy to waste a lot of time trying to debug a system when the RAM in your test box is having problems.
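
      A minimal sketch of the "log every 600th time" and "cap the log at 10MB" ideas above. It is written in Python rather than C# purely for illustration; the names EveryNth and make_capped_logger are made up here, and only RotatingFileHandler comes from the standard library.

```python
import logging
from logging.handlers import RotatingFileHandler


class EveryNth:
    """Let through only every n-th occurrence of a given key (hypothetical helper)."""

    def __init__(self, n):
        self.n = n
        self.counts = {}

    def should_log(self, key):
        count = self.counts.get(key, 0) + 1
        self.counts[key] = count
        # Log the first occurrence, then one in every n after it.
        return self.n == 1 or count % self.n == 1


def make_capped_logger(path, max_bytes=10 * 1024 * 1024, backups=1):
    """Build a logger whose file is rolled over before it grows past max_bytes."""
    logger = logging.getLogger("capped:" + path)
    logger.setLevel(logging.DEBUG)
    handler = RotatingFileHandler(path, maxBytes=max_bytes, backupCount=backups)
    handler.setFormatter(logging.Formatter("%(asctime)s %(funcName)s %(message)s"))
    logger.addHandler(handler)
    return logger
```

      With something that fires 10 times a second, EveryNth(600) lets one line through per minute, which is enough to show that things are still alive.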

  2. A Stab at Some Solutions & Strategies by eldavojohn · · Score: 3, Informative

    As a basic sort of testing phase, do it all on one computer. This eliminates all possible network errors that can occur. I'm assuming this is meant to be huge so maybe the bugs you speak of result from multiple machines fouling each other up. Either way, let's talk debugging strategies!

    Also, as I recall from my days of drudgery at college, create tons of output.

    So I will suggest as a preliminary requirement that you create a nice logging system (if you haven't done so already). I haven't written much C# so I'm going to be talking abstractly. Hopefully the rest of Slashdot can help with the specifics to C#. Now, what I mean is that you should create a class that just creates an output log file that you can read for output later. I don't mean to put a message for every packet sent but maybe it wouldn't hurt to put a message for each stream or connection opened. It's going to help for you to generate random IDs for each call and to put the destination/receiving IP:Port in your log. This would most likely be helpful with a server. It also will be helpful to store printlns in your code (redirect standard out to the logger).

    Now use this on every machine in the system. If one machine should start to give you problems, create a mutual exclusion on this log (or put all of the log entries in critical regions). In Java, you can use object locks or the synchronized keyword--in C# I'm pretty sure they have something similar. Just because it's not a GUI doesn't mean you can't record output.

    Just a friendly warning: time stamping is usually worthless unless you have a logical clock scheme (e.g. a Lamport clock) set up across the network, which usually requires lots of time on one's hands. You could shoot for an NTP server but I wouldn't trust the accuracy past 500 ms. If you absolutely need a clock scheme, I recommend having one machine on the network tick tock an increasing number that is reflected in all the logs. Make the time between ticks adjustable; this way you'll be able to check out events roughly relevant to these ticks (assuming the time it takes to get there is similar).
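
    For reference, a Lamport clock itself is only a few lines; the work is in threading the stamp through every message. A rough Python sketch (the class and method names are illustrative, not from any library):

```python
class LamportClock:
    """Logical clock: a counter that only needs to respect send-before-receive."""

    def __init__(self):
        self.time = 0

    def tick(self):
        """Local event: advance the counter."""
        self.time += 1
        return self.time

    def send(self):
        """Stamp an outgoing message; the stamp travels with the message."""
        return self.tick()

    def receive(self, msg_time):
        """Merge the sender's stamp so the receive always sorts after the send."""
        self.time = max(self.time, msg_time) + 1
        return self.time
```

    Writing the counter into every log line gives an ordering that is consistent with cause and effect even when the wall clocks disagree.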

    In the end, your best tool is your brain: design tests and double check the logs on each machine to see that the linear time sequence of relative events is correct. Logic will be your only friend in this journey. Don't be afraid to kick off more threads on the client side if they don't need to share resources. If you have a server side, be careful in how many threads you have and make sure you realize what memory scope they're limited to.

    For the love of god, if you use ports--don't forget to free them when you're done using them!

    Unfortunately, Nornir is not OSS ... yet. Their papers may be of use to you, however. If you're having problems with packets on either end, use my good friend ethereal.

    Good luck! Happy debugging!

    --
    My work here is dung.
    1. Re:A Stab at Some Solutions & Strategies by Duhavid · · Score: 1

      If you put a mutex on log access, and don't accept the message
      async, you can end up serializing your app. I.e., don't
      hold up the caller on logging.

      Also, there are application blocks for logging in C#.
      Don't recall the name of it right off the top of my head.

      Another way to solve the timing issue: make it so that you have one
      app in your system to receive and record the messages.
      That one app should timestamp the message when it comes in,
      put it in a queue, and release the caller. Then you don't
      have to worry about figuring out what happened before and
      after what. Requires connectivity, but you have that, or
      you have a fairly easy to debug problem.

      Everything that logs should put a unique name on the message,
      something like App:Module:Function.

      Other than those, I think you have hit the high points.
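
      The "put it in a queue and release the caller" idea, sketched in Python with the standard library's QueueHandler and QueueListener. This is an in-process illustration only, not the separate logging app described above; start_async_logging and ListHandler are hypothetical names.

```python
import logging
import logging.handlers
import queue


class ListHandler(logging.Handler):
    """Stand-in for the slow sink (file, socket, ...); it just collects messages."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


def start_async_logging(target_handler):
    """Return (logger, listener). Callers only pay for a queue put."""
    log_queue = queue.Queue()
    logger = logging.getLogger("async-demo")
    logger.setLevel(logging.DEBUG)
    logger.addHandler(logging.handlers.QueueHandler(log_queue))
    # The listener drains the queue on its own thread and feeds the real handler.
    listener = logging.handlers.QueueListener(log_queue, target_handler)
    listener.start()
    return logger, listener
```

      Call listener.stop() at shutdown so that anything still queued gets written before the process exits.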

      --
      emt 377 emt 4
    2. Re:A Stab at Some Solutions & Strategies by DJDutcher · · Score: 1

      A good logging library for .NET is log4net. It is basically the same API as log4j which is a popular logging library for Java.

    3. Re:A Stab at Some Solutions & Strategies by kelleher · · Score: 1
      You could shoot for an NTP server but I wouldn't trust the accuracy past 500 ms.
      What...?

      Ok, so you're either smarter than Dr. Mills or smoking crack - I'd bet the latter. What crevice did you pull your 500ms threshold from? Even a stratum 10+ timeserver should give you a relative accuracy better than that as long as it's in a room with a stable temperature.

    4. Re:A Stab at Some Solutions & Strategies by Violet+Null · · Score: 1

      For a long time, googling for "log4net documentation" would return "Log4Net is Crap4Crap" as the first hit; right now, that's hit #2, underneath the real log4net site. And, I have to say, since the documentation I found really sucked, I had to agree with it.
       
      (Yes, log4net and log4j are practically identical, and I can always use its documentation, but this was a problem with the EventLogAppender, which log4j doesn't have...)

    5. Re:A Stab at Some Solutions & Strategies by Detritus · · Score: 1

      That isn't unreasonable. I've seen alleged stratum 2/3 servers that were in error by 100 ms or more. Not sure how they broke it, but they did. Usually it's better than 50 ms, but I wouldn't bet my life on it.

      --
      Mea navis aericumbens anguillis abundat
    6. Re:A Stab at Some Solutions & Strategies by arivanov · · Score: 1

      If all of your machines are synchronising versus the same server and they are on the same LAN - who cares. Their relative time difference will be in the sub 10ms range.

      If you need to test network events never use a real network in the first instance. Do it with a simulated network like BSD DUMMYNET and configure NTP to pass through unmolested. This will allow you to introduce arbitrary delays, packet loss, jitter and bandwidth constraints while retaining nearly perfect synchronisation between systems.

      --
      Baker's Law: Misery no longer loves company. Nowadays it insists on it
      http://www.sigsegv.cx/
  3. Don't use printf by BadAnalogyGuy · · Score: 5, Informative

    The buffering nature of printf along with the asynchronous execution of code can lead to out of order debug printfs.

    I had this one project. It was to build a model car, not related to programming at all. I started out doing well, following the instructions and generally getting along fine. But then I lost patience with the tedium and left to get a beer to relax. When I finally got back to working on the model, I found that the dog had chewed it up and the wife had thrown it out (the trashed model, not the dog, but she'd love to throw out the dog too). I left it in the garage where I thought it would have been safe, but I guess you can't expect things to stay the same if you leave it sitting there for a year and a half.

    The moral of the story is that if I look to see where things went wrong, it was the point where I lost patience and decided to do something different than what I should have been focused on. This is like how many people try to put breakpoints all over their code rather than where they should put them. Don't debug willy-nilly and expect to make any good progress. But also don't try to throw in some seemingly helpful actions (like printf) because it may end up changing the whole state of the program.

    1. Re:Don't use printf by Anonymous Coward · · Score: 2, Informative

      setbuf(stdout,NULL) is your friend...

    2. Re:Don't use printf by Anonymous Coward · · Score: 0

      That story was a lot like the bible. Long, boring, and made up. =D

    3. Re:Don't use printf by SheeEttin · · Score: 0

      left to get a beer to relax [...] sitting there for a year and a half

      It takes you a year and a half to get a beer?
      I could down over a thousand beers in that time!

    4. Re:Don't use printf by Anonymous Coward · · Score: 0

      Kind of like your life?

    5. Re:Don't use printf by Anonymous Coward · · Score: 0

      I don't think it's possible to make up for his life.

    6. Re:Don't use printf by taniwha · · Score: 1
      printf's perfectly OK so long as you understand what's going on (and you better if you're doing something like this) - if you're doing real-time multithreaded stuff you have to be aware of the consequences of the hidden mutexes inside stdio (and malloc/new!) and what they can do (serialisation, priority inversion). If you do have issues with printf in a real-time system (more to do with the TIME it takes to send them to the console) you can build a silo to drop them into and print them later.

      However more to the point - you shouldn't be debugging a 'big asynchronous system' from scratch - build it in little bits and test them as you go

    7. Re:Don't use printf by Ravatar · · Score: 1

      So... how does a blurb about printf relate to .NET?

  4. Debugging Asynchronous Applications For Dummies by Anonymous Coward · · Score: 3, Funny

    Chapter 1.

    Stand up and slowly back away from the keyboard.

    Chapter 2.

    There is no chapter 2.

  5. Reference by adamy · · Score: 4, Informative

    A great reference for design in these types of applications is Enterprise Integration Patterns by Gregor Hohpe and Bobby Woolf. Some of the patterns it contains provide methods that you can use to debug and trace such applications. Typically, these apps deal with messages as the primary unit of transport. You have a kind of Heisenberg uncertainty principle in effect: you can tell where a given message is in the system, or what flows through a given point of the system, but it is hard to tell both. And the act of monitoring a system can often affect timing, and change behavior. One thing you can do with messages is put proxies in place that allow you to log every message delivered to a given component. Another trick is to add routing slips to messages that indicate what components they have passed through, but this might not be possible to introduce into an existing system. Network tracking tools are invaluable; netstat and tcpdump are your friends. Or whatever tools work with your particular network stack.
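
    A toy Python sketch of the two tricks just mentioned, a logging proxy in front of a component and a routing slip carried on the message. Messages are plain dicts here, and LoggingProxy, Echo and the "routing_slip" key are all made up for the example.

```python
import copy


class Echo:
    """Hypothetical end component: returns the message it was given."""

    def handle(self, message):
        return message


class LoggingProxy:
    """Sits in front of a component and records every message delivered to it."""

    def __init__(self, name, target, seen):
        self.name = name
        self.target = target
        self.seen = seen  # shared list acting as the log

    def handle(self, message):
        # Routing slip: note on the message itself that it passed through here.
        message.setdefault("routing_slip", []).append(self.name)
        # Snapshot the message so later hops cannot alter the logged copy.
        self.seen.append(copy.deepcopy(message))
        return self.target.handle(message)
```

    Wrapping each component this way answers both "what reached this point" (the log) and "where has this message been" (the slip), at the cost of an extra copy per hop.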

    --
    Open Source Identity Management: FreeIPA.org
  6. Logfiles by starling · · Score: 4, Interesting

    I've found and fixed more bugs in large systems by analysing logfiles than by running tests in a debugger. Log everything, and make sure each line in the logfiles has an accurate time stamp.

    The trick is to learn how to correlate information between different logfiles to build up a picture of how all the components (process or thread) behave together. The classic Unix utilities like find, grep, awk, cut and less are your friends.

    1. Re:Logfiles by arivanov · · Score: 1

      Seconded.

      You do not debug a complex network application (or any other asynchronous application) via a debugger.

      You log it.

      Further on that, it is important to have selective logging. An example of good logging is recent sendmail whose logging can be selectively tuned and turned on and off for various parts of the application.

      --
      Baker's Law: Misery no longer loves company. Nowadays it insists on it
      http://www.sigsegv.cx/
    2. Re:Logfiles by TapeCutter · · Score: 1

      Ditto, been doing it for 15yrs. Print __FILE__, __LINE__, timestamp, pid and threadID with every message (macros will make it simpler); these are the "must haves", everything else is negotiable. It does help to have some consistency in the messages when using words like fatal-error, error, warning, etc., but that is hard to enforce in most commercial environments.

      Friends: Even though I find the MSVC search tool very useful, one of the first things I do when setting up a development box at work is to download a windows version of grep.
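
      Outside C/C++ the same "must have" fields usually come from the logging library rather than from macros. As an illustration, in Python's standard logging module they are all format attributes (LOG_FORMAT and make_formatter are just names picked for this sketch):

```python
import logging

# timestamp with milliseconds, pid, thread id, file:line, level, message
LOG_FORMAT = (
    "%(asctime)s.%(msecs)03d pid=%(process)d tid=%(thread)d "
    "%(filename)s:%(lineno)d %(levelname)s %(message)s"
)


def make_formatter():
    """Formatter that puts the must-have fields on every line."""
    return logging.Formatter(LOG_FORMAT, datefmt="%Y-%m-%dT%H:%M:%S")
```

      Using the library's own level names (ERROR, WARNING, ...) also gives the consistency in wording for free.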

      --
      And did you exchange a walk on part in the war for a lead role in a cage? - Pink Floyd.
    3. Re:Logfiles by tetrode · · Score: 1

      Indeed, absolutely.

      We have such a large (multiplatform) telephony application, and it is capable of generating huge logfiles if necessary (think multiple GB per day, and this is already compressed).

      Some hints on that.

      Write a tracing component that every other component talks to. This tracing component will be responsible for:

        1. after receiving a certain number of lines, analysing whether the logfile is interesting enough to be written to disk (i.e. are there errors in there?)
        2. compressing the logfile
        3. writing them to disk
        4. go to 1, retaining a number of lines of course, because you need a context.

      Have a logfile per component.

      Have a master logfile where only major errors are logged in, that refer to the minor logfiles.

      This, my friend, you make configurable, and changeable at runtime (also the compressing). Don't use your own superduper compression technique, use zlib so unzip can be used.
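
      A rough Python sketch of the tracing loop from the numbered list above: buffer lines, write a batch only if it contains an error, otherwise keep just the tail as context. Treating a line containing "ERROR" as interesting is an assumption of this sketch, as are the names SelectiveTrace and append_gzip; gzip (which is built on zlib) stands in for whatever compression you settle on.

```python
import gzip


def append_gzip(path):
    """Return a writer that appends a compressed batch of lines to path."""
    def write(lines):
        with gzip.open(path, "at", encoding="utf-8") as out:
            out.writelines(line + "\n" for line in lines)
    return write


class SelectiveTrace:
    """Buffer trace lines; only batches that contain an error reach the writer."""

    def __init__(self, write_batch, batch_size=1000, context=50):
        self.write_batch = write_batch
        self.batch_size = batch_size
        self.context = context
        self.lines = []

    def add(self, line):
        self.lines.append(line)
        if len(self.lines) >= self.batch_size:
            self.flush()

    def flush(self):
        if any("ERROR" in line for line in self.lines):
            self.write_batch(self.lines)
            self.lines = []
        else:
            # Boring batch: keep only the tail as context for the next one.
            self.lines = self.lines[-self.context:] if self.context else []
```

      Typical use would be SelectiveTrace(append_gzip("component.log.gz")), with one instance per component.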

      This way you can save much on valuable IO and processing power. Also be sure that there is ONE machine you get your timestamp from - and use GMT; this way you don't get problems when switching from summer/winter time.

      Use different debug levels and debug depths, so you can decide how much information you log and how deep you allow your logging to go.

      Write a log line when entering and exiting a function and structure it so that it is readable:

      10:18:27.283 > main(a,b,c)
      10:18:27.294 > HelloWorld(a,b)
      10:18:27.294 Written 7 bytes
      10:18:27.312 GoodByeWorld(c)
      10:18:27.401 c == NULL
      10:18:27.416 GoodByeWorld(c)
      10:18:27.472 main()

      This requires some effort at the start, but once you have this in your application, you will never want to give it up.

      Beware that heavy logging can eat up a lot of server resources. Our application logging can eat up to 60% of the resources, but it is a godsend when debugging. I don't have the sources to the application, but with the log files I can debug a lot and easily see whether it is our problem, the database's problem, the PABX's problem, or someone else's problem.

      Without them, I would be blind.

      If you need to contact me, be my guest: mark.tetrode gmail.com

      Mark

    4. Re:Logfiles by tetrode · · Score: 1

      The formatted log file should look of course better formatted, like this:

      10:18:27.283 > main(a,b,c)
      10:18:27.294    > HelloWorld(a,b)
      10:18:27.301       Written 7 bytes
      10:18:27.307    < HelloWorld returned 3
      10:18:27.312    > GoodByeWorld(c)
      10:18:27.401       c == NULL
      10:18:27.416    < GoodByeWorld returned 7
      10:18:27.472 < main() return

      When using multiple threads, adding the thread name in there will help too, of course.
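
      One way to get this enter/exit format without hand-writing two log calls per function is a wrapper. A Python sketch, assuming positional arguments only; make_tracer is a hypothetical name, and the depth counter is kept per thread so threads do not disturb each other's indentation:

```python
import functools
import threading
from datetime import datetime


def make_tracer(emit=print):
    """Return a decorator that logs entry and exit, indented by call depth."""
    depth = threading.local()  # one depth counter per thread

    def stamp():
        # HH:MM:SS.mmm, matching the sample log above
        return datetime.now().strftime("%H:%M:%S.%f")[:-3]

    def traced(func):
        @functools.wraps(func)
        def wrapper(*args):
            level = getattr(depth, "value", 0)
            pad = "   " * level
            shown = ", ".join(repr(a) for a in args)
            emit(f"{stamp()} {pad}> {func.__name__}({shown})")
            depth.value = level + 1
            try:
                result = func(*args)
            finally:
                depth.value = level  # restore even if func raised
            emit(f"{stamp()} {pad}< {func.__name__} returned {result!r}")
            return result
        return wrapper

    return traced
```

      Decorating main and the functions it calls with the same tracer produces the nested "> enter / < returned" lines shown above.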

      Mark

    5. Re:Logfiles by TheRaven64 · · Score: 1

      One thing I'd like to add is that it's useful to log messages being received as well as sent. This helps when multiple components are sending messages to the same one.

      --
      I am TheRaven on Soylent News
    6. Re:Logfiles by aCapitalist · · Score: 1

      I used to do a lot of low-level network programming and logfiles were the way to go. Debuggers are worthless for getting the whole picture of your system.

    7. Re:Logfiles by snopes · · Score: 1

      So, to take the above logic one step further: design a robust logging infrastructure. Just like any other part of the project, put together requirements and code accordingly. If you'll be doing high freq. logging/data acquisition consider implementing system resource monitoring into the logging infrastructure and triggering a failsafe (shutdown logging and *log* that you're shutting down logging) at certain resource utilization levels.

      Also, consider the likely need for time synch within your infrastructure. Depending on how granular your analysis will be consider the need for further offset based adjustment of the time stamps. In other words, sometimes even with a correct ntp (or IEEE 1588, is anyone using this yet?) setup you can be a few ticks off. You may be able to identify an event within the system which you know executes simultaneously across components. From that you can determine an offset which can be applied to all subsequent events.
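
      The offset adjustment can be as simple as the following Python sketch. It assumes every machine logged the same shared event and that the event really was simultaneous; clock_offsets and adjust are illustrative names.

```python
def clock_offsets(sync_times, reference):
    """sync_times maps machine -> its local timestamp for the shared event.

    Returns the correction to add to each machine's timestamps so that the
    shared event lines up with the reference machine's clock.
    """
    base = sync_times[reference]
    return {machine: base - seen for machine, seen in sync_times.items()}


def adjust(timestamp, machine, offsets):
    """Map a local timestamp onto the reference machine's timeline."""
    return timestamp + offsets[machine]
```

      Apply adjust to every later event from that machine before comparing logs; the correction is only as good as the "simultaneous" assumption.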

      Anyhow, you may not need that. It's just something I've been dealing with recently for quite a different application from telcom.

      The fundamental advice is just a repeat: log well.

    8. Re:Logfiles by arnie_apesacrappin · · Score: 1
      Friends: Even though I find the MSVC search tool very useful, one of the first things I do when setting up a development box at work is to download a windows version of grep.

      If you haven't seen this project, check out http://unxutils.sourceforge.net/. These tools end up on every windows machine I use. Make sure to grab the updates. Especially helpful on windows are the pclip and gclip utilities. Copy any text to the windows clipboard and the pclip utility will output the clipboard contents to STDOUT, while gclip will take STDIN and put it into the clipboard. It's very useful.

      --

      Still, with a plan, you only get the best you can imagine. I'd always hoped for something better than that. -CP

    9. Re:Logfiles by starling · · Score: 1

      Uncanny - you just described almost exactly the system we used for a large process control setup. It seems odd that a lot of people use setups like these and we all have to roll our own.

      Does anyone know of any "plug-in" logging systems? This particular wheel has been reinvented enough times.

    10. Re:Logfiles by RadarMan · · Score: 1

      Once you've got your clocks sync'd up well, or you're running all processes on one machine, you can re-interleave the logs using sort (assuming your timestamp is sortable).

          sort *.log > fulllog.txt

      All my logs have microsecond precision, and they tend to grow at around 50 lines per second. Still, sort does fine on many 100+MB log files. This is INVALUABLE to solving strange timing bugs.
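
      If each log is already in time order on its own, a streaming merge does the same job without sorting everything from scratch. A small Python sketch using heapq.merge, assuming every line starts with a sortable timestamp followed by a space (merge_logs is a made-up name):

```python
import heapq


def merge_logs(*logs):
    """Interleave logs that are each already sorted by their leading timestamp."""
    return heapq.merge(*logs, key=lambda line: line.split(" ", 1)[0])
```

      The arguments can be open file objects, so the merge never has to hold a whole log in memory.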

    11. Re:Logfiles by starling · · Score: 1

      Yep, that's the ticket. Can't believe I left out sort (and sed, for that matter) from my list of handy utils.

    12. Re:Logfiles by egeorge · · Score: 1

      For a good logging facility that works for a .Net environment, you can check out
      http://logging.apache.org/log4net/

  7. Mock Objects by kevin_conaway · · Score: 2, Insightful

    The question is kind of vague. Are you debugging receiving events? Sending events? Both? I'd look into Mock Objects and see how they can help simulate asynchronous environments.
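
    To give a flavour of the idea, here is a Python sketch using unittest.mock. CallHandler and the switch interface (subscribe, answer) are invented for the example; the point is that the test delivers events itself instead of waiting for a real switch to raise them.

```python
from unittest import mock


class CallHandler:
    """Hypothetical event-driven component under test."""

    def __init__(self, switch):
        self.active = set()
        self.switch = switch
        switch.subscribe(self.on_event)

    def on_event(self, event, call_id):
        if event == "ring":
            self.active.add(call_id)
            self.switch.answer(call_id)
        elif event == "hangup":
            self.active.discard(call_id)


def check_ring_is_answered():
    switch = mock.Mock()  # stands in for the real telephony switch
    handler = CallHandler(switch)
    # Grab the callback the handler registered and fire events by hand.
    deliver = switch.subscribe.call_args[0][0]
    deliver("ring", 7)
    switch.answer.assert_called_once_with(7)
    deliver("hangup", 7)
    return handler.active
```

    Because the test controls when each event arrives, awkward orderings (hangup before ring, two rings in a row) can be replayed on demand.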

  8. The usual rules apply by jd · · Score: 2, Insightful
    Design from top down, test from bottom up. The lowest-level components won't care how they're called, only that they're called with the right parameters. That, alone, won't be enough but will give you a good starting point. Test harnesses can also be used, as they can simulate events.


    Look for invariants and ensure that they hold true where they are supposed to. That doesn't require fine analysis, but will detect problems in the logic when you're running the full system. Use profilers and coverage analysers to make sure that when you DO do invariant checks that you're actually checking all the areas they're supposed to hold up.


    Test "normal" values, borderline/extreme values (that's where overflows, underruns and other assorted nasties are likely to show up), and completely erroneous data. Borderline can include borderline values, but since this is a network app, it can also include extreme volumes (very little or vast amounts of traffic, or even massive variations). Erroneous data can be data that is invalid in and of itself (malformed packets, for example), or data that has no rational meaning (a non-existent codec, or a value that decodes to something absurd - I doubt many people will be doing 11.1 audio streams for VoIP, for example!)


    Beyond that, it's all much of a muchness. There's very little that is async specific to testing, so long as you concentrate on the logic rather than the means of getting there.

    --
    It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
    1. Re:The usual rules apply by toddbu · · Score: 1
      The lowest-level components won't care how they're called, only that they're called with the right parameters.

      Huh? My experience is that it's usually the low level components that are most sensitive. Create an accidental buffer overrun and it won't show up in any unit tests.

      --
      If you don't want crime to pay, let the government run it.
    2. Re:The usual rules apply by Duhavid · · Score: 1

      I believe the parent post meant that the bottom level
      components will not care if they are called from the
      real system or from a test harness. So, do the test
      harness, and use it to test the snot out of that
      sensitive component.

      --
      emt 377 emt 4
    3. Re:The usual rules apply by toddbu · · Score: 1

      Ok, maybe I'm missing something in your comment, but my exact point was that you can't catch a lot of errors with unit tests of individual components. You not only have to test multiple threads hitting the component, but also different numbers of threads and different timing for the calls. I'm not saying that unit tests can't help you stabilize the code, but there are many, many conditions that can only be found in the real environment. Since this is a telephony app, just the speed at which people press keys and the duration of the keypress could have a huge impact on the behavior of the individual components.

      --
      If you don't want crime to pay, let the government run it.
    4. Re:The usual rules apply by Duhavid · · Score: 1

      A: You can drive out a large number and class of bugs from the
              component before you begin integration testing. Then you will
              be focusing on "more real" bugs.

      B: There is nothing that says that the test harness cannot run
              multiple threads and test some "real world" type conditions.
              I know it will not drive out every integration or real world
              bug, but it will be a good start.

      Both of these will leave you with less to do when you do start
      your beta. And yes, you note well that there *will* be bugs
      found in the wild.

      I have done both of the above, and it has been a huge help
      in delivering software that is robust.

      --
      emt 377 emt 4
    5. Re:The usual rules apply by ifdef · · Score: 1

      I've found unit tests to be very useful. Only when you're sure that all the basic components function as they ought to function, is it worthwhile looking at interactions between the components.

      A unit test finds simple errors simply.

      Sure, as you move closer to using real data in the field, you will find more errors. But it's stupid to run the whole system in real time in order to trigger a bug where an "and" should have been an "or", then go through mountains of debugging data in order to localize it. Get rid of those things first, then do more complicated tests to find the harder-to-find things like race conditions.

    6. Re:The usual rules apply by toddbu · · Score: 1

      Agreed. I was thinking about this discussion more in terms of the original post, which was "how do I track down bugs in real time?" I'd hope that before they started that they'd have done some kind of sanity check on the code. :-)

      --
      If you don't want crime to pay, let the government run it.
  9. Hook into TAPI by scdeimos · · Score: 1

    If you can, hook into the TAPI (Telephony API) DLLs using APIspy (if you're desperate for a GUI) or some other commandline tool (better for filtering). By examining the inputs and outputs through TAPI you'll probably get a better picture of what's going on than by trying to catch it all in the VS debugger.

  10. Message queues by toddbu · · Score: 1
    I worked for a company where we did a ton of this kind of stuff. It was probably even a little worse than what you're doing there as events could come from multiple sources both on and off the processor. We had a cool message system that wrote timestamped messages with each event encountered to a pipe, and a logger that took the results and wrote them to a file. No debuggers, just message logs. It's amazing how much you can learn this way. I remember one case where we were using a photocell to detect the arrival of a package at a station, and it turned out that there was a "bounce" in the signal coming from that photocell. The way we noticed the problem was that we were trying to process phantom packages by another system, and we were sure that the problem was with the other device until we dug through the logs and saw the duplicate messages.

    One other note - don't dismiss code review when trying to debug these kinds of systems. I seldom ever run a debugger any more because I just look at whatever output I have and make an educated guess as to where to look. A quick scan of the code often reveals the problem. I can usually fix a problem in the time that it takes others to set breakpoints.

    --
    If you don't want crime to pay, let the government run it.
  11. Firebug by chris_mahan · · Score: 0, Offtopic

    I use firebug, a firefox extension. It works great for in-browser message (send/reply and the js that made it).

    --

    "Piter, too, is dead."

    1. Re:Firebug by malraid · · Score: 1

      This is completely offtopic. He's not asking about AJAX, but rather GUI-less async apps.

      --
      please excuse my apathy
    2. Re:Firebug by ObsessiveMathsFreak · · Score: 1

      I use firebug, a firefox extension. It works great for in-browser message (send/reply and the js that made it).

      I award you no points, and may the FSM have mercy, on your immortal soul.

      --
      May the Maths Be with you!
  12. Unit Testing & Visual Studio .Net by Windjammer · · Score: 1

    If this project has them, I would also make sure to examine the output of unit tests http://www.nunit.org/ . If the project does not have these, then I would recommend checking it out. The one advantage to using the approach of unit testing, is that once you develop a test to produce the buggy behavior, it then becomes possible to test for this behavior at a later date (regression errors). In addition the debugger built into Visual Studio .Net is quite powerful once you get used to it (and with VS 2005 it has only gotten more powerful). Even if your project is not written for the 2.0 framework/2005 specifically, it still may be useful to debug it in 2005.

    One area that I would shy away from, if you can help it and you need to debug the interactions between two servers, is the remote debugging tools. I tried using these at work, and spent close to 5 hours on that particular Rat Hole.

    --
    What? Me worry? NEVER.....
    1. Re:Unit Testing & Visual Studio .Net by Windjammer · · Score: 1

      I also forgot to add...for logging & debugging purposes, I have found log4net http://logging.apache.org/log4net/ to be extremely useful. It is not the easiest logging library to get set up, but it does offer quite a bit of flexibility and is easy to use once you do.

      --
      What? Me worry? NEVER.....
  13. Simple by AKAImBatman · · Score: 1

    This is the first time I've debugged a large asynchronous application that isn't a GUI, and I'm curious to know what advice the Slashdot crowd has to share

    An event driven system is an event driven system is an event driven system. The only difference you're likely to find is that you can often see the results of a GUI application, but you can't "see" the results of a telephony application. Otherwise, debugging isn't really that different. If you need to get a feel for it, configure a logging system and place debugging logging events at various points in the code. The resulting log file can help give you a feel for when events happen, in what order, with what values, and several other important points.

    Also, if the original author(s) are available, never be afraid to ask questions. If you find anything in the code that wasn't obvious to you, add a comment. It will save you from trying to re-figure it out later. Have fun!

  14. Test cases!!! by malraid · · Score: 1

    Set up some synthetic test cases that replicate the problems you are having, solve them, then see if it solves the real world problem. It can be a bit tough. For example, I had to hack a simple web server to run inside my test case. You might have to do something similar to get to where you want to go.

    --
    please excuse my apathy
  15. Truss it. Oh, wait....

    --
    Comparing it to Windows will be a moot point, since El Dorado is going to have a 40% larger code base than XP.
  16. Build a simulator by BillAtHRST · · Score: 1

    Depending on how complicated the actual system you are interfacing to is, you may need to build a simulator. This may seem like a waste of time, but you can spend a ridiculous amount of time trying to (re-)capture certain events that happen rarely, or that require a lot of other things to happen first. I've done this several times, and it has always saved a lot of time in the long run. For one application (real-time trading), I built a small simulator for one of the exchange systems. It didn't do a lot -- just replayed canned quote data, and accepted the messages my app sent (logon, enter trade, etc.) and replied with something reasonable. If you're really ambitious, the simulator can be driven by some kind of script (text file, XML, etc.), so you can vary what it does for the different inputs, but this may be overkill. A simulator is also helpful for testing GUI apps, but most people don't bother -- they just play the part of a user by banging on the keyboard. (Or use a package like WinRunner, which, when you think about it, is really a scriptable simulator for a human user).
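
    A minimal sketch of that kind of simulator, in Python for brevity. The message names, replies, and quote data are all invented for illustration; a real one would speak whatever protocol the backend uses:

```python
class ExchangeSimulator:
    """Stands in for the remote system: replays canned data and answers requests."""

    def __init__(self, canned_quotes, replies):
        self.canned_quotes = list(canned_quotes)
        self.replies = replies          # message type -> canned reply
        self.received = []              # everything the app sent, for later inspection

    def next_quote(self):
        # Replay quotes in order; None signals the script has run out.
        return self.canned_quotes.pop(0) if self.canned_quotes else None

    def send(self, message_type, payload=None):
        self.received.append((message_type, payload))
        # Unknown messages get an explicit rejection rather than silence.
        return self.replies.get(message_type, "REJECT")


sim = ExchangeSimulator(
    canned_quotes=[("ABC", 10.0), ("ABC", 10.5)],
    replies={"logon": "LOGON_OK", "enter_trade": "TRADE_ACK"},
)
```

    Loading `canned_quotes` and `replies` from a text file is the cheap route to the script-driven version.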

  17. Helpful .Net debugging blog by Anonymous Coward · · Score: 0

    This blog has a lot of information about debugging .Net using windbg. The focus is mainly ASP.Net applications but many issues, like memory usage and deadlock, would carry over to any server code.

  18. you call THAT asynchronous? by Khashishi · · Score: 1, Troll

    There's no such thing as asynchronous programming in C#, or microprocessors for that matter. Real asynchronous programming is done without a global clock and without nigh any flip-flops. You could implement such a program using VHDL or Verilog or some lower level language on an FPGA.

    As for debugging asynchronous logic? You'll probably have best luck with a divining rod. These things can have sensitivities to supply voltages, temperature, humidity, EMI, and other things.

    1. Re:you call THAT asynchronous? by Anonymous Coward · · Score: 0

      Stop posting this crap to Slashdot.

    2. Re:you call THAT asynchronous? by ponds · · Score: 1

      Congrats, this is by far the most obnoxious comment that I have ever read on this site, and that's quite a feat.

  19. examine code assumptions by Stranger+Than+Fictio · · Score: 2, Interesting

    I've done a bit of asynchronous debugging, principally troubleshooting interaction between a speech recognition system and Emacs. Most of the bugs I found tended to be due to errors in assumptions in the code. For example,

    (a) when a "change text" event handler in the editor is invoked, the editor will always be done reporting the result of the previous change.

    (b) event z will always be preceded by event w

    If you know the assumptions for each event handler, cases where they break down may become obvious. If not, you can add assertions to check those assumptions.
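
    Assumption (b) can be checked with a few lines. This is only a sketch in Python, and the class and event names are hypothetical:

```python
class OrderChecker:
    """Tracks which events have been seen and asserts ordering assumptions."""

    def __init__(self, must_precede):
        # must_precede maps an event name to the event that has to come first,
        # e.g. {"z": "w"} encodes "z is always preceded by w".
        self.must_precede = must_precede
        self.seen = set()

    def on_event(self, name):
        required = self.must_precede.get(name)
        # Fail loudly at the moment the assumption breaks, not three handlers later.
        assert required is None or required in self.seen, (
            f"event {name!r} arrived before {required!r}"
        )
        self.seen.add(name)


checker = OrderChecker({"z": "w"})
checker.on_event("w")
checker.on_event("z")  # fine: w came first
```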

    Beyond that, strategies really depend on what sort of feedback loops you have. When you are debugging interaction between two objects or programs, try to simplify one so that you don't waste a lot of time trying to figure out which side is buggy. For example, I wrote a simple brain-dead editor which I was confident I understood and connected it to the speech recognition system. That way, when I found a problem, I was confident that I could quickly find any bugs on the editor side, so if I didn't find the bug there quickly, it had to be in the speech reco system.

  20. Event driven telephony by Anonymous Coward · · Score: 0

    If it's event driven, make sure you understand the program's state machine. If it's telephony, remember that the idiot at the other end might hang up on you at any time. Don't do anything to stop or block the telephony events from coming in. Expect unexpected events and deal with them, or at least log what event you received. Check all return codes for failures and log the failures with as much detail as possible. Write your logs to be searchable.
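
    One way to make the state machine explicit and catch the unexpected events, sketched in Python. The states and events here are hypothetical; a real telephony stack has many more:

```python
import logging

log = logging.getLogger("call")

# Hypothetical call states and events, keyed by (current state, event).
TRANSITIONS = {
    ("idle", "incoming"): "ringing",
    ("ringing", "answer"): "connected",
    ("ringing", "hangup"): "idle",
    ("connected", "hangup"): "idle",
}


class Call:
    def __init__(self):
        self.state = "idle"

    def handle(self, event):
        new_state = TRANSITIONS.get((self.state, event))
        if new_state is None:
            # Unexpected event: record it and stay put rather than crash.
            log.warning("unexpected event %r in state %r", event, self.state)
            return False
        log.debug("%s --%s--> %s", self.state, event, new_state)
        self.state = new_state
        return True
```

    Because "hangup" has an entry for every non-idle state, the caller dropping the line at any point is handled rather than surprising.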

  21. Load Test by plsuh · · Score: 5, Informative

    I have mod points, but I don't see anyone chiming in here about realistic load testing.

    For this kind of application, you must, *must*, MUST create a heavy load on a production system. I've done work with big, complex, multi-threaded web apps that have similar characteristics -- event-driven (when an HTTP request comes in) and server-only (no GUI). There are many bugs that don't show up until you put the system under load, as in dozens or hundreds of transactions per second. For instance, under light load a queue will never fill up, but under heavy load bizarro, difficult-to-trace bugs will crop up that you can't reproduce on your development system. Even under the same load, your development system may run into a different constraint (e.g. CPU-bound so that it can't fill the queue fast enough and thus never hits the bug).

    To have any hope of catching these bugs, you need to instrument your application heavily, with logging calls that you can turn on and off easily with some sort of switch (kill signal, special dialing code, etc.). Running with a debugger attached will likely be next to impossible on your production or staging systems.
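
    A sketch of the on/off switch, in Python. The logger name is made up, and the signal wiring in the comment assumes a Unix host:

```python
import logging

log = logging.getLogger("app")
log.setLevel(logging.INFO)


def toggle_debug(*_args):
    """Flip between INFO and DEBUG; accepts (signum, frame) so it can be a signal handler."""
    new_level = logging.INFO if log.level == logging.DEBUG else logging.DEBUG
    log.setLevel(new_level)
    return new_level

# On Unix this could be wired up with:
#   signal.signal(signal.SIGUSR1, toggle_debug)
# after which `kill -USR1 <pid>` switches verbose logging on or off.
```

    The logging calls stay in the code permanently; only the level changes, so the running process never has to be restarted to get detail.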

    Lastly, definitely invest in an automated test environment. You will need to do these kinds of debugging runs hundreds of times in the course of developing your app, and it just isn't feasible to have everyone in the company drop what they're doing and call into your app a dozen times a day. While there are plenty of load test tools for web apps, I'm not familiar with any for telephony apps, although some must exist. You may end up rolling your own from a bunch of old modems.

    Good luck, as the bugs in these systems are notoriously difficult to hunt down.

    --Paul

    1. Re:Load Test by Anonymous Coward · · Score: 0

      This is an important point. I think we all have been bitten by these situations. As a sysadmin I've had apps that were only tested on the developer's laptop take down servers and hubs (yes hubs, people used to use them before switches). Just because it works on your laptop, you aren't done yet. Set up another machine and write some code to hammer yours. I know, very painful. Basically a user or client emulator. Not fun.

  22. Not nearly enough info. by Inoshiro · · Score: 3, Informative

    Async application? Do you mean you have two parts of a client-server relationship, such as an SMTP client and server, or a more direct (and less formal) communication channel like shared memory?

    By event driven, do you mean that you have events requiring immediate attention, or do you have events you can buffer with water marks?

    Is it a direct producer-consumer with only 1-way communication, or do you need bi-directional communication?

    If you give someone a poorly worded problem, they cannot be expected to solve it.

    --
    --
    Internet Explorer (n): Another bug -- that is, a feature that can't be turned off -- in Windows.
  23. Recommendations by CommandLineGuy · · Score: 4, Interesting

    For what it's worth:

    1 - be cautious about testing in debug mode - there's an awful lot the compiler tosses in to enable debugging which may impact how the code actually executes.
    2 - use logging extensively. I'd recommend using log4net or something like that.
    3 - use an integration model for your unit testing. Start with the smallest unit tests and build upwards. This will allow you to gradually build "correct" code and focus on the messages/events between components.
    4 - build a simulator (someone mentioned that before), they are truly invaluable. Keep it as simple as possible.
    5 - check, double check, and triple check variable access. It's easy to run into a race condition between reads and writes. Study and understand lock(...), reader-writer-locks, semaphores, and mutexes.
    6 - when testing, don't forget to test expected conditions, unexpected conditions, boundary conditions (null objects, empty strings, negative values, 0's, positive values, and overflows), errors (like zombie conditions where a response is _never_ generated, dropped connections, garbage results), etc.
    7 - learn Debug.Assert and check your pre- and post-conditions
    8 - if you use strings, make sure you understand how String and StringBuilder operate - they can differ dramatically in efficiency, memory utilization, and GC pressure.
    9 - events can be static, and don't forget to encapsulate your event accessors (they look like properties, but instead of get/set, they're add/remove)
    10 - if you plan on using compiler optimization switches, use them last during testing - after you can prove the app works correctly. Optimization switches can dramatically reorder things which is definitely not good if you're trying to determine correctness.
    11 - set the compiler to give the maximum warning level. Your app should generate no warnings or errors while compiling.
    12 - walk through the code with someone to double check your logic and field access. If you can convey it to someone else either through comments, design, etc., and can justify all field accesses as well as access control, you'll be in good shape. Yes, this is peer review. It's even useful to haul in a project manager that knows nothing about coding and nods like a chicken. Listen to yourself as you talk and if you stumble or have a hard time explaining something, that's a hint that a redesign might be in order.
    13 - you might want to put in some profiling counters so you can capture metrics on it. This way, as you change the code over time, you can almost quantifiably determine if the code is truly improving with respect to throughput, responses, etc. or not.
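
    Points 5 and 13 meet in something like a thread-safe counter. A small sketch in Python (the class and counter names are hypothetical; in C# the same role is played by lock(...) around the update):

```python
import threading
from collections import Counter


class EventCounters:
    """Thread-safe tallies, e.g. events handled per type."""

    def __init__(self):
        self._lock = threading.Lock()
        self._counts = Counter()

    def increment(self, name, amount=1):
        # The read-modify-write must happen under the lock, or concurrent
        # increments can overwrite each other.
        with self._lock:
            self._counts[name] += amount

    def snapshot(self):
        with self._lock:
            return dict(self._counts)


counters = EventCounters()


def worker():
    for _ in range(1000):
        counters.increment("call_answered")


threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```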

    That's all I can think of off the top of my head.

    Good luck, it's a fun journey.

    --
    [Of course it's client-server; it runs on a LAN]
  24. Log everything by rtos · · Score: 2, Informative
    I'm no programming expert, but I've found that logging everything with accurate timestamps can solve a lot of problems. One of the best things I've done was to acquaint myself with Python's logging module. It's really a lot nicer than throwing print statements all over the place, and log levels make for easy switching in and out of "debug" mode. So that's my advice... implement a good logging system. :)
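
    For the curious, a minimal sketch of that setup with Python's standard logging module. The logger name, event, and call id are made up:

```python
import logging
import threading

# Timestamp, level, and thread name on every line.
logging.basicConfig(
    level=logging.DEBUG,
    format="%(asctime)s %(levelname)s [%(threadName)s] %(name)s: %(message)s",
)
log = logging.getLogger("telephony")


def on_call_event(event, call_id):
    log.debug("event=%s call_id=%s", event, call_id)


worker = threading.Thread(target=on_call_event, args=("ringing", 42), name="events-1")
worker.start()
worker.join()

# Dropping to INFO silences the debug chatter without touching the call sites.
log.setLevel(logging.INFO)
```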

    I'm unfortunately not too familiar with C#, so I can't comment on its logging facilities (or lack thereof) other than the .NET EventLog class.

    There is a project on Sourceforge called C# Logger that is supposedly similar to log4j in Java. But it seems to be stuck in alpha release mode, and not particularly active.

    Just my two cents. Hopefully it helps. :)

    --
    -- null
  25. create a logger. by Spy+der+Mann · · Score: 1

    A logger object which keeps the messages in memory and flushes when the buffer gets full. By keeping the messages in memory, you'll be able to log in realtime.

    The trick is not having two threads running at the same time. Or create one logger per thread. (Just be careful with avoiding deadlocks and all that multithreaded stuff)

    If your app is multitier, have a logger for each tier, although that would be trickier.
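
    The buffer-then-flush idea, sketched with Python's standard MemoryHandler. The list-backed target is a stand-in for a real file handler, and the capacity of 3 is only for demonstration. Python's handlers take a lock internally, so sharing one across threads is safe there; other libraries may differ:

```python
import logging
import logging.handlers

records = []


class ListHandler(logging.Handler):
    """Stand-in target; a real setup would use a FileHandler here."""

    def emit(self, record):
        records.append(record.getMessage())


# Hold up to 3 records in memory; flush early only for ERROR or worse.
buffered = logging.handlers.MemoryHandler(
    capacity=3, flushLevel=logging.ERROR, target=ListHandler()
)
log = logging.getLogger("buffered-demo")
log.setLevel(logging.DEBUG)
log.propagate = False
log.addHandler(buffered)

log.debug("one")
log.debug("two")      # still only in memory
log.debug("three")    # buffer is full, so all three are flushed to the target
```

    The catch with any in-memory buffer is that a hard crash loses whatever has not been flushed, which is why errors force a flush here.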

  26. Robust Logging by RingDev · · Score: 1

    A robust Logging solution is a requirement. Something that can not only tell you all the usual goodies, but also, which thread is performing the logging.

    If you're still peering through a text file trying to figure out a stream of events, stop. Set up a database and a logging system that will track entries by session and process. It will save you a huge amount of time in the long run.

    -Rick

    --
    "Most people in the U.S. wouldn't know they live in a tyrannical state if it walked up and grabbed their junk." - MyFirs
  27. Proper design helps a lot by ebbe11 · · Score: 1
    Event driven probably also means multithreaded, right? If so, here are a few pieces of advice for what to do when designing your program:
    • Protect shared data. Make sure that data shared between threads is protected such that only one thread at a time has access.
    • ... and make the time this access occurs as short as possible.
    • Protect shared resources.
    • ... but know thy deadlocks. Be absolutely sure that two threads do not get in a state where each thread waits for a resource that the other thread holds.
    • Use state machines.
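
    On the deadlock point, the usual cure is a fixed lock order. A sketch in Python with hypothetical names; both transfer directions take the locks in the same order:

```python
import threading

lock_a = threading.Lock()
lock_b = threading.Lock()
balance = {"a": 100, "b": 0}


def transfer(src, dst, amount):
    # Always take the locks in one global order (a before b), whichever
    # direction the transfer goes, so two threads can never each hold
    # the lock the other is waiting for.
    with lock_a:
        with lock_b:
            balance[src] -= amount
            balance[dst] += amount


threads = [threading.Thread(target=transfer, args=("a", "b", 1)) for _ in range(50)]
threads += [threading.Thread(target=transfer, args=("b", "a", 1)) for _ in range(50)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```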
    --

    My opinion? See above.
  28. Wow... Short answer, "Don't ask Slashdot". by pla · · Score: 3, Interesting

    50 comments, and not one good answer (though I saw three posts of good advice vaguely applicable to your needs).

    First of all - Debugging takes hard work. Sorry folks, no matter how easy Microsoft tries to make it, no matter how tightly they integrate Java-killer-P into app-Q, you still need the ability to follow the flow of bits from point A to Z, and more importantly, figure out what B through Y need to do.


    How to debug asynchronous events... Since you mentioned c#, I will presume you have a REALLY coarse granularity here when you say "async".

    So... First step, force non-reentrancy and non-overlapped event handling. Does your problem go away? Find the global data you clobbered.
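
    One way to force that, sketched in Python: push every event through a single queue drained by one thread. The handler and event names are placeholders:

```python
import queue
import threading

events = queue.Queue()
handled = []


def handle(event):
    handled.append(event)  # stand-in for the real handler


def dispatcher():
    # One thread drains the queue, so handlers never overlap or re-enter.
    while True:
        event = events.get()
        if event is None:      # sentinel: shut down
            break
        handle(event)


t = threading.Thread(target=dispatcher)
t.start()
for i in range(5):
    events.put(f"event-{i}")   # producers may be on any thread
events.put(None)
t.join()
```

    If the bug vanishes under this arrangement, the handlers were stepping on shared state when they ran concurrently.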

    Step 2 (if #1 fails) - Run both ends of your app on the same machine. Does the problem go away? Don't trust .NET to gracefully handle network errors. Don't trust your process to have basically-uninterrupted control of the CPU. Don't trust try/catch to save you from "real" problems.

    Step 3 - Okay, you have a "real" bug, in your code. But on the bright side, if you got here, you can probably reproduce it, so, piece of cake. Load up your trusty debugger and dump a COMPLETE stack trace up to the error. Don't trust the last line to have caused the error, it just failed to deal with whatever broken crap the actual problem threw its way.

    Step 4 - Trace through your code, on both sides, one line at a time. Sound tedious? Yup. You might spend a week on a single run. But you'll sure as hell know the flow of your code by the time it finishes.

    Step 5 - No step 5 exists. Step 4 WILL let you find your problem, as long as it resides in your code and not in the aether between the two sides of the connection (which steps 1 and 2 should have eliminated as a problem).

    1. Re:Wow... Short answer, "Don't ask Slashdot". by Anonymous Coward · · Score: 0

      That's got to be one of the most succinct answers to the problem of debugging an asynchronous application I've ever seen.


      I used to do embedded programming in C. My embedded systems work involved safety-critical software for active force-feedback flight control systems, and we had to map out every conceivable state, state transition, and input and output that our systems could go through, along with all of the resources being used at each timestep. THEN we'd proceed to write code.

      Even then we'd have the occasional bug.

  29. Messages will get you by Grab · · Score: 1

    Race conditions and deadlocks are going to be your enemies. Especially deadlocks.

    A good starting point is an old-school data flow diagram. Draw each of your processes as a bubble, and show links between them. If you have one signal going from bubble A to bubble B, and another signal going from bubble B back to bubble A, you have a possibility for deadlocks or races. A data flow diagram will give you a good insight into which signals need checking.
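
    If the diagram gets too big to eyeball, the same check can be done mechanically. A sketch in Python that treats the diagram as a directed graph (process name mapped to the processes it signals) and looks for a cycle with a depth-first search:

```python
def find_cycle(graph):
    """Return one cycle as a list of nodes, or None if the graph is acyclic."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {node: WHITE for node in graph}
    path = []

    def visit(node):
        colour[node] = GREY
        path.append(node)
        for nxt in graph.get(node, ()):
            if colour.get(nxt, WHITE) == GREY:
                # Back edge: everything from nxt onward in the path is a cycle.
                return path[path.index(nxt):] + [nxt]
            if colour.get(nxt, WHITE) == WHITE:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        colour[node] = BLACK
        return None

    for node in list(graph):
        if colour[node] == WHITE:
            found = visit(node)
            if found:
                return found
    return None


# A signals B, and B signals back to A: a candidate for deadlocks or races.
flows = {"A": ["B"], "B": ["A", "C"], "C": []}
suspect = find_cycle(flows)
```

    A cycle is not a bug by itself, only a list of signals worth checking first.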

    Grab.

    1. Re:Messages will get you by Joey+Vegetables · · Score: 3, Funny

      . . . deadlocks are going to be your enemies. Especially deadlocks.

      An easily vanquished enemy. Just log into Visual Haircut 2005 at least once a month or two, and don't forget to run the Help Me Stop Smoking Ganjj wizard as needed.

    2. Re:Messages will get you by Grab · · Score: 1

      Dude, it took me a moment to work out what you were on about. Then it took me a minute to wipe the drink off my keyboard...

  30. Re:VS2005 (BlueJ-based "innovation") by joel.neely · · Score: 1

    Ah, yes, the wonderful "innovation" of the object bench! Of course, that's "innovation" in the classic sense of "take somebody else's idea and resell it without credit or payment". See http://www.bluej.org/vs/vs-bj.html for the real story.

  31. whats the bug? by Anonymous Coward · · Score: 0

    how you debug the application will probably depend on what the bug is.

    If it's a problem that you generate a "Release" instead of an "Address Complete", then it's something wrong with your ISUP state machine. If it's a problem with multiple calls from the same account causing cahs problems, then it's a timing or threading issue. If it's that the server can't handle so many calls, it's a performance or starvation issue. If it's that you can't outseize a line when there are many free, you may be processing messages differently to your switch.

    You need to use one of the many tools available to solve the particular problem, which may vary from protocol analysers, to thread debuggers, to memory profilers, to fixture data...

  32. Software Engineering 101, Week 4 by LifesABeach · · Score: 1

    Write the inputs and the results that were received to a text file for each thread. Use a 'Try/Catch' approach. And if any of these solutions appear to be not understandable: burn your Software Engineering degree, and remove that line from the resume that you will soon be resubmitting.

  33. dude, this is ./ by Anonymous Coward · · Score: 0

    don't come with M$ questions.

  34. log4net by Petronius · · Score: 1

    http://logging.apache.org/log4net/ is your friend. it's ported from log4j. it works really well. the documentation is not 100% accurate, so Google is your friend too. we log everything to a single Oracle table, it's nice and easy to mine using SQL statements, it's easy to purge; logging medium is mostly a matter of personal preference. a good logging framework will let you add new loggers / change log levels without touching your app (without even restarting it). BTW, Microsoft's Enterprise Library Logging Block appears to be mostly crap... (http://weblogs.asp.net/lorenh/archive/2005/02/18/376191.aspx).

    --
    there's no place like ~
  35. the .net asynchronous pattern by Anonymous Coward · · Score: 0

    .net's asynchronous design pattern is very easy to use, given you know how to use it. it can be confusing at first, but once you get more experience, it just flows.

    everything is BeginInvoke or EndInvoke. BeginInvoke will start whatever operation, and you can optionally use EndInvoke the line after to block the running thread. in the BeginInvoke method you pass in a delegate, which occurs when the event happens.

    sorry, bad at explaining...anyways, the Socket class has a BeginReceive. so essentially, if you have socket.BeginReceive(some parameters, new AsyncCallback(ReceiveCallback), socket);, the socket will listen for any data, and when it receives it, the ReceiveCallback method will be executed (and in there you use EndReceive to unblock that operation)

    make sure you're using the state object (which is AsyncState in the IAsyncResult interface), rather than instance variables in the class. it helps prevent locking issues.

    as for debugging, you can kinda cheat and put a break point at BeginInvoke. once the program breaks, then you add the breakpoint at EndInvoke and continue.
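
    the state-object idea is not specific to .net. Python has no BeginInvoke/EndInvoke, so this is only an analogy using concurrent.futures: start the operation, attach a callback, and hand the callback its own state instead of reading shared fields. the function and key names are made up:

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial

results = {}


def receive(connection_id):
    return f"data for {connection_id}"   # stand-in for a blocking receive


def on_received(state, future):
    # `state` travels with this one operation, so overlapping operations
    # never share a mutable field.
    results[state["connection_id"]] = future.result()


with ThreadPoolExecutor(max_workers=4) as pool:
    for cid in ("c1", "c2", "c3"):
        state = {"connection_id": cid}
        pool.submit(receive, cid).add_done_callback(partial(on_received, state))
```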

  36. Suggested format for async logging by Ajay+Soni · · Score: 1

    We develop complex telephony-based systems that use AJAX, Web Services and CTI frameworks for environments of up to 650 users (multi-site, multi-web-server and so on). The best way to debug asynchronous information is via logging (using log4cxx and log4net), as suggested by the others.

    Because you have multiple threads and sub systems we find this log format to be the best for our environment:

    Issue command example:

    2006-02-07 12:02:58,385 DEBUG [7852] (ts.cs:1231) im_lib_aic613 - (5739:agent2200:1426:43e88c6) >        : TS.AnswerVDU (vduid:=43e88c6d)

    Asynchronous event caused by command example:

    2006-02-07 12:02:58,619 DEBUG [5160] (ts.cs:  40) im_lib_aic613 - (5739:agent2200:1426:43e88c6)         E: TS.Connect({{'vdu_id','43e88c6d'},{'call_ref_id',' 437'}})

    Result of command example:

    2006-02-07 12:02:58,619 DEBUG [7852] (ts.cs:1235) im_lib_aic613 - (5739:agent2200:14264:43e88c6)        <: TS.AnswerVDU

    The format is as follows:

    @date time message-type [thread-id] (source-file:line-no) module-name - (backend-session-id:agent-id:our-session-id:call-id) direction: method-or-event-name(parameters)

    message-type = {DEBUG | WARN | ERROR}
    direction = { > make a method call | < reply from call | E is an event from backend }
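
    A fixed format like this is also easy to pick apart by machine. A sketch of a parser in Python; the regular expression is my own reading of the sample lines above, not something shipped with log4net, and the field names are chosen for illustration:

```python
import re

LINE = re.compile(
    r"(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<level>\w+) \[(?P<thread>\d+)\] "
    r"\((?P<file>[^:]+):\s*(?P<line>\d+)\) "
    r"(?P<module>\S+) - "
    r"\((?P<session>[^)]*)\)\s*"
    r"(?P<direction>[<>E])\s*:\s*(?P<call>.*)"
)


def parse(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    m = LINE.match(line.strip())
    if m is None:
        return None
    fields = m.groupdict()
    fields["line"] = int(fields["line"])
    # backend-session-id : agent-id : our-session-id : call-id
    fields["session"] = tuple(fields["session"].split(":"))
    return fields


sample = ("2006-02-07 12:02:58,385 DEBUG [7852] (ts.cs:1231) im_lib_aic613 - "
          "(5739:agent2200:1426:43e88c6) >        : TS.AnswerVDU (vduid:=43e88c6d)")
entry = parse(sample)
```

    With every line reduced to a dict, grouping by session tuple or thread id to follow one call through the system becomes a one-liner.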

    The only issue we have is that the lines can get long, but everything displays nicely formatted in a decent text editor such as TextPad, which auto-refreshes when the log file changes.

    We also put in the same log file the AJAX client logs too.

    Hope this helps! - Ajay