Software Logging Schemes?
MySkippy writes "I've been a software engineer for just over 10 years, and I've seen a lot of different styles of logging in the applications I've worked on. Some were extremely verbose — about 1 logging line for every 2 lines of code. Others were very lacking, with maybe 1 line in 200 devoted to logging. I personally find that writing debug and informational messages about every 2 to 5 lines works well for debugging an issue, but can become cumbersome when reading through a log for analysis. I like to write warning messages when thresholds or limits are being approached — these tend to be infrequent. I log errors whenever I catch one (but I've never put a 'fatal' message in my code, because if it's truly a fatal error I probably didn't catch it). Recently I came across log4j and log4net and have begun using them both. That brings me to my question: how do the coders on Slashdot handle logging in their code?"
Log the starting conditions so you can reconstruct data. Otherwise don't do much logging because it will hurt application performance, sometimes drastically.
I use Jakarta Commons Logging...
As usual, "it depends" on what you are trying to achieve. Nobody can give you a blanket recommendation. But I guess in general: the log files need to give you enough information, that can't be got in other ways, to solve any problem that comes up.
We have a realtime product that goes all over the world and talks to hardware that we can't always get access to ourselves. Therefore, we sometimes must debug our code remotely. Obviously, logging is critical to this. We keep sometimes hundreds of MB of logs and have archiving rules and a tool for users to collect them. Every layer of the system keeps it's own logs, and all logs have timestamps to milliseconds.
In our case we log all the data back and forth, and then every important decision the code makes. For example if it decides there is something wrong with incoming data, it must log that. Any action it decides to take must be logged. Any data that will be passed on to other layers/the outside world must be shown. Generally, whenever we forget to log some of this data we will later regret it ("why the hell is it ignoring that device state..."). We also log at startup, basically the whole system configuration so that we can reproduce it.
Callstacks when there is an exception can be very useful. However, a lot of "errors" (at least in our case) are not exceptions but rather unexpected data or behaviour. We rarely have a crash and in state-based systems a callstack doesn't tell you much about what's going on. So a callstack is not useful for all situations.
Other times, you just want logging to give you a clue where in the code it was so you can run up the debugger and step through it (you do know how to step through code in the debugger, right?). In that case, too much logging can just get in the way. It might be sufficient in a GUI or web app to say which screen/page and which button was clicked.
You'd hope users could report this kind of details, but not always: if the user is working in another language, in another country, with two layers of helpdesk between you and them, and they are busy doing other things when the problem occurs and only call in the issue an hour later, and the helpdesk takes a week to report it to you - you may just get a vague or even misleading report that no-one can remember when you ask questions. In those cases log files may be all you have to go on.
There is also a tradeoff between log detail and manageability. Besides the difficulty in reading long log files, having a lot of detail means maintaining a lot of extra code. It also means that log files can become unmanagably large. In our case those hundreds of MB of logs can be a huge problem for customers to send to us because they have low quality internet connections (small companies in Mexico for example).
For every expert, there is an equal and opposite expert. - Arthur C. Clarke
I find I usually end up with 3 levels of logging:
Normal operation, often with some notion of "Yes, I'm still running even though I haven't done anything else lately".
Details. Usually corresponding to processing steps.
Algorithm tracing. This includes things like logging SQL queries. This is usually only of interest to me.
...laura
After a few greps over the PHP code I'm working on at the moment, it turns out there's... no obvious log pattern at all. It's logging to the webserver error log in some places, logging to syslog in others, and using assert() randomly. Maybe I should've paid more attention to that bit while writing it.
I work on a large air traffic control system. Logging is a huge issue. Log files are collected centrally by a separate application. One important issue IMO is making the contents of your various log files meaningful to people who are not familiar with them.
If your system has objects of type A B and C which can be handled by different components of your system then your should make the logging system in those components print information about those objects in exactly the same way.
While you are at it, make the log format easily parsable by software. You don't want to be looking for a needle in a gigabyte size haystack of trace information without help from a tool which understands what it is looking for.
http://michaelsmith.id.au
Use different log levels for events, e.g debug, info, warning, error, fatal ...
This way the user can set a log level threshold above which messages will be ignored, so you don't have to plough through log files full of debug data, for example.
I will add lots of logging to debug a specific problem and then rip it out when the problem is fixed. Permanent logging includes run time problems like serial communication errors, file not found, etc. I like to make various logging functions switchable, so user input can be logged for example, but only when needed. Once a program is running well, it should only log data for dire exceptions, unless regular accounting logs are needed.
My online applications log to a database, not a text file. Multiple applications on different machines can log to the same database table. There's no need for "log rotation"; old entries can be summarized and purged by date on the live database. With appropriate indexed fields, you can find key log entries in huge log files very rapidly.
Even program faults are logged to the database. If the program crashes, the top-level exception handler catches the event, does a traceback, opens a fresh database connection, and logs the traceback.
Don't coddle weak programmers... it's survival of the fittest out here. Either they learn to nourish themselves from the ample teat of a stack dump, or they must perish. It is for the good of our civilization. I know this seems harsh, young Jedi, but it is the way of the Elders of Assembler, from the ancient Time Before OO. Now go Forth and code.
Okay, joking aside. Parent has a great point -- logging can generate incredible volumes of text and can form a remarkable bottleneck, especially on VM systems where your OS may not be the only one hitting the disk.
So take advantage of Log4J/Net's ability to log at different severity levels and make logging globally configurable so you can enable/disable entirely at runtime. I'd recommend you log the following : object creation, scarce resource allocation, recoverable failure/error conditions and unrecoverable failures. Preface each severity level with a unique label so you can grep for it later. Even at the most verbose level, you can then grep your output to see only what's of interest to you (e.g. "unrecoverable:...").
I set a "logging level" in my configuration file so that when I need a program log to be verbose I can make it so but for normal use I can keep the logs fairly light.
A log is only useful if it records information that is used. The problem is that sometimes everything is running fine and super verbose logging is a waste but then something in the environment changes and suddenly temporarily logging a lot of debugging information is useful. Setting a "logging level" in a configuration file lets me do light logging, verbose logging and just about anything between without having to recompile my projects.
The race isn't always to the swift... but that's the way to bet!
If you're using log4j, don't use multiple hosts to write to the same nfs filesystem file. You'll run into blocking issues and log4j doesn't handle them correctly. The nirvana of clustered app logging is an async JMS queue. You fire off the message and forget it. You don't wait for file handles.
It's not CPU that's at a premium, it's disk IO. And on virtualized machines (such as is extremely popular in corporations and hosting farms) where there might be four different OSs running on the same physical hardware, disk becomes a scarce resource very, very quickly. And not only does your virtualized server go to shit, it takes the others down with it since they can't get timely disk access, either.
It depends on the app you're writing -- is it a web app, a database app, a userspace program written in C, a Perl/Ruby script, or.. At work we created our own logging library in C to emit log events for different levels, e.g. informational, debugging, warnings, errors, fatal messages. We then have wrappers around that library so that languages like Ruby can access that logging library. But on hindsight I think I would've just used syslog if I had to start over. :)
Are we talking about production environment or a development environment? In production, we log our exceptions or when major changes happen, such as something being deleted. The goal there is to help our support team and customers nail down the problems when they arise. I don't know if LoC is the right measure for these things. Our UI (it's a web app) has little to no logging since errors just appear when the UI has a problem. So we can have hundreds of lines of code with no logging there. Our backend logs much more frequently. Basically every "catch" block has a call to the logger (log4net is what we use as well). Still, most of us prefer to hook a debugger up to the process during development and rarely ever go back through the logs. The only time the logs are frequently used prior to production is when we're running automated QA and we want to know what happened when a test case fails. In that case, when we compile with the debug flag, we log quite a bit and more frequently and log things that only our developers would understand.
I guess maybe the rule of thumb is, depending on the environment, generate logs that are targeted at the audience when things go wrong. I'm not convinced frequency of logs is a meaningful measure of appropriate logging.
EvilCON - Made Famous by
"I personally find that writing debug and informational messages about every 2 to 5 lines works well for debugging an issue" Learn to use a debugger. What you are doing is backwards. The logger is meant to help out when the application you are working on is being executed remotly and attaching a debugger isnt practical or desirable. There is no code in the world that I can think of that needs a log line after every two steps in a procedure.
I don't, save the rain forest, hug a tree, prevent deforestation, stop logging now!
Segmentation fault: core dumped
I wrote up a two-part piece on logging best and worst practices a while back. See Part I and Part II if you are interested.
1 line of logging per 200 lines of code is way too much. 2 in 5 lines is absolutely insane. I've seen way too many systems where the logging made it totally unusable. People just don't realize the costs of logging everything.
There's absolutely no way to document anything this verbose. Without documentation, the logging is useless.
with guru meditation errors of course!
I Need someone to rebuild a Digitech Digital Delay pedal for me....for me...for me...for me.
Comment removed based on user account deletion
Debugging is for people who write buggy code. Nuff said.
I personally put logging calls in the most used functions.
For example in my MySQL handler class I have put the logging code into the function that executes the SQL queries, which is the function that everything else relies on.
So when I need to debug I have a stack of SQL queries along with their error codes and error strings.
That's enough to figure out what went wrong.
My point is put the logging calls in the vital functions, the ones that do all the work. This will save you many code lines and processing cycles.
My logging is setup so I can quickly filter down to the type of data I want. It's more than just "information", "warning", and "error" - but by (cpp)file, module, etc. That way, if an issue arises, I can eliminate the cruft and see just what I need. Just takes planning.
Why can't all fpga/microcontroller manufacturers just release free optimizing compilers???
Syslog, of course.
I tend to do my debugging by inserting a lot of printf statements to indicate where in the program I currently am and the value of any critical variables at that time. As the output information is no longer needed (i.e. I fixed the bug it was attached to), I tend to cull out whatever isn't useful anymore. However, I tend to keep starting messages in function calls related to a routine I'm working on or making more than a trivial change to... since chances are, knowing me, I'm going to end up putting them back in there anyway once I create a new bug... and lets face it, it WILL happen.
Once I'm done, I go back, remove or comment out (usually just comment out) all the messages that have no redeeming value for a properly functioning program, and turn the rest into debug statements which print based on the debug level provided at execution time... or sometimes I use a mask to select which types of messages to display.
-Restil
Play with my webcams and lights here
Log4j supports selective logging. That means you can have info/debug/trace priority messages in place, but never see them in the log unless you explicitly enable extra logging for that class or package. You can do this at runtime, e.g., via something like 'chainsaw' (which attaches to a running process) or hooks in your UI.
Our policy is that logs are usually very quiet. Application startup/shutdown and not much more. But if there's a problem the debugging messages are already in place to let us peak into the system, even if it's been deployed to a production site.
BTW AOP is also great for this. You can configure logging interceptors that log activity when you're in a development environmnent, but easily removed in production. This is a natural approach when going from one layer to the next, e.g., when wrapping the DAO layer.
For every complex problem there is an answer that is clear, simple, and wrong. -- H L Mencken
I write code for Telecom test systems that need to run 24x7 processing highly varying requests from dozens of different client systems. Our system consists of dozens of different processes/components per host, with multiple hosts all invoking components on each other as needed (all via CORBA). There are very many paths that any request can take through our system.
In this environment we log VERY heavily since each request is close to unique and we need to be able to determine the path it took through the system, and why it did, and what happened, in the event of any bug report. Some of the haviest used modules can produce close to 1GB of log per day per host - upto a couple hundred lines of logging information per request per process that it passes thru. We use a custom printf-like log library written in C++ (that auto rotates the log files based on various criteria), a custom tail utility for dealing with the large log files (tail a log file from a given timestamp - done instantly via binary search on the timestamps) and a daily cron job to compress the older log files and move any older than 5 days off the production servers to someplace with more storage.
First, a well functioning app should really not be generating any log file except what the application itself needs (e.g. usage logs for a web server).
Second, instead of wading through reams of log output, use exceptions and assert statements so that you only generate log output when something has gone wrong.
Finally, use your IDE. Instead of trying to infer state by combing through logs, set break points where you catch an exception. This lets you traceback to see why you ended up with the exception without suffering much if any of a performance hit.
As we move to increasingly distributed systems, log files become less useful. For many distributed apps, you may not even have access to all of the various file systems where a log might be written. Best solution is to rely on highly modular and thoroughly debugged code.
Statesman
That should be about right...
Sometimes with printf style debugging, so much data comes out that it is not practical to use the logging in normal code. That said, I dump all data I want to stdout, and use AWK to massage the results. If you put in some kind of tags or markers, this can help simplify the log processing code significantly.
You don't need databases or anything structured, simply one stream per process. The process will run sequentially, so a sequential storage structure (ie a file) is the best matched to the data. Also, I avoid markups like XML and so on in favour of plain text, since it is easier to process the plain text with standard UNIX tools. They also tend to operate more efficiently since parsing is a matter of looking for whitespace or \n. Also, the same techniques work from any reasonable language, since they can all output text.
SJW n. One who posts facts.
You have to be logging a *lot* to dog down a system if you're writing to memory and dumping out to disk every 100000 lines or so (or on exit). As for viewing, how hard is it to filter written output in place?
Did I miss something?
Logging is for the weak, all my shit is rock solid, no need for logging.
I keep my fingers crossed and pray to God, asking him to tell me where the error in my code is. It works 100% of the time. (You deserve this response for answering such a silly question, get back to work!)
On the other hand, broken code hurts application performance, sometimes drastically.
I'm an SQA engineer with years of experience working with large scale enterprise systems. Generally speaking the cost of unexpected outages or data corruption outweighs the cost of hardware. In such systems the costs of deployment activity itself can be such that you'd rather pay for more hardware to support extremely verbose logging.
Sure, boneheaded logging can cause unnecessary performance hits, the obvious example being logging in a loop when logging at entry and exit would have sufficed. But that's not what we are really talking about here. You posited that you should do as little logging as practically possible, and I believe that you are wrong.
Log lots and log often. Just do so intelligently. Use a logging framework (log4j, log4net, log4perl etc) and set the priority appropriately. Only use ERROR for real errors (unexpected code paths or data), use WARN when a performance metric is hitting a soft limit (to warn you before you hit that hard limit), and use DEBUG to verbosely log anything else of general interest. Rarely you might also want to log in an extremely verbose manner data that wouldn't ordinarily be interesting, and this should be logged at a TRACE level. Generally speaking if this is the case then the code itself is due for a refactor. FATAL should normally be reserved for errors that prevent correct startup - generally if an application runs correctly at startup then any potential faults that you see and handle now become ERRORs because there is nearly always something better an application can do than log FATAL and exit. As the OP observed, if you have a potential fault that kills your application and you don't see and handle it then you don't have the opportunity to log FATAL anyway.
By using a logging framework, many logging pitfalls can be avoided because the framework itself provides well tested facilities. eg, time-stamping, log rotation, file-handle management etc. In addition, using a framework allows the operator to tune the logging on a very granular level. This allows for a trade off to be made where if a performance impact is noted in a well used class then its logging can be reduced at runtime. Sure, there is still a small performance impact because the logging framework has to do a "if (logMessage.logLevel >= loggingClass.logLevel ) then {...}" comparision, but in the scale of things that impact is tiny.
My profession is not about finding and fixing bugs. It is about understanding the software that is being delivered and deployed. It is about understanding what defects exist (or may exist) and the possible implications of those defects. It is about reducing the risk of defects through analysis. Analysis of the software's functionality, analysis of the software's performance, analysis of the processes used to produce the software itself. You will never be yelled at for releasing software with a well understood and documented defect, but the shit will hit the fan when you release major defects that are not understood.
Logging is an _invaluable_ tool in this analysis. You'd be a fool to not use it effectively.
Wow. This is a disturbing reminder of my University assignments, for which I forgot to remove those cout "No segfault yet!\n"; statements.
Truly, logging is an amateur pursuit, & a grownup Software Engineer utilises formal verification and.... *GASP*.... unit testing!!
Seriously, do the slashdot developers call a logging function somewhere with "Okay, we've started the comment routine!" in comments.pl?
I work on set top boxes, and not every platform we port to has a good debugger (hell, its been years since I've seen a good debugger). Our logging system is all in house; multiple "levels" for each log statment, (noise,information,warnings,fatals,etc), with each module creating its own log id and setting its "preference level". It works well, but:
1) Useless logs.
Engineers not taking the time to write logs that are useful. "Got to here", "Value=1", etc. A few of us write enum-to-string functions and pass them to the logging system for cleaner output.
2) Running at the speed of 115200.
We've only got a serial port most times, with multiple threads trying to access it, there's got to be some synchronization, and this generally affects threads of any priority. Using a logger that caches and outputs logs at its own pace is nice.
this is my sig, there are many like it, but this one is mine.
interesting topic, i have been working with a new installation of a portfolio accounting system. The developers don't talk to their support personel, who can't refer the end user (me) to how to crank up the logs to find issues and thus you find core crap laying around daily or extra process that they tell you just to kill and move on. think about your end result, software out there in the wild that is supportable, not mysterious to the end user. nice product, if it was free, unfortunetly it's ubuer expensive and i feel, is a piece of...
Having spent many years as a Systems Administrator, I would argue that the most important part of logging is to make sure that it is in a format and location that other people can use. People won't use the logs if they don't know where they are. And, if you're developing for Windows, I would go further and say that the only place you should be considering logging data to is the EventVwr application. I'm not sure how the java stuff goes, but if you're developing in .NET, ditch the log4net application, and stick with the System.Diagnostic classes.
People have spent a lot of time and effort building logging systems and installing systems to monitor those logging systems. Don't try to reinvent that particular wheel.
First - let me give you my own perspective. I recommend having each subsystem log in such a fashion that you can easily grep to include or ignore that subsystem. For example for one package the following LVLx messages were the first four characters as follows:
LVL1 - basic startup and shutdown info ( a few lines per run)
LVL2 - Interactions with the database
LVL3 - Interactions with the file system
LVL4 - Detailed database interactions including each sql statement
LVL5 - amazingly verbose debug information including memory and variable allocations
In almost all cases, I recommend being able to set each level on or off. Your sysadmin (maybe yourself) will appreciate that ability.
If appropriate, I recommend an 'audit' record after each completed or aborted transaction. EG - after every order or every user change or whatever is important for accountability / business activity monitoring purposes.
This is the original question .
Like other posters said, there's no one size fits all. But, I've pulled together a few tools and practices that seem to be pretty adaptable...
Be sure to have all your logging go through one routine--don't sprinkle your code with printf's. There's more logic than you might think that you might need to add to logging: sending the log to different places, changing it for debug/release builds, and the all important consolidating reputed messages into a "Previous message repeats 29302 times."
Once you do that, define two macros (yeah, I'm a C programmer), one that vanishes in release builds and one that doesn't.
One trick that's worked well is to define a bitmask with one bit for each of the different things your program does. Then, add this as the first parameter to all your log calls. At runtime, set the mask to the things you're interested in, and have your log routine only print/record/send ones that match. So, if I'm trying to troubleshoot, say, something dropping messages at an end-user site, I can have them set the runtime flag to NETWORK_RECEIVE|BUFFER_MANAGEMENT, and only relevant log messages show up. This is really useful, because it lets you put lots and lots of trace in your code but keeps log files manageable. (Of course, the same log message might be relevant to a few different things. This is handled by or'ing together flags in the log call.)
If it's code that's going to run for a long time, or pretty much if you're leaving any trace in on a shipping version, be sure to put some logic in to keep an eye on the size of the log file. Sure, everyone's got lots of disk these days, but you still feel pretty dumb when a production machine crashes after a year because the log file got to 100's of MB...
With OpenSolaris it is no longer necessary to do the actual logging yourself. Through a combination of truss (which can do system calls as well as library calls) and dtrace, the actual mechanism of logging should not be one the programmer needs to worry about.
With dtrace, the application only needs to define points of interest in the flow of the program and export the relevant data. The collection (or logging) of data is done when it is required, so there is no performance impact when it is not in use.
Inside the kernel, we have "fbt" (function-boundary-tracepoint) for every function - modulo optimization.
An example of what an application can do is MySQL. See here for more details:
http://blogs.sun.com/luojiach/entry/dtrace_integration_with_mysql_5
This doesn't take care of every problem where logging is useful, but it does take care of a lot of them.
Hey you.
Let's have sex.
DRM: Terminator crops for your mind!
I've seen at least one example where it looked like the programmer added logging when trying to debug a problem. Then, apparently on the assumption that it was "useful once, so it might be useful again", the logging was left in. There was a compile-time AND a runtime switch to toggle this logging. It tended to be ON all the time, cluttering the logs with needless information, and making the code look ugly. In theory, compile-time switches can eliminate the performance issue; but some of the logs depended on values that had to be computed before-hand. The result? That code had to be left in, even if the log wasn't there. It would sometimes generate a compiler warning when compiling without logging, since those variables were initialized but otherwise un-used. In some cases, macros were used to dodge around this.
The rememdy to this anti-pattern? When debugging a specific issue, I prefixed the log with the word DIAGNOSTIC in all caps, just like that. Thus, I knew that such logs should be removed as soon as the specific bug was fixed. A grep for DIAGNOSTIC would locate all these messages. The remaining logs were gradually removed, and things that were supposed to be logged under normal operation were written into the program specifications. These could be toggled at runtime via the verbose options, an option that needs to be distinguished from logs generated specificly for programmers working on specific issues.
Code to debug a specific issue should never, I repeat, NEVER find its way into a release. Many will argue that it's a waste to throw away the code that instruments the bug; but I've found this worry to be unfounded (I'm reminded of another internet rant about "how to properly delete code", and I know I've been guilty of leaving old code around, so I'm not claiming purity on any of these issues).
If a problem is likely to occur often in code that does not have bugs (e.g., a client that can't connect to a server because the server is down) then of course that should be logged, otherwise you aren't going to know why the client has no data. That just cycles back to what I said before, namely that you should only log what's specified in the requirements for logging.
For all intensive purposes, "whom" is no longer a word. That begs the question, "who cares"?
I would argue that distributed systems make logging more important. An IDE is great when you can reproduce the error in development, but it's not always possible. A log can answer questions like "which node is exhibiting the problem?", "what actions did the user perform?", "what did the message I sent look like?". The ability to enable verbose logging is a lifesaver when troubleshooting complex, distributed software bugs.
So I either use Log::Dispatch or Log4perl (a Perl port of Log4j). Both are great.
Use log4j, and use the levels you can set. It depends on what you need, but I use info for audit trail kind of stuff. And for performance put conditionals around logging for debug and other cpu hits. You can write to syslog using log4j too, for a nice central backup. I don't know where all this chatter about vm's and disk being a bottleneck come from, I have never seen that, it's just sequential writing and I do gigs of it, no issue. But if it is, redirect to a server made to store it.
The CPAN module Log::Log4perl is a great tool for logging -- it means you can stick in plenty of debug statements, and dial them up for debugging, then dial them down in Production.
This module uses several message levels; in descending order of importance they are FATAL, ERROR, WARN, INFO, DEBUG and TRACE. It's possible to log messages to files, a screen, or even to an E-Mail message.
The real strength of this method is that you don't need a 'debug' version of the code -- all configuration is done externally, which means you can turn logging on for a Production problem, run your test, and turn it off again, and look at the log files off-line.
use Event Tracing For Windows.
I use a custom logger, that doesn't log via levels, but rather by text keys that are loaded from a config file at runtime. Logging code looks like: log( "dbg", "blah blah" );
log( "err", "some problem" );
log( "file load", "file not found" );
etc
This way I can log as often as I like while keeping log entries only to the information I'm interested in simply by making changes to the log config file.
(1) Log lots of information when you are debugging. Then, when you are done debugging that particular problem, TAKE THAT LOGGING CODE OUT.
If you find it tedious to find your debug log lines while debugging, it is a sign that you are logging too much everyday stuff! In the meantime, surround your debug log lines with multiple lines of "===================" or the like. Then, of course, take those out too.
(2) On an everyday basis, log only UNEXPECTED conditions. After all: if it was expected, then your code handled it, right? That means you have to plan for unexpected things to occur. You should be doing that anyway.
(3) If it is a very important issue, you should have an emailer set up, not just a logger. That way, you will be notified of urgent things come up.
(4) You should be doing tests on your code, not just logging. Logs are for when problems occur. Prevent them from occurring in the first place by writing tests. If you are not familiar with that practice, look up "test driven development" on the internet. You will find more than you want to know.
... you are of the opposite sex, have no significant communicable diseases, and pay your own way.
Checkout Bunion's Blue Ox Lumberjack Logging Solutions for industrial strength Logging functionality.
http://en.wikipedia.org/wiki/Paul_Bunyan_(lumberjack)
1. Distinguish between serious -operational- issues and other issues. The sysadmin doesn't need to know that you had a pointer problem; there's nothing he can do about it. He does, however, want to know that a message was received and the appropriate action taken. Or that a connection was attempted but failed.
2. Be grep freindly. The first log entry related to a particular activity should have an ID of some sort in the log message which is then included in every additional log entry associated with that activity.
Moderating "-1, Disagree" is simple censorship. Have the guts to post your opinion.
I used a system where log messages were assigned arbitrary tags which were themselves strings. The communications subsystem got the "COM" tag, the memory manager got the "MM" tag, etc. Log levels were just tags ("ERROR", "FATAL", etc). The system supported 32 output channels, each channel was associated with some output stream (file, pipe, etc) and each had a Boolean expression which filtered what went to the channel ("FATAL" or "ERROR" or "ALERT" and "MM" or "COM"). A channel could also be assigned a "lifetime" so that output to the channel only lasted for some period of time. Each channel could be setup as the daemon was running so it was easy to start debug messages going to a particular console, memory manager messages to a file, alerts to a pager app., etc. It was great when remote accessing a running system and diagnosing a client's issue. When you are all done you just shutdown the channel or let the channel expire. End of output and no more wasted I/O.
Good general advice in logging is to support different verbosity levels. Provide the ability to change the log verbosity as a command line switch or as a signal the process can receive.
Then if your log is too cluttered, reduce verbosity and try again. If it isn't telling you what you need to know, increase verbosity.
One of the coolest things I ever saw was a tape recorder that starts recording 30 seconds before you press "record". Simple concept -- just continuously record to a 30-second buffer.
So if it's truly disk/network that's at a premium, why not start logging just before you need it? Log at level "EVERYTHING" up into a circular buffer, and only dump to disk or network when there's an error?
Support for syslog, syslogng, windows event log, log to file, console, RDBMS.
Messages should be routable to multiple sources based on defaults or logging call.
It should have standardized message severity levels and parameterization for location, session context..etc.
It should support threshold supression for classes of messages to prevent log spamming.
It should provide a transactional approach where multiple writers are posting to the same log file at once the actual output should be coherent by transaction.
It should provide a queuing option to background delay and failure prone posting to disk, network, RDBMS...etc and provide backup logging paths.. It should not introduce delay in async environments.
It should provide buffering options to enable or disable flush to disk and buffered writes for higher performance logging.
It should be thread safe and optionally never dynamically allocate memory.
It should support message id tagging for message localization.
It should provide bindings to post SNMP traps.
Unfortunately TFA falls far short of what I think is a reasonable base-line feature set for a logging library. I wonder if there are any public domain/LGPL libraries out there that come close?
To me, it's important to cover both the internal and external cases in any scheme.
For the internal, it's important that your organization is capable of finding owners for failures/bugs quickly. So when a developer hits an error they don't understand or your BVTs break, an owner can be identified. Nine times out of ten, they can get you a solution in an hour or two once they've seen the error. In addition, an automation framework should work closely with your logging to bubble up the relevant errors. That can save a LOT of time.
For the external case, it's important that the errors are professional. A casual user won't dig into logs. But technical users (admins, etc) will rely on your output. The logs should be search-engine friendly. I recommend unique error codes for any log output. Localization is not always appropriate as it can cause problems for non-english searches. If support needs to involve developers, it also helps if these logs are in the developer's native language.
It might come back to haunt you later.
The society for a thought-free internet welcomes you.
I work on my company's billing system and we use both database and file logs. We process millions of transactions a month so it's important to not only have a detailed audit trail of every transaction but also be able to keep track of any errors and be able to recover transactions from logs in the case of something unexpected happening.
For file logs we use Log4Net and I recommend it. It allows us to specify log statements as Warning, Debug, Error, or Fatal. We have it set up so that all types are logged in the file while any Error or Fatal log statement is emailed to the engineers so we can immediately look into the situation. I don't believe in having too many log statements as it does nothing but make finding useful information difficult. As a general rule we log as much information as possible about each transaction as it comes in and out of each part of our system as well any major changes to that data that come from a part of processing it.
Database logs are extremely useful because they can be queried unlike file logs. This makes finding the information you need much easier. In our case however the db logs only contain information about the transaction and not the processing of it. Therefore both the db and file logs are necessary.
Having two types of logs also acts as a fail safe. If our db goes down we still have file logs. If there is no trace of something in the file logs we can check the db.
Now with this level of logging comes other issues such as archiving logs and dealing with db log tables that are too big but that's a topic for a whole other discussion.
I frequently use log4j or it's non-java derivatives in different languages. No need to reinvent the wheel when log4j properly handles logging with probably the most flexibility and performance anyone could ask for.
That being said, you can add what ever logging you want as long as you do it outside the application code, such as through AOP (if you don't know what that is then google is your new best friend).
Logging through external means has a number of benefits. First you application code is relieved from unnecessary clutter. Second the logging can be added or removed as fit for the environment with no need for any runtime checking, which is perfect for turning of all logging in production environments. And lastly it enforces good coding practices as it makes sure people break up code in a way that makes external logging possible, which is how code should be written anyway.
The majority of all logging, dare I say all useful logging, is easy to summarize.
This is perfect for function/method interception since it comes down to something more specific.
So simply add the first logging to an interceptor that operates before the functions you want to log, and the second after the function.
If you find yourself needing detail inside the function then you need to break the function up into sub routines, so you can use this generic logging on the sub routines instead, or as well as.
I've used lots of open source software where when you increase the debug logging level the verbosity and/or additional information is given as the level increases.
I guess if you needed to isolate certain output then you could output the debug level of the given log entry, then pipe to grep to search?
I develop for a large ERP application where we allow the user to control the logging level. In normal operation, we do minimal logging to save disk space. When a problem arises, the user can choose from several levels of detail in logging. Logging is an expected part of any code we write and it enables both our support department and development team to debug issues quickly.
At the most detailed level for a server process, we show SQL statements and number of rows affected with every iteration in any cursors (which we avoid whenever possible anyway) and output current values of relevant variables. For server processes, logging level can be set for the entire datasource, and overridden per execution of the server process.
Client side logging is used less frequently but can be set via a registry key by our support staff when trying to debug problems. Client side logging will output XML of any datatables in memory at relevant points, relevant variable values, and call stacks.
Once you have a framework for logging created and get in the habit of using it, it requires very little extra work and is well worth the effort when things go awry.For the average server process or screen we do, adding logging logic probably adds only a couple of hours to the project, but over the lifetime of the product, it will save us dozens of hours or more in wasted debugging time.
The logging is useful testing phase as well.
The important thing is to create a standardized logging system for your product that is easy for developers to use so they don't have to spend much time thinking about it and it doesn't obfuscate the logic in the code. The company I work for requires developers to provide support for some of the software they write, and this seems to provide the necessary motivation for us to use our logging mechanisms to their full potential.
If you're logging for debugging purposes I suggest setting your application up so that you can change verbosity on the fly. Combine that with a good naming convention within log4j and you could increase the detail in an area of interest while filtering out the rest.
Some parts of your application might not need constant logging. Put the logging code in there anyway and use filtering to keep it from ending up in the log. If you ever run into problems just remove the filter.
If you are communicating with other systems figure out how to tap into that conversation. They might work fine now but sooner or later they'll break.
In the root of your application put a document labeled "HOW TO READ THE LOGS FOR TROUBLESHOOTING". Make sure the filename stands out from all the other files in there. Explain where and how to do the things mentioned above. It might be someone else than you that has to do the troubleshooting. Or you simply forgot how to.
Real men sling their code from the hip then flex in a mirror after they're done. And if I find a bug, I pry it out with tweezers I've sterilized in a camp fire.
This is something I have recently been thinking about quite a lot. I am used to being able to easily debug software and watching what it is doing, and I had always considered logging as something you do because...well, I actually didn't really know why.
Now that I have done commercial development on larger project for a while, I have developed a somewhat better understanding of what you want logging for. I am still figuring it all out, but these are my current rules:
- Log event-related information. When your program receives input that causes it to start doing something, log that. If there is something strange or wrong about the input, log that. Once you stop processing (because you're done, or because you can go no further), log that. This way, you can see what your program has been doing. You can see that it received input (very important), you can tell if your program thought the input was correct or not, and you can see that it finished processing the input, and whether processing was successful or not.
- Provide for the option to log information about decisions your program makes. This is particularly useful for finding out why your program did not take action on something. This shouldn't always be enabled (unless, perhaps, for high-level decisions), because it would make logging overly verbose, but you should be able to enable it (preferably, focused on a specific part of the program) if you think the program is not behaving as it should.
- Last but not least, log every error condition. This should be obvious, but it is very important, so it's worth stating explicitly. You want to know if something went wrong, you want to know what went wrong, and you want to know where in your code it went wrong. Backtraces are very helpful here.
A large part of this is actually Cover Your Ass. If you log all events, and you didn't log anything, you can tell the Exalted User that the program didn't do anything, because it didn't receive anything from said user. If you validate input and log errors, even non-serious ones, you can tell the user "try with valid input", while you think about the problem and look through the code, seeing if anything is wrong on your end. If you log all errors, you can say "well, it says here there was a problem with the database connection, have a look at that", or "hmm, that value wasn't filled in, but should have been. are you sure you filled out all required fields?"
The above might sound disrespectful, but it really isn't. Very often, the users actually are doing something wrong, and they would be happy to know how they should do things to make it work, if only you could tell them. Logging everything you need to establish exactly where the problem is is invaluable for this. Your logging should tell you if the error lies inside your program or outside it. This makes your life easier, but your users', as well.
Finally, I will add that I very much agree with AaronLawrence's comment.
Please correct me if I got my facts wrong.
Logging should be fit to your needs - anyone who tells you that your logging is wrong is probably wrong. For example, I took a lot of flak in my company for having *extremely* verbose logs. Our support, who like to look at logs, blamed me for making the logs way too huge. I then got the linux server guys lecturing me about the proper way to log - I apparently should only log critical errors, and I should have several different log levels, right? WRONG. The product in question is a plugin, and relies on very flaky API's. It is subject to interference from other plugins and bugs in the host application itself. Having several log levels only lead to too much back-and-forth between support and customer (ok, now try THIS log level. ok, now try THIS log level. ok try...). So, for my needs, where anything can go wrong at any time, super verbose, scope-indented logs were absolutely necessary. For other needs, only critical errors need be logged. And every need in between has justification as well. There is absolutely no rule, and as long as your logging meets your needs, no one has any business telling you how to do what they don't understand.
First question is: who is the logging for? Sometimes the logging is for the customer, sometimes it is for support, often it's for the developers, and often it's for some combination of these. If it's for the customers, the readability, volume, and performance are all considerations. But if it's for the devs, requirements can vary greatly. Some people here say, "what about CPU" and others say "what about disk space?" If I'm logging for an app on a 5 petabyte system that only runs a relatively small chunk of code once per day, I'll go ahead and log every line if that helps in some way. If I'm logging for an embedded app with about 32k of free space, I'm going to log very differently.
In my previous office, we wrote and deployed real-time, mission-critical applications the likes of core-banking middlewares and ATM controllers, to name a few. Once deployed and set live on live banking servers, generated logs from those applications were the only means of figuring out and debugging, when something went wrong, what went wrong, how, and when. In fact, it was standard procedure to follow for the banks' IT staff when reporting any problems they encountered to open a support ticket, and send in logs.
There is no easy and sure way to determine what in the application needs to be logged. Like many things with software development, it comes through experience. As the products matured over the years, we, back in the previous office, realised what the bare minimum set was in terms of what should be logged. And that naturally helped in developing a simple pattern for what needed to be logged when a diverse set of people added extensions to the applications in terms of modules, etc.
I restrict mine to stop global warming.
I've been using log4net and log4j for years. Generally:
I log info at the start of every function logging all parameters passed it.
I log info at the end of every function logging return values if applicable
I log errors and exceptions at a high level
I NEVER trap the exception entirely, every tier (including the end user) needs to be aware that an error occurred, be it their fault or the programmers.
just my 2 cents.
Exceptions
All exceptions should be handled, it is never acceptable to simply print the exception message and stack trace. Exceptions should be dealt and corrective or informative action taken to highlight the issue.
For debugging purposes the stack trace should be logged at the final destination of the exception or at whenever the exception is modified.
Logging
If a class has unconditional logging, then it should be updated to use log4j. A case of unconditional logging is where there are System.out.println() in the code with no conditions surrounding the call.
Logging should always be performed at method exit and entry as follows:
- Log entry
- Log arguments
- Log exit
- Log return values
The occurrence of exceptions and the stack trace should also be logged as info level items.
Logging calls should be wrapped in enablement checks so that arguments do not get unnecessarily evaluated for example:
If (getLogger().isDebugEnabled() { :"+ theArgValue1); :"+ theArgValue2);
getLogger().debug("Entering - method - argList");
getLogger().debug("Arg1
getLogger().debug("Arg2
}
Assertions
Assertions should be used in the code to verify that the expected results have occurred. Assertions should be used as liberally as possible.
Standard Assertions should be performed at method entry and exit; these methods should validate the pre and post-conditions for the method. All arguments should be checked for validity as should the return values. Similarly the state of the object and broader system should be checked as appropriate on both method entry and exit, for example if a file is open.
Note
Please note that this is in using log4j where the logging can be turned off an on at different levels of the code during runtime.
Maybe it's just me, but one thing I've noticed is that new programmers tend to use very elaborate and extensive logging code with logging calls that take a zillion parameters like log level, error type, error code, short message, long message, so-one and so-on. The experienced programmers seem to use something like fprintf(stderr," log message... . That combined with maybe a few compile time macros. Of course the their code may just need less logging anyway.
Firstly I agree with a previous poster that using a debugger is better than logging for most development work. Single step through code to make sure its doing what you expect.
At the risk of exposing my age, I'm also a big proponent of desk checking code and algorithms. (Yes, we had limited mainframe CPU minutes to run our card deck and paper tape applications in lab assignments in college). Use a design review and code walk through process to explain yourself. It's amazing what you'll find as your colleagues challenge you on design decisions and you have to defend it.
With good code structure, this should provide a reasonable level of logging so you can determine where something might have gone wrong or gotten stuck in production or test out of reach of debuggers. Sometimes, bugs are timing related or only occur during stress testing so logging is the way to go as a modern replacement for the hex crash dump. You should be able to enable/disable logging with an option for verbosity. You can log to a text file or to something like the windows event log.
Spy utilities are also useful logging and debugging tools. There are tools built into ODBC for logging transactions client side or tools on the database servers. For SOA there are SOAP spies. I use Mindreef. Another option is Altova.
Moving forward, we're building workflow systems using FileNet Visual Workflow (or P8) or BPEL (Oracle, Active Endpoints, etc). In these cases we fork asynch child messaging workflows that log events to a database without holding up the main workflow process. These are inserted at key points in the main workflow and usually double as process monitoring points used in various process management reports.
Finally, debugging, logging and spy tools are no substitute for poor application and code design. Include module block comments and descriptive variable names at a minimum, and a sprinkling of line comments for the tricky parts.
You can put lipstick on a pig - you will still have a pig.
Comment removed based on user account deletion
As an OS developer, the type of logging done is very dependent on where in the OS the logging is going to occur. In many cases, the rate of logging events can be extremely high - exceeding millions of events per second (think of logging operations on OS semaphores on very large SMP systems). Other tracing events (such as in the IO subsystem) events are occurring at more reasonable rates (1,000s of events per second).
For logging high frequency events in the OS, logging to disk is typically not a feasible option. For these events, we log data into ring buffers in memory. The records in memory are typically fixed length. An atomic increment operation is is required to obtain an index in which to write the log entry. A small header typically consists of a time stamp, a CPU id, and a record type. The rest of the data could be either formatted text or binary data. In the case of binary data - some additional display code is required to format or analyze the data. Under normal conditions, this tracing is typically not turned on - the performance impact is just too high. The tracing is usually turned on during development testing and possibly beta testing or when attempting to reproduce a bug where the trace data might help. Tracing is turned on typically by allocating some memory that contains a pointer to the trace data, and this memory has a bit that indicates if tracing is turned on. If the structure is not allocated or the tracing bit is off, then no tracing occurs. Thus the test to determine if tracing is on is at most two trivial read operations - make sure this data is in a cache lines that contains no data that is written. Most of these reads will then be satisfied from cache and thus be very fast so the performance impact with tracing turned off is very minimal.
For the ring buffer itself - the size dictates how much data is retained. In most cases, only a few megabytes of trace data is allocated. However, in some of the high frequency cases where a specific problem is not noticed within the OS for some time - 20-30GB of memory has been used for trace data. Note that this is physically allocated memory and not something that is pageable so there the system needs to have this much available free memory. In these cases, tools to search the trace data are also necessary.
Being an OS debug tool - if tracing is turned on and the OS crashes - the trace data is logged in the crash dump and accessible (our tools can format this data from the dump file). Should the bug being tracked not result in an OS crash, then when software notices the bug, the bit indicating that tracing is turned on can be cleared thus stopping the trace data as soon as the problem was noticed and hopefully preserving useful data in the ring buffer.
With numerous trace facilities and then many options within each facility - only tracing that is needed to debug an issue needs be turned on. As this code is shipped with the OS, should a customer be having an issue and the issue can not be reproduced in house, it is often a great benefit to have the customer turn on this tracing.
With a trace routine that is basically a printf to memory - should some additional trace code be required, it is easy to go add addition trace code to the OS and rebuild the necessary components and reboot the system - for my specific OS, this can be accomplished in just a few minutes.
The above tracing mechanisms have been very valuable to debugging. This type of tracing can also provide quite a bit of insight into how the OS operates (especially for large SMP environments) and provide performance data on the system. In the semaphore example, if the semaphore address is recorded along with information on when the semaphore is acquired and released, then reporting tools can be produced that show the percentage of time a particular semaphore is held. If CPUs are spinning on a semaphore and that event is also recorded - then the ability to report the percentage of time CPUs are waiting on semaphores is also pos
1 line of logging per 200 lines of code is way too much. 2 in 5 lines is absolutely insane. I've seen way too many systems where the logging made it totally unusable. People just don't realize the costs of logging everything.
There's absolutely no way to document anything this verbose. Without documentation, the logging is useless.
Huh?
Depends entirely on logic, not generalized statements. If you're into a potentially problematic section of code, 5-10 log statements per 100 lines may not be enough. If you are in a section of code that is verbose code but not much to go wrong, the logging statements should be far fewer. Good logs is essential to debugging and problem determination and more is generally better than not enough so generalization is not going to fit here.
And in the code, just because there is a log statement does not mean it generates logs, add filters to the approriate level,
isn't that hard to implement is it?
You can refer to my project at http://realeyes.sourceforge.net/ for details, but essentially, I make the amount of logging configurable. Each message is defined with a level, such as CRIT, ERR, WARN, INFO, etc. Then in the config file, the user can define whether or not to receive INFO and/or WARN messages. That way, when the app is first run, everything can be logged to be sure that it is working correctly. Later, the logging can be reduced by ignoring INFO and even WARN messages.
Later . . . Jim
I often wonder what my peers would say about the logging I implemented on the system I've maintained and ported. Some psuedocode below:
bool Function1 (string s, int i, ref ERRORLIST e)
{
try
{
if (!Function2(s, i, e))
{
throw new Exception("Call to Function2() failed.");
}
return true;
}
catch (Exception f)
{
STATICDB.Log("who", f.Message, "Function1", System.DateTime.Now);
if (e != null)
{
e.Add("who", f.Message, "Function1", System.DateTime.Now);
}
return false;
}
}
The point here is that the exceptions are logged to a database, but also to a list that travels back up to the caller. If this error happens pretty deep, the list presents a chain of events that caused it. It's an optional argument if you don't want the user to see anything but a failure, but we've found it helps them help us figure out what the hell caused whatever happened and, in the case of business rules, what we need to fix or accomodate.
The specific scheme doesn't matter, but you want to log when certain conditions are true.
Something lime this is at the heart of many logging schemes:
Have a function LOG(flag,text).
LOG compares flag with a global/static variable CURRENT_FLAGS which is set during initialization or perhaps in response to special log-level-setting commands. If flag is in the currently-active flags, log the text, otherwise, discard it.
Typically, all logging that is debug-only will be surrounded by compile-time "#if DEBUG/#endif" or equivalent, so they aren't even compiled into non-debug builds.
Knowledge is how to play a game, intelligence is how to win, wisdom is knowing what game to play.
Some comments, above, seem to say that every app should produce Mitchner-novels-per-second of debug and log. Others seem to go to the opposite extreme. Both are right, and both are wrong.
There are some applications, and some portions of most apps, that have extreme performance issues, and for which excessive logging is likely to produce a problem in production. And, there are some applications, and some portions of most applications, for which very detailed logging is needed to meet specific diagnostic or auditing requirements. And, there is a lot of gray area in between those extremes.
We use a variable approach. We make it very easy to code a "debug" or "log" output -- It's just a one line call. And, those routines have the first parameter for the "level". A separate call is available to set the desired level for the process. Level 0 message are reserved for major events and severe errors. Level 1 is for intermediate level events and warnings. Level 2 for minor events and warnings. Debug output starts at level 3 or 4. Inner-loop debug output is at level 5 or 6. At level 5 or 6 you get the proverbial Mitchner-novel-per-second. We generally use level 4 on for dev testing, and level 0 or 1 on production systems.
Efficiency is a key, both in execution and in coding. The code is small and easy to write -- only one line of code per message. And, the logging routine itself is very efficient in the case where it decides not to print the message because the level is higher than the current per-process level. A message that is not printed consumes just a few machine instructions.
In new code, we typically have a logging call per three or four statements. In difficult algorithms, maybe more. In straight-forward byte-moving code, maybe less. As code becomes more mature, and we become more confident that it is right, we might comment-out some or even most of those calls. When a piece of code is under investigation, and the developer doesn't quite see the problem yet, I often advise him/her to add more debug logging to that area.
There's no need to clutter up source code with logging statements for debugging. Just trace appropriate functions as needed.
A long time ago (in 'Net time anyway), there was a Slashdot article regarding what IBM could do to help the open-source world, now that they are heavily into Linux. Here's one place they may be able to help, through experience if not actual code. On z/OS operating systems, there are two (or three) logging architectures: there are instructions and supervisor calls for low-level tracing (Generalized Trace Facility) which is a high-overhead debugging tool. There's the System Management Facility, which is a software standard for logging things like utilization data, events such as file open/close (with I/O or media error statistics), etcetera. SMF would be the closest to 'normal' logging. There's then a 'System Logger' facility which handles log files of all sorts, cycling them, etc. This kind of standardized logging architecture might be a good thing for Linux, also.
You should be able to dial logging up and down. Use log levels to this advantage. If you are in C++ use compiler pragmas to remove trace logging statements. When doing trace logging you want very verbose output but that's only helpful to you. Think about the application as it ages in the hands of an administrator.
If your application pumps the log full of messages all the time eventually your administrators will start ignoring the logs. They'll be information saturated and unable to distinguish the errors from normal operations. This is bad. In production only log problems and provide plenty all the context you can when you do log.
Your administrators are users too and they only have so much time in a day. You want to only complain to them when something very bad has happened. You want them to sit up and pay attention when your application burps out a log.
Obviously, I've presented the most extreme case... there are points on this continuum of silent to verbose that make sense for different situations. Use your judgment and be mindful of your administrator's "information budget".
[signature]
It's the only way to fly! It's like linked lists or quicksort implementations... why reuse a well-tested, properly engineered implementation when you can build your own from scratch?
At least in Windows XP, I have found that the TraceMessage() function seems to run nicely with the MessageFlags parameter set to zero, meaning that only the USHORT MessageNumber identifies your message, thus minimizing log-file-size. But if you're the only app logging to your ETW session then 64k message-types are enough, why would I need a 16-byte GUID in each message ?! By comparison only TraceEvent() was available in Win2K, significantly more overhead. I also briefly examined EventWrite()( VISTA ) and it seemed way more bloated than even TraceEvent(). According to MSDN TraceMessage() continues to be officially supported in Vista and server-2008, so there is still a way to do fairly lightweight tracing. It's definitely nice to have these kernel-logging-sessions just keep on ticking while your apps crash.
that way you can see exactly where the user was at when it crashed.
For the last ten fifteen years, I have been big on debug code writing html pages into the web root so that the developer can locally browse the debug data. Emitting code with links makes it easy to get around. I have used this extensively and it works very nicely.