Software Logging Schemes?
MySkippy writes "I've been a software engineer for just over 10 years, and I've seen a lot of different styles of logging in the applications I've worked on. Some were extremely verbose — about 1 logging line for every 2 lines of code. Others were very lacking, with maybe 1 line in 200 devoted to logging. I personally find that writing debug and informational messages about every 2 to 5 lines works well for debugging an issue, but can become cumbersome when reading through a log for analysis. I like to write warning messages when thresholds or limits are being approached — these tend to be infrequent. I log errors whenever I catch one (but I've never put a 'fatal' message in my code, because if it's truly a fatal error I probably didn't catch it). Recently I came across log4j and log4net and have begun using them both. That brings me to my question: how do the coders on Slashdot handle logging in their code?"
I use Jakarta Commons Logging...
As usual, "it depends" on what you are trying to achieve. Nobody can give you a blanket recommendation. But I guess in general: the log files need to give you enough information, that can't be got in other ways, to solve any problem that comes up.
We have a realtime product that goes all over the world and talks to hardware that we can't always get access to ourselves. Therefore, we sometimes must debug our code remotely. Obviously, logging is critical to this. We keep sometimes hundreds of MB of logs and have archiving rules and a tool for users to collect them. Every layer of the system keeps it's own logs, and all logs have timestamps to milliseconds.
In our case we log all the data back and forth, and then every important decision the code makes. For example if it decides there is something wrong with incoming data, it must log that. Any action it decides to take must be logged. Any data that will be passed on to other layers/the outside world must be shown. Generally, whenever we forget to log some of this data we will later regret it ("why the hell is it ignoring that device state..."). We also log at startup, basically the whole system configuration so that we can reproduce it.
Callstacks when there is an exception can be very useful. However, a lot of "errors" (at least in our case) are not exceptions but rather unexpected data or behaviour. We rarely have a crash and in state-based systems a callstack doesn't tell you much about what's going on. So a callstack is not useful for all situations.
Other times, you just want logging to give you a clue where in the code it was so you can run up the debugger and step through it (you do know how to step through code in the debugger, right?). In that case, too much logging can just get in the way. It might be sufficient in a GUI or web app to say which screen/page and which button was clicked.
You'd hope users could report this kind of details, but not always: if the user is working in another language, in another country, with two layers of helpdesk between you and them, and they are busy doing other things when the problem occurs and only call in the issue an hour later, and the helpdesk takes a week to report it to you - you may just get a vague or even misleading report that no-one can remember when you ask questions. In those cases log files may be all you have to go on.
There is also a tradeoff between log detail and manageability. Besides the difficulty in reading long log files, having a lot of detail means maintaining a lot of extra code. It also means that log files can become unmanagably large. In our case those hundreds of MB of logs can be a huge problem for customers to send to us because they have low quality internet connections (small companies in Mexico for example).
For every expert, there is an equal and opposite expert. - Arthur C. Clarke
I find I usually end up with 3 levels of logging:
Normal operation, often with some notion of "Yes, I'm still running even though I haven't done anything else lately".
Details. Usually corresponding to processing steps.
Algorithm tracing. This includes things like logging SQL queries. This is usually only of interest to me.
...laura
After a few greps over the PHP code I'm working on at the moment, it turns out there's... no obvious log pattern at all. It's logging to the webserver error log in some places, logging to syslog in others, and using assert() randomly. Maybe I should've paid more attention to that bit while writing it.
I work on a large air traffic control system. Logging is a huge issue. Log files are collected centrally by a separate application. One important issue IMO is making the contents of your various log files meaningful to people who are not familiar with them.
If your system has objects of type A B and C which can be handled by different components of your system then your should make the logging system in those components print information about those objects in exactly the same way.
While you are at it, make the log format easily parsable by software. You don't want to be looking for a needle in a gigabyte size haystack of trace information without help from a tool which understands what it is looking for.
http://michaelsmith.id.au
I will add lots of logging to debug a specific problem and then rip it out when the problem is fixed. Permanent logging includes run time problems like serial communication errors, file not found, etc. I like to make various logging functions switchable, so user input can be logged for example, but only when needed. Once a program is running well, it should only log data for dire exceptions, unless regular accounting logs are needed.
"Otherwise don't do much logging because it will hurt application performance, sometimes drastically."
You're assuming that performance -- or, more precisely, CPU usage -- is important; in many cases, reliability (and being able to track down bugs after a crash) are far more important than CPU usage. With quad-core CPUs so cheap these days, we can easily afford to spend another thousand dollars to throw more processing power into a system which has cost a couple of hundred thousand dollars of programmer time to develop and will cost thousands of dollars an hour for any downtime.
My online applications log to a database, not a text file. Multiple applications on different machines can log to the same database table. There's no need for "log rotation"; old entries can be summarized and purged by date on the live database. With appropriate indexed fields, you can find key log entries in huge log files very rapidly.
Even program faults are logged to the database. If the program crashes, the top-level exception handler catches the event, does a traceback, opens a fresh database connection, and logs the traceback.
Don't coddle weak programmers... it's survival of the fittest out here. Either they learn to nourish themselves from the ample teat of a stack dump, or they must perish. It is for the good of our civilization. I know this seems harsh, young Jedi, but it is the way of the Elders of Assembler, from the ancient Time Before OO. Now go Forth and code.
Okay, joking aside. Parent has a great point -- logging can generate incredible volumes of text and can form a remarkable bottleneck, especially on VM systems where your OS may not be the only one hitting the disk.
So take advantage of Log4J/Net's ability to log at different severity levels and make logging globally configurable so you can enable/disable entirely at runtime. I'd recommend you log the following : object creation, scarce resource allocation, recoverable failure/error conditions and unrecoverable failures. Preface each severity level with a unique label so you can grep for it later. Even at the most verbose level, you can then grep your output to see only what's of interest to you (e.g. "unrecoverable:...").
I set a "logging level" in my configuration file so that when I need a program log to be verbose I can make it so but for normal use I can keep the logs fairly light.
A log is only useful if it records information that is used. The problem is that sometimes everything is running fine and super verbose logging is a waste but then something in the environment changes and suddenly temporarily logging a lot of debugging information is useful. Setting a "logging level" in a configuration file lets me do light logging, verbose logging and just about anything between without having to recompile my projects.
The race isn't always to the swift... but that's the way to bet!
If you're using log4j, don't use multiple hosts to write to the same nfs filesystem file. You'll run into blocking issues and log4j doesn't handle them correctly. The nirvana of clustered app logging is an async JMS queue. You fire off the message and forget it. You don't wait for file handles.
It's not CPU that's at a premium, it's disk IO. And on virtualized machines (such as is extremely popular in corporations and hosting farms) where there might be four different OSs running on the same physical hardware, disk becomes a scarce resource very, very quickly. And not only does your virtualized server go to shit, it takes the others down with it since they can't get timely disk access, either.
It depends on the app you're writing -- is it a web app, a database app, a userspace program written in C, a Perl/Ruby script, or.. At work we created our own logging library in C to emit log events for different levels, e.g. informational, debugging, warnings, errors, fatal messages. We then have wrappers around that library so that languages like Ruby can access that logging library. But on hindsight I think I would've just used syslog if I had to start over. :)
Are we talking about production environment or a development environment? In production, we log our exceptions or when major changes happen, such as something being deleted. The goal there is to help our support team and customers nail down the problems when they arise. I don't know if LoC is the right measure for these things. Our UI (it's a web app) has little to no logging since errors just appear when the UI has a problem. So we can have hundreds of lines of code with no logging there. Our backend logs much more frequently. Basically every "catch" block has a call to the logger (log4net is what we use as well). Still, most of us prefer to hook a debugger up to the process during development and rarely ever go back through the logs. The only time the logs are frequently used prior to production is when we're running automated QA and we want to know what happened when a test case fails. In that case, when we compile with the debug flag, we log quite a bit and more frequently and log things that only our developers would understand.
I guess maybe the rule of thumb is, depending on the environment, generate logs that are targeted at the audience when things go wrong. I'm not convinced frequency of logs is a meaningful measure of appropriate logging.
EvilCON - Made Famous by
"I personally find that writing debug and informational messages about every 2 to 5 lines works well for debugging an issue" Learn to use a debugger. What you are doing is backwards. The logger is meant to help out when the application you are working on is being executed remotly and attaching a debugger isnt practical or desirable. There is no code in the world that I can think of that needs a log line after every two steps in a procedure.
I don't, save the rain forest, hug a tree, prevent deforestation, stop logging now!
Segmentation fault: core dumped
I wrote up a two-part piece on logging best and worst practices a while back. See Part I and Part II if you are interested.
1 line of logging per 200 lines of code is way too much. 2 in 5 lines is absolutely insane. I've seen way too many systems where the logging made it totally unusable. People just don't realize the costs of logging everything.
There's absolutely no way to document anything this verbose. Without documentation, the logging is useless.
with guru meditation errors of course!
I Need someone to rebuild a Digitech Digital Delay pedal for me....for me...for me...for me.
Comment removed based on user account deletion
I personally put logging calls in the most used functions.
For example in my MySQL handler class I have put the logging code into the function that executes the SQL queries, which is the function that everything else relies on.
So when I need to debug I have a stack of SQL queries along with their error codes and error strings.
That's enough to figure out what went wrong.
My point is put the logging calls in the vital functions, the ones that do all the work. This will save you many code lines and processing cycles.
My logging is setup so I can quickly filter down to the type of data I want. It's more than just "information", "warning", and "error" - but by (cpp)file, module, etc. That way, if an issue arises, I can eliminate the cruft and see just what I need. Just takes planning.
Why can't all fpga/microcontroller manufacturers just release free optimizing compilers???
Syslog, of course.
I work on a real-time embedded medical device, where both performance and reliability are vital. We've got constrained resources, and the system must be extremely responsive.
Our logging scheme is pretty cool. It's written so that two computers can log to a single hard drive, and each logging statement must define a log level. So, for example, if I'm writing GUI code, I can log to log_level_gui_info, log_level_gui_debug, log_level_gui_error, or any of a number of more specific log levels.
The idea is
So, we have two ways to filter out extraneous logging that we don't care about (one actually keeps the logging from happening, and one just filters it out during analysis), and we can log as freely as we like as long as we're smart about which levels we're using.
As much faith as we all have in our own code, nothing's as frustrating as trying to analyze a log that came in from the field where there's just no information about what went wrong.
I tend to do my debugging by inserting a lot of printf statements to indicate where in the program I currently am and the value of any critical variables at that time. As the output information is no longer needed (i.e. I fixed the bug it was attached to), I tend to cull out whatever isn't useful anymore. However, I tend to keep starting messages in function calls related to a routine I'm working on or making more than a trivial change to... since chances are, knowing me, I'm going to end up putting them back in there anyway once I create a new bug... and lets face it, it WILL happen.
Once I'm done, I go back, remove or comment out (usually just comment out) all the messages that have no redeeming value for a properly functioning program, and turn the rest into debug statements which print based on the debug level provided at execution time... or sometimes I use a mask to select which types of messages to display.
-Restil
Play with my webcams and lights here
Log4j supports selective logging. That means you can have info/debug/trace priority messages in place, but never see them in the log unless you explicitly enable extra logging for that class or package. You can do this at runtime, e.g., via something like 'chainsaw' (which attaches to a running process) or hooks in your UI.
Our policy is that logs are usually very quiet. Application startup/shutdown and not much more. But if there's a problem the debugging messages are already in place to let us peak into the system, even if it's been deployed to a production site.
BTW AOP is also great for this. You can configure logging interceptors that log activity when you're in a development environmnent, but easily removed in production. This is a natural approach when going from one layer to the next, e.g., when wrapping the DAO layer.
For every complex problem there is an answer that is clear, simple, and wrong. -- H L Mencken
I write code for Telecom test systems that need to run 24x7 processing highly varying requests from dozens of different client systems. Our system consists of dozens of different processes/components per host, with multiple hosts all invoking components on each other as needed (all via CORBA). There are very many paths that any request can take through our system.
In this environment we log VERY heavily since each request is close to unique and we need to be able to determine the path it took through the system, and why it did, and what happened, in the event of any bug report. Some of the haviest used modules can produce close to 1GB of log per day per host - upto a couple hundred lines of logging information per request per process that it passes thru. We use a custom printf-like log library written in C++ (that auto rotates the log files based on various criteria), a custom tail utility for dealing with the large log files (tail a log file from a given timestamp - done instantly via binary search on the timestamps) and a daily cron job to compress the older log files and move any older than 5 days off the production servers to someplace with more storage.
First, a well functioning app should really not be generating any log file except what the application itself needs (e.g. usage logs for a web server).
Second, instead of wading through reams of log output, use exceptions and assert statements so that you only generate log output when something has gone wrong.
Finally, use your IDE. Instead of trying to infer state by combing through logs, set break points where you catch an exception. This lets you traceback to see why you ended up with the exception without suffering much if any of a performance hit.
As we move to increasingly distributed systems, log files become less useful. For many distributed apps, you may not even have access to all of the various file systems where a log might be written. Best solution is to rely on highly modular and thoroughly debugged code.
Statesman
That should be about right...
Sometimes with printf style debugging, so much data comes out that it is not practical to use the logging in normal code. That said, I dump all data I want to stdout, and use AWK to massage the results. If you put in some kind of tags or markers, this can help simplify the log processing code significantly.
You don't need databases or anything structured, simply one stream per process. The process will run sequentially, so a sequential storage structure (ie a file) is the best matched to the data. Also, I avoid markups like XML and so on in favour of plain text, since it is easier to process the plain text with standard UNIX tools. They also tend to operate more efficiently since parsing is a matter of looking for whitespace or \n. Also, the same techniques work from any reasonable language, since they can all output text.
SJW n. One who posts facts.
I wonder why, given the huge increase in the performance of computers over the last ten years and more, why it sill takes some games one to five minutes to load... one reason I am more likely to play spider solitaire now when I just want to play something for 10 or 15 minutes. Anyway... it is this kind of attitude causing it (e.g. "Screw it... we have good CPUs and lots of memory... who gives a shit if I don't feel like considering performance"). If people would program things as efficiently now as they did in years past, performance tuning analysts would be out of work, the enterprise systems I see being build now wouldn't be sucked out performance dogs out of the gate, games would be more fun, etc. etc. etc. etc.
ok... I feel better now...
-- I ignore anonymous replies to my comments and postings.
I keep my fingers crossed and pray to God, asking him to tell me where the error in my code is. It works 100% of the time. (You deserve this response for answering such a silly question, get back to work!)
On the other hand, broken code hurts application performance, sometimes drastically.
I'm an SQA engineer with years of experience working with large scale enterprise systems. Generally speaking the cost of unexpected outages or data corruption outweighs the cost of hardware. In such systems the costs of deployment activity itself can be such that you'd rather pay for more hardware to support extremely verbose logging.
Sure, boneheaded logging can cause unnecessary performance hits, the obvious example being logging in a loop when logging at entry and exit would have sufficed. But that's not what we are really talking about here. You posited that you should do as little logging as practically possible, and I believe that you are wrong.
Log lots and log often. Just do so intelligently. Use a logging framework (log4j, log4net, log4perl etc) and set the priority appropriately. Only use ERROR for real errors (unexpected code paths or data), use WARN when a performance metric is hitting a soft limit (to warn you before you hit that hard limit), and use DEBUG to verbosely log anything else of general interest. Rarely you might also want to log in an extremely verbose manner data that wouldn't ordinarily be interesting, and this should be logged at a TRACE level. Generally speaking if this is the case then the code itself is due for a refactor. FATAL should normally be reserved for errors that prevent correct startup - generally if an application runs correctly at startup then any potential faults that you see and handle now become ERRORs because there is nearly always something better an application can do than log FATAL and exit. As the OP observed, if you have a potential fault that kills your application and you don't see and handle it then you don't have the opportunity to log FATAL anyway.
By using a logging framework, many logging pitfalls can be avoided because the framework itself provides well tested facilities. eg, time-stamping, log rotation, file-handle management etc. In addition, using a framework allows the operator to tune the logging on a very granular level. This allows for a trade off to be made where if a performance impact is noted in a well used class then its logging can be reduced at runtime. Sure, there is still a small performance impact because the logging framework has to do a "if (logMessage.logLevel >= loggingClass.logLevel ) then {...}" comparision, but in the scale of things that impact is tiny.
My profession is not about finding and fixing bugs. It is about understanding the software that is being delivered and deployed. It is about understanding what defects exist (or may exist) and the possible implications of those defects. It is about reducing the risk of defects through analysis. Analysis of the software's functionality, analysis of the software's performance, analysis of the processes used to produce the software itself. You will never be yelled at for releasing software with a well understood and documented defect, but the shit will hit the fan when you release major defects that are not understood.
Logging is an _invaluable_ tool in this analysis. You'd be a fool to not use it effectively.
Wow. This is a disturbing reminder of my University assignments, for which I forgot to remove those cout "No segfault yet!\n"; statements.
Truly, logging is an amateur pursuit, & a grownup Software Engineer utilises formal verification and.... *GASP*.... unit testing!!
Seriously, do the slashdot developers call a logging function somewhere with "Okay, we've started the comment routine!" in comments.pl?
I work on set top boxes, and not every platform we port to has a good debugger (hell, its been years since I've seen a good debugger). Our logging system is all in house; multiple "levels" for each log statment, (noise,information,warnings,fatals,etc), with each module creating its own log id and setting its "preference level". It works well, but:
1) Useless logs.
Engineers not taking the time to write logs that are useful. "Got to here", "Value=1", etc. A few of us write enum-to-string functions and pass them to the logging system for cleaner output.
2) Running at the speed of 115200.
We've only got a serial port most times, with multiple threads trying to access it, there's got to be some synchronization, and this generally affects threads of any priority. Using a logger that caches and outputs logs at its own pace is nice.
this is my sig, there are many like it, but this one is mine.
interesting topic, i have been working with a new installation of a portfolio accounting system. The developers don't talk to their support personel, who can't refer the end user (me) to how to crank up the logs to find issues and thus you find core crap laying around daily or extra process that they tell you just to kill and move on. think about your end result, software out there in the wild that is supportable, not mysterious to the end user. nice product, if it was free, unfortunetly it's ubuer expensive and i feel, is a piece of...
Having spent many years as a Systems Administrator, I would argue that the most important part of logging is to make sure that it is in a format and location that other people can use. People won't use the logs if they don't know where they are. And, if you're developing for Windows, I would go further and say that the only place you should be considering logging data to is the EventVwr application. I'm not sure how the java stuff goes, but if you're developing in .NET, ditch the log4net application, and stick with the System.Diagnostic classes.
People have spent a lot of time and effort building logging systems and installing systems to monitor those logging systems. Don't try to reinvent that particular wheel.
First - let me give you my own perspective. I recommend having each subsystem log in such a fashion that you can easily grep to include or ignore that subsystem. For example for one package the following LVLx messages were the first four characters as follows:
LVL1 - basic startup and shutdown info ( a few lines per run)
LVL2 - Interactions with the database
LVL3 - Interactions with the file system
LVL4 - Detailed database interactions including each sql statement
LVL5 - amazingly verbose debug information including memory and variable allocations
In almost all cases, I recommend being able to set each level on or off. Your sysadmin (maybe yourself) will appreciate that ability.
If appropriate, I recommend an 'audit' record after each completed or aborted transaction. EG - after every order or every user change or whatever is important for accountability / business activity monitoring purposes.
This is the original question .
Hey you.
Let's have sex.
DRM: Terminator crops for your mind!
Turning on verbose logging doesn't help you after the process has gone tits up. It's ok if you are debugging, but really if you are debugging you want to use a debugger, not a log.
As far as hardware being cheap etc. as other posters have posited, I'd sure like to know where you work that a Sun E10000 and an EMC Symmetrix are considered cheap.
The fact is that if you are running apps on big iron hardware is not cheap, and the economic effectiveness of an application can well depend on how many transactions it can process in a given period of time.
Yes, it can be tough to figure out what went wrong with a sparse log. But that's the way it is, sorry.
On the other hand I do agree with people who state disk space is usually not a factor. There are ways to manage that issue.
I've seen at least one example where it looked like the programmer added logging when trying to debug a problem. Then, apparently on the assumption that it was "useful once, so it might be useful again", the logging was left in. There was a compile-time AND a runtime switch to toggle this logging. It tended to be ON all the time, cluttering the logs with needless information, and making the code look ugly. In theory, compile-time switches can eliminate the performance issue; but some of the logs depended on values that had to be computed before-hand. The result? That code had to be left in, even if the log wasn't there. It would sometimes generate a compiler warning when compiling without logging, since those variables were initialized but otherwise un-used. In some cases, macros were used to dodge around this.
The rememdy to this anti-pattern? When debugging a specific issue, I prefixed the log with the word DIAGNOSTIC in all caps, just like that. Thus, I knew that such logs should be removed as soon as the specific bug was fixed. A grep for DIAGNOSTIC would locate all these messages. The remaining logs were gradually removed, and things that were supposed to be logged under normal operation were written into the program specifications. These could be toggled at runtime via the verbose options, an option that needs to be distinguished from logs generated specificly for programmers working on specific issues.
Code to debug a specific issue should never, I repeat, NEVER find its way into a release. Many will argue that it's a waste to throw away the code that instruments the bug; but I've found this worry to be unfounded (I'm reminded of another internet rant about "how to properly delete code", and I know I've been guilty of leaving old code around, so I'm not claiming purity on any of these issues).
If a problem is likely to occur often in code that does not have bugs (e.g., a client that can't connect to a server because the server is down) then of course that should be logged, otherwise you aren't going to know why the client has no data. That just cycles back to what I said before, namely that you should only log what's specified in the requirements for logging.
For all intensive purposes, "whom" is no longer a word. That begs the question, "who cares"?
I would argue that distributed systems make logging more important. An IDE is great when you can reproduce the error in development, but it's not always possible. A log can answer questions like "which node is exhibiting the problem?", "what actions did the user perform?", "what did the message I sent look like?". The ability to enable verbose logging is a lifesaver when troubleshooting complex, distributed software bugs.
I was going to recommend this also. Dynamic code patching is a very powerful tool and can remove the need for thousands of explicit "if(debug) log;" calls.
Virtual machines like the JVM can do this too.
Afraid you're describing a very natural behavior that's unlikely to disappear. Developers (not just software) tend to work until each constraint is just met and then stop to work on the next constraint. E.g., get the load time down to the maximum acceptable time, then stop working on it.
Use log4j, and use the levels you can set. It depends on what you need, but I use info for audit trail kind of stuff. And for performance put conditionals around logging for debug and other cpu hits. You can write to syslog using log4j too, for a nice central backup. I don't know where all this chatter about vm's and disk being a bottleneck come from, I have never seen that, it's just sequential writing and I do gigs of it, no issue. But if it is, redirect to a server made to store it.
The CPAN module Log::Log4perl is a great tool for logging -- it means you can stick in plenty of debug statements, and dial them up for debugging, then dial them down in Production.
This module uses several message levels; in descending order of importance they are FATAL, ERROR, WARN, INFO, DEBUG and TRACE. It's possible to log messages to files, a screen, or even to an E-Mail message.
The real strength of this method is that you don't need a 'debug' version of the code -- all configuration is done externally, which means you can turn logging on for a Production problem, run your test, and turn it off again, and look at the log files off-line.
use Event Tracing For Windows.
(1) Log lots of information when you are debugging. Then, when you are done debugging that particular problem, TAKE THAT LOGGING CODE OUT.
If you find it tedious to find your debug log lines while debugging, it is a sign that you are logging too much everyday stuff! In the meantime, surround your debug log lines with multiple lines of "===================" or the like. Then, of course, take those out too.
(2) On an everyday basis, log only UNEXPECTED conditions. After all: if it was expected, then your code handled it, right? That means you have to plan for unexpected things to occur. You should be doing that anyway.
(3) If it is a very important issue, you should have an emailer set up, not just a logger. That way, you will be notified of urgent things come up.
(4) You should be doing tests on your code, not just logging. Logs are for when problems occur. Prevent them from occurring in the first place by writing tests. If you are not familiar with that practice, look up "test driven development" on the internet. You will find more than you want to know.
... you are of the opposite sex, have no significant communicable diseases, and pay your own way.
1. Distinguish between serious -operational- issues and other issues. The sysadmin doesn't need to know that you had a pointer problem; there's nothing he can do about it. He does, however, want to know that a message was received and the appropriate action taken. Or that a connection was attempted but failed.
2. Be grep freindly. The first log entry related to a particular activity should have an ID of some sort in the log message which is then included in every additional log entry associated with that activity.
Moderating "-1, Disagree" is simple censorship. Have the guts to post your opinion.
I used a system where log messages were assigned arbitrary tags which were themselves strings. The communications subsystem got the "COM" tag, the memory manager got the "MM" tag, etc. Log levels were just tags ("ERROR", "FATAL", etc). The system supported 32 output channels, each channel was associated with some output stream (file, pipe, etc) and each had a Boolean expression which filtered what went to the channel ("FATAL" or "ERROR" or "ALERT" and "MM" or "COM"). A channel could also be assigned a "lifetime" so that output to the channel only lasted for some period of time. Each channel could be setup as the daemon was running so it was easy to start debug messages going to a particular console, memory manager messages to a file, alerts to a pager app., etc. It was great when remote accessing a running system and diagnosing a client's issue. When you are all done you just shutdown the channel or let the channel expire. End of output and no more wasted I/O.
Actually, I agree with you -- I think logging is essential. That's why I said "Joking aside..." and then went on to suggest making it configurable.
Sorry I wasn't clearer. I might have to go back to using the "/irony" tags.
Good general advice in logging is to support different verbosity levels. Provide the ability to change the log verbosity as a command line switch or as a signal the process can receive.
Then if your log is too cluttered, reduce verbosity and try again. If it isn't telling you what you need to know, increase verbosity.
One of the coolest things I ever saw was a tape recorder that starts recording 30 seconds before you press "record". Simple concept -- just continuously record to a 30-second buffer.
So if it's truly disk/network that's at a premium, why not start logging just before you need it? Log at level "EVERYTHING" up into a circular buffer, and only dump to disk or network when there's an error?
I'd be careful about using multi-core as a crutch: if the logging code isn't written properly, it can still be a bottleneck. Last year at my office we had to excise our old logging library because it turned out that each function call to the log acquired a global critical section even if the log level threshold wasn't met. Multi core wouldn't help because it was highly likely that any given call to the log would block the calling thread.
I'm going to the casino. Don't gamble.
To me, it's important to cover both the internal and external cases in any scheme.
For the internal, it's important that your organization is capable of finding owners for failures/bugs quickly. So when a developer hits an error they don't understand or your BVTs break, an owner can be identified. Nine times out of ten, they can get you a solution in an hour or two once they've seen the error. In addition, an automation framework should work closely with your logging to bubble up the relevant errors. That can save a LOT of time.
For the external case, it's important that the errors are professional. A casual user won't dig into logs. But technical users (admins, etc) will rely on your output. The logs should be search-engine friendly. I recommend unique error codes for any log output. Localization is not always appropriate as it can cause problems for non-english searches. If support needs to involve developers, it also helps if these logs are in the developer's native language.
It might come back to haunt you later.
The society for a thought-free internet welcomes you.
I work on my company's billing system and we use both database and file logs. We process millions of transactions a month so it's important to not only have a detailed audit trail of every transaction but also be able to keep track of any errors and be able to recover transactions from logs in the case of something unexpected happening.
For file logs we use Log4Net and I recommend it. It allows us to specify log statements as Warning, Debug, Error, or Fatal. We have it set up so that all types are logged in the file while any Error or Fatal log statement is emailed to the engineers so we can immediately look into the situation. I don't believe in having too many log statements as it does nothing but make finding useful information difficult. As a general rule we log as much information as possible about each transaction as it comes in and out of each part of our system as well any major changes to that data that come from a part of processing it.
Database logs are extremely useful because they can be queried unlike file logs. This makes finding the information you need much easier. In our case however the db logs only contain information about the transaction and not the processing of it. Therefore both the db and file logs are necessary.
Having two types of logs also acts as a fail safe. If our db goes down we still have file logs. If there is no trace of something in the file logs we can check the db.
Now with this level of logging comes other issues such as archiving logs and dealing with db log tables that are too big but that's a topic for a whole other discussion.
It can hurt performance drastically, but you still need *some* logging. I agree about logging the start conditions. The other thing that I log while I'm debugging my code is every event which results in a function call, and a begin function & end function log entry. I don't find there's a lot of point in logging more than that while debugging my code. Not generally, at least... if something isn't working right, I can check the log to see what started and didn't complete, or what function isn't completing with expected results. I can then go to that function in my code, and if I can't figure out what's wrong with it I can add extra logging to the function to help figure it out. Once or twice, I've even had logging every line of code while debugging a particularly complex function.
Bottom line: Keep logging to a minimum. All you really need to know is what isn't working. But be prepared to go in and add more logging if it's needed.
If you believe everything you read, you'd better not read. - Japanese proverb
And let's hope it never does disappear or we can kiss new software goodbye.
I wonder why, given the huge increase in the performance of computers over the last ten years and more, why it sill takes some games one to five minutes to load
There are more textures, and they are stored at a higher resolution than they used to be; The 3D models have more triangles; there are more sound effects, and the quality is higher; there is more music, and the quality is higher...It takes time to load all of this into memory.
ZuluPad, the wiki notepad on crack
I frequently use log4j or it's non-java derivatives in different languages. No need to reinvent the wheel when log4j properly handles logging with probably the most flexibility and performance anyone could ask for.
That being said, you can add what ever logging you want as long as you do it outside the application code, such as through AOP (if you don't know what that is then google is your new best friend).
Logging through external means has a number of benefits. First you application code is relieved from unnecessary clutter. Second the logging can be added or removed as fit for the environment with no need for any runtime checking, which is perfect for turning of all logging in production environments. And lastly it enforces good coding practices as it makes sure people break up code in a way that makes external logging possible, which is how code should be written anyway.
The majority of all logging, dare I say all useful logging, is easy to summarize.
This is perfect for function/method interception since it comes down to something more specific.
So simply add the first logging to an interceptor that operates before the functions you want to log, and the second after the function.
If you find yourself needing detail inside the function then you need to break the function up into sub routines, so you can use this generic logging on the sub routines instead, or as well as.
...nothing's as frustrating as trying to analyze a log that came in from the field where there's just no information about what went wrong.
[ 56.529336] WARNING: A-fib detected /dev/paddles
[ 56.568802] INFO: charging defib
[ 56.741096] INFO: charging ccomplete
[ 57.218803] ALERT: shocking!
[ 58.061815] Buffer I/O error on
[ 58.163210] zapper[22402]: segfault at 000000c4 eip b321bf5f esp b320a870 error 6
BUG: unable to handle kernel NULL pointer dereference at virtual adress 00000000
printing eip: c013186b *pde = 00000000
Oops: 0000 [#1] SMP
Modules linked in: battery ipv6 paddles ac button battery cardiac_monitor thermal processor zapper fan
Man, I'd feel bad after that output...
There's no place like
And let's hope it never does disappear or we can kiss new software goodbye.
Some of that behavior I've seen, we'd be better off if it disappeared. Sometimes, simply expanding till limits are met is like the old saying "some day, computers will be the size of buildings, and have millions of vacuum tubes!"
Perhaps I'm just pessimistic because where I work they're in the process of replacing the huge, unwieldy FileMaker monstrosity that handles payroll.... with an even bigger, more unwieldy FileMaker monstrosity that does the same thing, but requires us to feed it more information (more detailed job codes, time tracked in 6 minute increments, etc). All this extra data will be used to generate reports explaining why we're always releasing late. Never mind that we've been badgering them to hire more people for months. No, it's clearly a time management issue! As the PHB in Dilbert once said, "I want hourly reports on why we're behind schedule!"
If a job's not worth doing, it's not worth doing right.
"Turning on verbose logging doesn't help you after the process has gone tits up."
How can you demonstrate that you have fixed the reported bug if you can't recreate it in the first place?
And did you exchange a walk on part in the war for a lead role in a cage? - Pink Floyd.
All logging will hurt performance, but how much depends on the system you are running, and for Java also the implementation of the JRE.
I have been using Log4J under OpenVMS and that was a real performance killer in some places.
But as a general, using all levels available will usually be a good thing. Even the 'fatal' level is sometimes useful. Like when you can't access a database then it may be considered as a fatal error that actually can be logged.
If builders built buildings the way programmers wrote programs, then the first woodpecker would destroy civilization.
I think you might be missing the main reason. If we don't reduce software logging, pretty soon there will be no old growth software left.
all the best,
drew
FreeMusicPush If you want to see more Free Music made, listen to Free
Games are the last bastion of efficiency. The reason they take so long to load is that they're massive today, and have to load huge textures and art into memory in order to run smoothly.
But you're right, we really do blaze through memory and disk space like it's nobody's business, and it is a shame. I get the nagging feeling that stuff should just be faster a lot lately, especially since my new laptop has twice as much RAM as the hard drive space on the computer I used in college ten years ago.
If moderation could change anything, it would be illegal.
If you're logging for debugging purposes I suggest setting your application up so that you can change verbosity on the fly. Combine that with a good naming convention within log4j and you could increase the detail in an area of interest while filtering out the rest.
Some parts of your application might not need constant logging. Put the logging code in there anyway and use filtering to keep it from ending up in the log. If you ever run into problems just remove the filter.
If you are communicating with other systems figure out how to tap into that conversation. They might work fine now but sooner or later they'll break.
In the root of your application put a document labeled "HOW TO READ THE LOGS FOR TROUBLESHOOTING". Make sure the filename stands out from all the other files in there. Explain where and how to do the things mentioned above. It might be someone else than you that has to do the troubleshooting. Or you simply forgot how to.
This is something I have recently been thinking about quite a lot. I am used to being able to easily debug software and watching what it is doing, and I had always considered logging as something you do because...well, I actually didn't really know why.
Now that I have done commercial development on larger project for a while, I have developed a somewhat better understanding of what you want logging for. I am still figuring it all out, but these are my current rules:
- Log event-related information. When your program receives input that causes it to start doing something, log that. If there is something strange or wrong about the input, log that. Once you stop processing (because you're done, or because you can go no further), log that. This way, you can see what your program has been doing. You can see that it received input (very important), you can tell if your program thought the input was correct or not, and you can see that it finished processing the input, and whether processing was successful or not.
- Provide for the option to log information about decisions your program makes. This is particularly useful for finding out why your program did not take action on something. This shouldn't always be enabled (unless, perhaps, for high-level decisions), because it would make logging overly verbose, but you should be able to enable it (preferably, focused on a specific part of the program) if you think the program is not behaving as it should.
- Last but not least, log every error condition. This should be obvious, but it is very important, so it's worth stating explicitly. You want to know if something went wrong, you want to know what went wrong, and you want to know where in your code it went wrong. Backtraces are very helpful here.
A large part of this is actually Cover Your Ass. If you log all events, and you didn't log anything, you can tell the Exalted User that the program didn't do anything, because it didn't receive anything from said user. If you validate input and log errors, even non-serious ones, you can tell the user "try with valid input", while you think about the problem and look through the code, seeing if anything is wrong on your end. If you log all errors, you can say "well, it says here there was a problem with the database connection, have a look at that", or "hmm, that value wasn't filled in, but should have been. are you sure you filled out all required fields?"
The above might sound disrespectful, but it really isn't. Very often, the users actually are doing something wrong, and they would be happy to know how they should do things to make it work, if only you could tell them. Logging everything you need to establish exactly where the problem is is invaluable for this. Your logging should tell you if the error lies inside your program or outside it. This makes your life easier, but your users', as well.
Finally, I will add that I very much agree with AaronLawrence's comment.
Please correct me if I got my facts wrong.
Logging should be fit to your needs - anyone who tells you that your logging is wrong is probably wrong. For example, I took a lot of flak in my company for having *extremely* verbose logs. Our support, who like to look at logs, blamed me for making the logs way too huge. I then got the linux server guys lecturing me about the proper way to log - I apparently should only log critical errors, and I should have several different log levels, right? WRONG. The product in question is a plugin, and relies on very flaky API's. It is subject to interference from other plugins and bugs in the host application itself. Having several log levels only lead to too much back-and-forth between support and customer (ok, now try THIS log level. ok, now try THIS log level. ok try...). So, for my needs, where anything can go wrong at any time, super verbose, scope-indented logs were absolutely necessary. For other needs, only critical errors need be logged. And every need in between has justification as well. There is absolutely no rule, and as long as your logging meets your needs, no one has any business telling you how to do what they don't understand.
First question is: who is the logging for? Sometimes the logging is for the customer, sometimes it is for support, often it's for the developers, and often it's for some combination of these. If it's for the customers, the readability, volume, and performance are all considerations. But if it's for the devs, requirements can vary greatly. Some people here say, "what about CPU" and others say "what about disk space?" If I'm logging for an app on a 5 petabyte system that only runs a relatively small chunk of code once per day, I'll go ahead and log every line if that helps in some way. If I'm logging for an embedded app with about 32k of free space, I'm going to log very differently.
In my previous office, we wrote and deployed real-time, mission-critical applications the likes of core-banking middlewares and ATM controllers, to name a few. Once deployed and set live on live banking servers, generated logs from those applications were the only means of figuring out and debugging, when something went wrong, what went wrong, how, and when. In fact, it was standard procedure to follow for the banks' IT staff when reporting any problems they encountered to open a support ticket, and send in logs.
There is no easy and sure way to determine what in the application needs to be logged. Like many things with software development, it comes through experience. As the products matured over the years, we, back in the previous office, realised what the bare minimum set was in terms of what should be logged. And that naturally helped in developing a simple pattern for what needed to be logged when a diverse set of people added extensions to the applications in terms of modules, etc.
I've been using log4net and log4j for years. Generally:
I log info at the start of every function logging all parameters passed it.
I log info at the end of every function logging return values if applicable
I log errors and exceptions at a high level
I NEVER trap the exception entirely, every tier (including the end user) needs to be aware that an error occurred, be it their fault or the programmers.
just my 2 cents.
Exceptions
All exceptions should be handled, it is never acceptable to simply print the exception message and stack trace. Exceptions should be dealt and corrective or informative action taken to highlight the issue.
For debugging purposes the stack trace should be logged at the final destination of the exception or at whenever the exception is modified.
Logging
If a class has unconditional logging, then it should be updated to use log4j. A case of unconditional logging is where there are System.out.println() in the code with no conditions surrounding the call.
Logging should always be performed at method exit and entry as follows:
- Log entry
- Log arguments
- Log exit
- Log return values
The occurrence of exceptions and the stack trace should also be logged as info level items.
Logging calls should be wrapped in enablement checks so that arguments do not get unnecessarily evaluated for example:
If (getLogger().isDebugEnabled() { :"+ theArgValue1); :"+ theArgValue2);
getLogger().debug("Entering - method - argList");
getLogger().debug("Arg1
getLogger().debug("Arg2
}
Assertions
Assertions should be used in the code to verify that the expected results have occurred. Assertions should be used as liberally as possible.
Standard Assertions should be performed at method entry and exit; these methods should validate the pre and post-conditions for the method. All arguments should be checked for validity as should the return values. Similarly the state of the object and broader system should be checked as appropriate on both method entry and exit, for example if a file is open.
Note
Please note that this is in using log4j where the logging can be turned off an on at different levels of the code during runtime.
I wonder why, given the huge increase in the performance of computers over the last ten years and more, why it sill takes some games one to five minutes to load.
The goal of almost any commercial product isn't to be 100% awesome in every possible way. It's to be adequate on most measures and awesome in the few that buyers care most about.
Game buyers care about volume and quality of content a lot more than load time. So even if some engineer figured out a way to cut load time in half, somebody would just add more content and bump up bit rates and quality levels until it was back to a tolerable load time.
Maybe it's just me, but one thing I've noticed is that new programmers tend to use very elaborate and extensive logging code with logging calls that take a zillion parameters like log level, error type, error code, short message, long message, so-one and so-on. The experienced programmers seem to use something like fprintf(stderr," log message... . That combined with maybe a few compile time macros. Of course the their code may just need less logging anyway.
How can you demonstrate that you have fixed the reported bug if you can't recreate it in the first place?
That is why I stated you have to log the process inputs.
In any case reproducing the bug can really be a severe challenge because of the non-deterministic nature of modern software.
Don't forget, next time you try to simulate the bug mount a scratch monkey first!
Firstly I agree with a previous poster that using a debugger is better than logging for most development work. Single step through code to make sure its doing what you expect.
At the risk of exposing my age, I'm also a big proponent of desk checking code and algorithms. (Yes, we had limited mainframe CPU minutes to run our card deck and paper tape applications in lab assignments in college). Use a design review and code walk through process to explain yourself. It's amazing what you'll find as your colleagues challenge you on design decisions and you have to defend it.
With good code structure, this should provide a reasonable level of logging so you can determine where something might have gone wrong or gotten stuck in production or test out of reach of debuggers. Sometimes, bugs are timing related or only occur during stress testing so logging is the way to go as a modern replacement for the hex crash dump. You should be able to enable/disable logging with an option for verbosity. You can log to a text file or to something like the windows event log.
Spy utilities are also useful logging and debugging tools. There are tools built into ODBC for logging transactions client side or tools on the database servers. For SOA there are SOAP spies. I use Mindreef. Another option is Altova.
Moving forward, we're building workflow systems using FileNet Visual Workflow (or P8) or BPEL (Oracle, Active Endpoints, etc). In these cases we fork asynch child messaging workflows that log events to a database without holding up the main workflow process. These are inserted at key points in the main workflow and usually double as process monitoring points used in various process management reports.
Finally, debugging, logging and spy tools are no substitute for poor application and code design. Include module block comments and descriptive variable names at a minimum, and a sprinkling of line comments for the tricky parts.
You can put lipstick on a pig - you will still have a pig.
Comment removed based on user account deletion
Assuming the behavior would be an improvement, which doesn't seem likely unless we breed a new class of super-developer and fire everyone who bullshitted their way through HR
As an OS developer, the type of logging done is very dependent on where in the OS the logging is going to occur. In many cases, the rate of logging events can be extremely high - exceeding millions of events per second (think of logging operations on OS semaphores on very large SMP systems). Other tracing events (such as in the IO subsystem) events are occurring at more reasonable rates (1,000s of events per second).
For logging high frequency events in the OS, logging to disk is typically not a feasible option. For these events, we log data into ring buffers in memory. The records in memory are typically fixed length. An atomic increment operation is is required to obtain an index in which to write the log entry. A small header typically consists of a time stamp, a CPU id, and a record type. The rest of the data could be either formatted text or binary data. In the case of binary data - some additional display code is required to format or analyze the data. Under normal conditions, this tracing is typically not turned on - the performance impact is just too high. The tracing is usually turned on during development testing and possibly beta testing or when attempting to reproduce a bug where the trace data might help. Tracing is turned on typically by allocating some memory that contains a pointer to the trace data, and this memory has a bit that indicates if tracing is turned on. If the structure is not allocated or the tracing bit is off, then no tracing occurs. Thus the test to determine if tracing is on is at most two trivial read operations - make sure this data is in a cache lines that contains no data that is written. Most of these reads will then be satisfied from cache and thus be very fast so the performance impact with tracing turned off is very minimal.
For the ring buffer itself - the size dictates how much data is retained. In most cases, only a few megabytes of trace data is allocated. However, in some of the high frequency cases where a specific problem is not noticed within the OS for some time - 20-30GB of memory has been used for trace data. Note that this is physically allocated memory and not something that is pageable so there the system needs to have this much available free memory. In these cases, tools to search the trace data are also necessary.
Being an OS debug tool - if tracing is turned on and the OS crashes - the trace data is logged in the crash dump and accessible (our tools can format this data from the dump file). Should the bug being tracked not result in an OS crash, then when software notices the bug, the bit indicating that tracing is turned on can be cleared thus stopping the trace data as soon as the problem was noticed and hopefully preserving useful data in the ring buffer.
With numerous trace facilities and then many options within each facility - only tracing that is needed to debug an issue needs be turned on. As this code is shipped with the OS, should a customer be having an issue and the issue can not be reproduced in house, it is often a great benefit to have the customer turn on this tracing.
With a trace routine that is basically a printf to memory - should some additional trace code be required, it is easy to go add addition trace code to the OS and rebuild the necessary components and reboot the system - for my specific OS, this can be accomplished in just a few minutes.
The above tracing mechanisms have been very valuable to debugging. This type of tracing can also provide quite a bit of insight into how the OS operates (especially for large SMP environments) and provide performance data on the system. In the semaphore example, if the semaphore address is recorded along with information on when the semaphore is acquired and released, then reporting tools can be produced that show the percentage of time a particular semaphore is held. If CPUs are spinning on a semaphore and that event is also recorded - then the ability to report the percentage of time CPUs are waiting on semaphores is also pos
Consider, the attitude of "just throwing more processing power" at something is driving up electricity usage at data centers and office buildings across the country. A responsible programmer in this day and age should consider the energy footprint their code will have. Yes, I am completely serious.
1 line of logging per 200 lines of code is way too much. 2 in 5 lines is absolutely insane. I've seen way too many systems where the logging made it totally unusable. People just don't realize the costs of logging everything.
There's absolutely no way to document anything this verbose. Without documentation, the logging is useless.
Huh?
Depends entirely on logic, not generalized statements. If you're into a potentially problematic section of code, 5-10 log statements per 100 lines may not be enough. If you are in a section of code that is verbose code but not much to go wrong, the logging statements should be far fewer. Good logs is essential to debugging and problem determination and more is generally better than not enough so generalization is not going to fit here.
And in the code, just because there is a log statement does not mean it generates logs, add filters to the approriate level,
isn't that hard to implement is it?
Mod parent way up.
We have a computer rack installed at each of our many satellite locations, and a main data center. The satellite racks are now filled with individual application servers and networking gear, and the tiny closets where they are installed are maxed out as far as energy consumption and cooling. We are fast approaching physical limits for how much code we can afford to add to these remote sites, and inefficient code makes the problem worse.
John
As the PHB in Dilbert once said, "I want hourly reports on why we're behind schedule!"
More aphorisms:
o The way to herd cats is to put them in the vicinity of mice and let them do what they do best;
o Too frequent demands for reports from software developers is akin to digging up a seed every day to see if it sprouted.
These are the two prime insights I have from having run teams of up to 70 developers over a career a few decades long. It ain't your usual management job, and it's not done that well remotely. You have to troll the aisles and pick up the exceptions.
But logging is good, in moderation. And throwing away resources is always bad; inefficiencies always show up and are noticed. I remember one of my programmers dragging in a rather large library just to sort a list that never exceeded a half-dozen short strings. I was less than pleased.
Do not mock my vision of impractical footwear
The specific scheme doesn't matter, but you want to log when certain conditions are true.
Something lime this is at the heart of many logging schemes:
Have a function LOG(flag,text).
LOG compares flag with a global/static variable CURRENT_FLAGS which is set during initialization or perhaps in response to special log-level-setting commands. If flag is in the currently-active flags, log the text, otherwise, discard it.
Typically, all logging that is debug-only will be surrounded by compile-time "#if DEBUG/#endif" or equivalent, so they aren't even compiled into non-debug builds.
Knowledge is how to play a game, intelligence is how to win, wisdom is knowing what game to play.
I've written my own similar schemes in the past but in all cases I made sure the log-level required was settable in a config file readable by the program. So starting with "no logging", if you encounter a problem you change the config file's log-level and send the program a signal or, in the worst case if the program can't (or wasn't designed to) handle signals, stop it and restart to force it to re-read the config file. This way you only log verbosely when you need to and you don't need to re-compile/re-link to turn logging on.
Time flies like an arrow; fruit flies like a banana.
Some comments, above, seem to say that every app should produce Mitchner-novels-per-second of debug and log. Others seem to go to the opposite extreme. Both are right, and both are wrong.
There are some applications, and some portions of most apps, that have extreme performance issues, and for which excessive logging is likely to produce a problem in production. And, there are some applications, and some portions of most applications, for which very detailed logging is needed to meet specific diagnostic or auditing requirements. And, there is a lot of gray area in between those extremes.
We use a variable approach. We make it very easy to code a "debug" or "log" output -- It's just a one line call. And, those routines have the first parameter for the "level". A separate call is available to set the desired level for the process. Level 0 message are reserved for major events and severe errors. Level 1 is for intermediate level events and warnings. Level 2 for minor events and warnings. Debug output starts at level 3 or 4. Inner-loop debug output is at level 5 or 6. At level 5 or 6 you get the proverbial Mitchner-novel-per-second. We generally use level 4 on for dev testing, and level 0 or 1 on production systems.
Efficiency is a key, both in execution and in coding. The code is small and easy to write -- only one line of code per message. And, the logging routine itself is very efficient in the case where it decides not to print the message because the level is higher than the current per-process level. A message that is not printed consumes just a few machine instructions.
In new code, we typically have a logging call per three or four statements. In difficult algorithms, maybe more. In straight-forward byte-moving code, maybe less. As code becomes more mature, and we become more confident that it is right, we might comment-out some or even most of those calls. When a piece of code is under investigation, and the developer doesn't quite see the problem yet, I often advise him/her to add more debug logging to that area.
There's no need to clutter up source code with logging statements for debugging. Just trace appropriate functions as needed.
A long time ago (in 'Net time anyway), there was a Slashdot article regarding what IBM could do to help the open-source world, now that they are heavily into Linux. Here's one place they may be able to help, through experience if not actual code. On z/OS operating systems, there are two (or three) logging architectures: there are instructions and supervisor calls for low-level tracing (Generalized Trace Facility) which is a high-overhead debugging tool. There's the System Management Facility, which is a software standard for logging things like utilization data, events such as file open/close (with I/O or media error statistics), etcetera. SMF would be the closest to 'normal' logging. There's then a 'System Logger' facility which handles log files of all sorts, cycling them, etc. This kind of standardized logging architecture might be a good thing for Linux, also.
You should be able to dial logging up and down. Use log levels to this advantage. If you are in C++ use compiler pragmas to remove trace logging statements. When doing trace logging you want very verbose output but that's only helpful to you. Think about the application as it ages in the hands of an administrator.
If your application pumps the log full of messages all the time eventually your administrators will start ignoring the logs. They'll be information saturated and unable to distinguish the errors from normal operations. This is bad. In production only log problems and provide plenty all the context you can when you do log.
Your administrators are users too and they only have so much time in a day. You want to only complain to them when something very bad has happened. You want them to sit up and pay attention when your application burps out a log.
Obviously, I've presented the most extreme case... there are points on this continuum of silent to verbose that make sense for different situations. Use your judgment and be mindful of your administrator's "information budget".
[signature]
It's the only way to fly! It's like linked lists or quicksort implementations... why reuse a well-tested, properly engineered implementation when you can build your own from scratch?
At least in Windows XP, I have found that the TraceMessage() function seems to run nicely with the MessageFlags parameter set to zero, meaning that only the USHORT MessageNumber identifies your message, thus minimizing log-file-size. But if you're the only app logging to your ETW session then 64k message-types are enough, why would I need a 16-byte GUID in each message ?! By comparison only TraceEvent() was available in Win2K, significantly more overhead. I also briefly examined EventWrite()( VISTA ) and it seemed way more bloated than even TraceEvent(). According to MSDN TraceMessage() continues to be officially supported in Vista and server-2008, so there is still a way to do fairly lightweight tracing. It's definitely nice to have these kernel-logging-sessions just keep on ticking while your apps crash.
that way you can see exactly where the user was at when it crashed.
For the last ten fifteen years, I have been big on debug code writing html pages into the web root so that the developer can locally browse the debug data. Emitting code with links makes it easy to get around. I have used this extensively and it works very nicely.