clusteroid81 · Slashdot Mirror

← Back to Users

User: clusteroid81

clusteroid81's activity in the archive.

Stories: 0
Comments: 1
First seen: 2008-08-17
Last seen: 2008-08-17
Profile: (view on slashdot.org)

Comments · 1

Logging for an OS on Software Logging Schemes? · 2008-08-17 04:15 · Score: 1

As an OS developer, the type of logging done is very dependent on where in the OS the logging is going to occur. In many cases, the rate of logging events can be extremely high - exceeding millions of events per second (think of logging operations on OS semaphores on very large SMP systems). Other tracing events (such as in the IO subsystem) events are occurring at more reasonable rates (1,000s of events per second).
For logging high frequency events in the OS, logging to disk is typically not a feasible option. For these events, we log data into ring buffers in memory. The records in memory are typically fixed length. An atomic increment operation is is required to obtain an index in which to write the log entry. A small header typically consists of a time stamp, a CPU id, and a record type. The rest of the data could be either formatted text or binary data. In the case of binary data - some additional display code is required to format or analyze the data. Under normal conditions, this tracing is typically not turned on - the performance impact is just too high. The tracing is usually turned on during development testing and possibly beta testing or when attempting to reproduce a bug where the trace data might help. Tracing is turned on typically by allocating some memory that contains a pointer to the trace data, and this memory has a bit that indicates if tracing is turned on. If the structure is not allocated or the tracing bit is off, then no tracing occurs. Thus the test to determine if tracing is on is at most two trivial read operations - make sure this data is in a cache lines that contains no data that is written. Most of these reads will then be satisfied from cache and thus be very fast so the performance impact with tracing turned off is very minimal.
For the ring buffer itself - the size dictates how much data is retained. In most cases, only a few megabytes of trace data is allocated. However, in some of the high frequency cases where a specific problem is not noticed within the OS for some time - 20-30GB of memory has been used for trace data. Note that this is physically allocated memory and not something that is pageable so there the system needs to have this much available free memory. In these cases, tools to search the trace data are also necessary.
Being an OS debug tool - if tracing is turned on and the OS crashes - the trace data is logged in the crash dump and accessible (our tools can format this data from the dump file). Should the bug being tracked not result in an OS crash, then when software notices the bug, the bit indicating that tracing is turned on can be cleared thus stopping the trace data as soon as the problem was noticed and hopefully preserving useful data in the ring buffer.
With numerous trace facilities and then many options within each facility - only tracing that is needed to debug an issue needs be turned on. As this code is shipped with the OS, should a customer be having an issue and the issue can not be reproduced in house, it is often a great benefit to have the customer turn on this tracing.
With a trace routine that is basically a printf to memory - should some additional trace code be required, it is easy to go add addition trace code to the OS and rebuild the necessary components and reboot the system - for my specific OS, this can be accomplished in just a few minutes.
The above tracing mechanisms have been very valuable to debugging. This type of tracing can also provide quite a bit of insight into how the OS operates (especially for large SMP environments) and provide performance data on the system. In the semaphore example, if the semaphore address is recorded along with information on when the semaphore is acquired and released, then reporting tools can be produced that show the percentage of time a particular semaphore is held. If CPUs are spinning on a semaphore and that event is also recorded - then the ability to report the percentage of time CPUs are waiting on semaphores is also pos