Review:Garbage Collection
The first time you encounter garbage collection (GC, for short) is often by using Lisp or Scheme in an introductory computer science course, where it can seem an almost magical technology. How does the interpreter determine that a variable has become garbage, and can safely be deallocated? This book explains how GC works, in great detail, and interest people implementing garbage collectors or who've wondered how it's done. There are 3 approaches to GC, all of which are covered in this book:
- Reference counting:
Every object keeps a count of the references pointing to it. When new references are created, the counter must be increased; when references are destroyed, the counter is decremented, and, if the count is zero, the object has become garbage and can be reclaimed. Reference counting unfortunately exacts the most overhead from constantly incrementing and decrementing the counters, and in its simplest form it can't collect cyclic data structures, because their count will never reach zero. - Mark-sweep:
Traverse the graph of object references, starting from the roots of the computation; roots might be the program's global variables, the contents of the C stack, static class variables, or a main namespace of some sort. Each object that's traversed is marked as being reachable; when the traversal is complete, any unmarked objects are unreachable because they can't be reached by the program any longer, and can be reclaimed as garbage. - Copying:
Divide memory into two halves, called "Fromspace" and "Tospace". Memory for new objects is always allocated out of Fromspace. When Fromspace fills up, a traversal similar to the one used by mark-sweep is done, but instead of just marking objects as reachable, they're copied into Tospace. Once the traversal is done, all the live objects are now in Tospace, and Fromspace no longer contains any live data; only garbage objects can be left in Fromspace. Tospace is now declared the new Fromspace, and new objects are allocated until it fills up; the algorithm is then repeated.
This book covers the above 3 algorithms for GC in great detail; there's a chapter for each algorithm, examining its limitations, common implementation techniques, and historical notes on the its development. Later chapters cover more advanced collectors that extend the basic algorithms in various ways. Generational collectors are based on the observation that many objects are temporary and have short lifetimes, while some few objects may last for the entire lifetime of the program. A generational garbage collector concentrates on young objects, because they will occupy the most recyclable space, and spends less time examining old objects that probably won't be reclaimed. Incremental and concurrent collectors don't bring the whole program to a halt while they scavenge through memory doing a collection, making them more suited for interactive or even real-time applications.
Chapter 9 is an interesting examination of the implementation of GC
in an unfriendly environment, namely C programs. The Boehm-Demer-Weiser
collector implements a mark-sweep garbage collector that can
replace C's malloc() function, scanning stack frames and
registers for potential pointers, and managing to do it fairly
efficiently. One study found the Boehm-Demer-Weiser collector added
20% more execution time overhead than malloc() did --
that is, the time spent in memory allocation was 20% longer. (That
doesn't mean it made programs 20% slower, because most programs spend
more time computing results than they spend allocating memory). A
garbage collector for C is a tricky job, complicated by the lack of
type information, data values that may mimic pointers, and compiler
optimizations that may render objects invisible to the
collector. Barlett's Mostly Copying Garbage
Collector, also described in this chapter, is more liberal and
requires some assistance from the C programmer in exchange for
greater accuracy.
The remaining 3 chapters consider suggestions for adding garbage collection to C++, the interaction of GC with cache memory, and distributed garbage collection. The topics of these chapters are still subjects of active research, so the final chapters are lighter on implementation details and heavy on references to research papers, making them a lot less interesting.
The authors are computer scientists specializing in GC (Richard Jones maintains a garbage collection page), and their coverage is quite complete, which at times makes for exceedingly dry reading as the authors enumerate what seems like every minor implementation variation and every relevant paper. On the other hand, if you're actually faced with implementing a garbage collector, the pointers into the research literature will be very useful. A casual reader (like me) can simply skim, or even skip, pages until the next interesting section arrives.
Excerpt
However, the matter is more subtle than this. The collectors
described in this chapter are very simple and inefficient. The
behaviour of these collectors cannot be automatically ascribed to
their more sophisticated variants. The cost of copying an object is
likely to be more expensive than simply testing and setting a
mark-bit, particularly if the object is large. Although mark-sweep
must sweep the entire heap, in practice its real cost is dominated by
the mark phase. Linearly scanning the heap will generally be less
expensive than tracing data structures even using the simple technique
shown above. ... Furthermore, more sophisticated methods can
substantially reduce the cost of the sweep. Although copying
collectors have predominated in the past, recent studies suggest that
the choice between mark-sweep and copying garbage collection may well
depend as much on the behaviour of the client program as on the
inherent properties of the garbage collection algorithm.
-- From section 2.4 of the book.
To purchase this book, head over to Fatbrain.
Table of Contents- Introduction
- The Classical Algorithms
- Reference Counting
- Mark-Sweep Garbage Collection
- Mark-Compact Garbage Collection
- Copying Garbage Collection
- Generational Garbage Collection
- Incremental and Concurrent Garbage Collection
- Garbage Collection for C
- Garbage Collection for C++
- Cache-Conscious Garbage Collection
- Distributed Garbage Collection
0 of 105 comments (clear)
No comments match the current filter.