A Glance At Garbage Collection In OO Languages
JigSaw writes "Garbage collection (GC) is a technology that frees programmers from the hassle of explicitly managing memory allocation for every object they create. Traditionally, the benefit of this automation has come at the cost of significant overhead. However, more efficient algorithms and techniques, coupled with the increased computational power of computers have made the overhead negligible for all but the most extreme situations. Zachary Pinter wrote an excellent article about all this."
A previous poster noted that most GC algorithms are distinctly unfriendly to virtual memory systems. They usually have similar problems with cache locality, which can result in an enormous slowdown, regardless of the time actually spent in the GC itself. A practical problem is that GC regimes are notoriously non-portable, so that each new language implementation needs to have the (increasingly complex) GC re-done again.
A more fundamental problem is that memory is only one of many resources a typical industrial program must manage. GC takes over memory management, but leaves the other scarce resources -- file descriptors, sockets, mutexes, database connections -- to be managed manually, as in C. (Java has this problem, for instance.) "Finalization" simply cannot provide the necessary guarantees.
Given a resource management regime that can handle all these other important resources, as is commonly practiced in C++, memory becomes just another resource. Management is encapsulated the same way for all. A language that lacks the tools necessary to implement such a regime needs GC, so the presence of GC may actually (as in the case of Java) indicate a fundamental weakness in the language.
(Anybody who thinks languages like Haskell or ML are fundamentally more powerful than C++ must be unaware of the Boost Lambda library, and of FC++, a set of header files that implements Haskell language semantics for C++ programs. They get along fine without GC, as well.)
It was mentionned earlier that reference counting was pretty good, but had a few drawbacks when it came to cycles and multi-threading.
...
I took a bit of time to go and read Wikipedia's page
In the description they give, they mention that reference counting GC can represent managed objects by directed graphs.
I know there exists algorithm to find cycles in such graphs. So I suppose these could be applied to this problem. Other proposal are to use a tracing GC to detect them. To which it was replied that this would be able to reclaim the memory but not to properly finalize the objects. I don't see why that would be true. I mean, if you have found a member of the cycle to be collected, can't you just finalize that one and let the whole cycle unravel itself ? If there are cycles inside that cycle, just do it again on these etc
As I said, another common objection was the cost of updating the counters in multithreaded environnments. Multiple solutions have been proposed, some more portable than others (using processor/platform specific atomic increments, or deferring the update until it is really necessary and using the standard mutex protection)
All this said, I try to understand a couple of things.
-I am no genius, thus these ideas must not be new, what is the problem which can't be solved with these?
-Reference counting seems to integrate better in the runtime of the program. All the other techniques proposed seem to imply some monolithic operation on the memory summing up all the overheads at on time and doing the cleaning once in a while, with the possibility of becoming a bottleneck in heavily loaded systems. Reference counting OTOH seems to allow the cleanup to continually add a little bit of overhead to the system but nothing which will bring the whole thing to a grinding halt before allowing it to go on. What have I missed?
It has different collectors, which you can select according to the needs of your application. Currently there are two, the default collector (generational) and an incremental collector which is slower but less likely to pause.
Also, the default collector is a 3-generation one, not 2, at least as of Java 1.4.1. More details here.